Responsible AI Music

Ensuring the Future of Music Creation aligns with Trustworthy AI Principles

Generative AI is radically changing the creative arts, transforming the way we create and interact with cultural artefacts.

While offering unprecedented opportunities for artistic expression, this technology also raises ethical, societal, and legal concerns. Key among these are the potential displacement of human creativity, copyright infringement stemming from vast training datasets, and the lack of transparency, explainability, and fairness mechanisms. In response, a coalition of organisations representing the creative industries formed the Human Artistry Campaign, advocating for the responsible use of creative AI. As generative systems become pervasive in this domain, responsible design is crucial.

Responsible AI Music (RAIM) is a collaborative initiative bringing together musicians, AI experts, ethicists, and legal experts to define, expand, and monitor requirements for generative music AI. Our goal is to work towards a framework providing guidance on the responsible development and use of generative models and systems for music. By balancing innovation with ethical considerations, we advocate for a balance in which artists and AI developers collaborate in a way that safeguards, inspires, and augments human creativity and artistry.

Can this be done by leveraging a Trustworthy AI framework? This initiative takes a holistic approach, harmonising previous work that has tackled specific aspects of generative systems (e.g., transparency, evaluation, data) within the Ethics Guidelines for Trustworthy AI produced by the European Commission, a framework for designing responsible AI systems built around 7 macro-requirements. Focusing on generative music AI, we illustrate how these requirements can be contextualised for the field, addressing trustworthiness across multiple dimensions and integrating insights from the existing literature.

What is Trustworthy AI?

Trustworthy AI encompasses artificial intelligence systems designed and implemented to adhere to fundamental ethical principles, technical robustness, and legal compliance. A reference work in this domain is the Ethics Guidelines for Trustworthy Artificial Intelligence, a document prepared by the High-Level Expert Group on Artificial Intelligence, an independent expert group appointed by the European Commission in 2018.

The guidelines include 7 key requirements that AI systems should meet to be trustworthy:

  1. Human agency and oversight
  2. Technical robustness and safety
  3. Privacy and data governance
  4. Transparency
  5. Diversity, non-discrimination, fairness
  6. Societal and environmental wellbeing
  7. Accountability

These requirements are of general applicability and relate to different stakeholders in the systems' life cycle (developers, deployers, end-users, and broader society). Following a piloting process involving 350 stakeholders, the guidelines led to the creation of the Assessment List for Trustworthy AI (ALTAI).

Taken from https://ec.europa.eu/futurium/en/ai-alliance-consultation/guidelines/1.html

Guiding features for Responsible AI Music

We contextualise the Trustworthy AI framework for the domain of generative music AI by defining responsible features that can drive the design and evaluation of generative systems, in accordance with the literature.

Overview of features

Before presenting each feature, let's start by introducing some jargon. When referring to music AIs, a distinction needs to be made between a music model and a generative music system.

Typically, a generative system is implemented so as to conveniently wrap the functionalities of a particular model, meaning that a single model can provide the computational backbone for various generative systems (e.g., plugins for music editors, production environments, smart instruments). For example, MusicVAE has been reused in different applications, such as Beat Blender, Melody Mixer, and Latent Loops, and is also available through Magenta Studio, a plugin for the DAW Ableton Live. The distinction between model and system is characteristic of generative AI: their design and implementation involve different stakeholders (machine learning engineers and mathematicians for the former; UX designers, software developers, and data engineers for the latter), with music experts as the common denominator driving the evaluation efforts.

Discover all features below or jump to those belonging to a specific Trustworthy AI pillar.

Pillar 1: Human Agency and Oversight

Central to the design of trustworthy systems is the capacity of individuals to understand and meaningfully influence the actions of AI systems (human agency), while monitoring and interacting with their behaviour so that they operate responsibly and align with human values and preferences (human oversight).

Personalisation

The system can reuse the musical repertoire of the user to personalise the style of the generation.

Creative feedback (HITL)

The system can iteratively refine or improve the generation based on the feedback provided by the user throughout the creative process.

Controllability (conditioning)

To steer the generations depending on the user's preferences, the system provides a rich variety of conditioning modalities, including linguistic, melodic, harmonic, rhythmic, and emotional control of the generation.

Red button

If the system is generating and playing music that is unpleasant or disturbing, or contains lyrics with offensive language, the user can halt the generation process at any time.

Safety disclaimer

The system advises the user whenever it may generate music that could be deemed dangerous or offensive to some categories of users (e.g., offensive language in lyrics).

Deception avoidance

The system does not show any deceptive behaviour related to the copyright and ownership of the generated material. For instance, the system never tries to claim ownership or novelty of the generations when music is possibly plagiarised.

Artist involvement

The system has been designed with the active involvement of creative professionals throughout its development cycle.

Pillar 2: Robustness & Safety

This requirement focuses on accuracy, reliability and reproducibility, resilience to attack, security, and fallback plans.

Music leakage prevention

The system does not allow for the full or partial reconstruction of the music material used as training data unless this is explicitly acknowledged and allowed by the copyright holders of the music data.

Generation fallback

If the system generates music continuously over time and its output is flagged as offensive at some point, it can switch to a simpler generative strategy (e.g., a rule-based mode).
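As a minimal sketch of this switch-over logic (`model_step` and `is_offensive` are hypothetical callables standing in for the real generative model and content filter):

```python
def rule_based_bar():
    """Trivial rule-based fallback: a fixed, safe C-major arpeggio."""
    return [60, 64, 67, 72]

def generate_stream(model_step, is_offensive, n_bars=4):
    """Generate bars with the model, but drop to the rule-based
    fallback for the rest of the stream once a bar is flagged."""
    bars, fallback = [], False
    for _ in range(n_bars):
        if not fallback:
            bar = model_step()
            if is_offensive(bar):
                fallback = True          # switch strategy from here on
                bar = rule_based_bar()   # replace the flagged bar too
        else:
            bar = rule_based_bar()
        bars.append(bar)
    return bars
```

The design choice here is that the model is never consulted again after a flag is raised, trading musical variety for guaranteed safety of the remaining stream.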

Model evaluation

The evaluation of the model/system is consistent with other frameworks and benchmarks for music generation.

Music evaluation

The system/model provides a comprehensive evaluation of the musical properties of its generations.

Expert evaluation

The evaluation of the system/model involves creative professionals or music experts.

Model availability

The computational model behind the system is fully publicly available and includes pre-trained checkpoints.

Training data availability

The training material on which the system relies is fully publicly accessible.

Prompt-to-Gen

If sample generations are released, the model can be seeded and prompted to recreate the same musical content.
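One way to support this is to make sampling fully deterministic given a seed. The toy random-walk "model" below (`generate_melody` is hypothetical, standing in for a real sampling loop) shows the principle:

```python
import random

def generate_melody(seed: int, length: int = 8) -> list[int]:
    """Toy melody generator: a random walk over MIDI pitches.
    Fixing the seed makes the generation exactly reproducible."""
    rng = random.Random(seed)   # local RNG: no hidden global state
    pitch = 60                  # start at middle C
    melody = []
    for _ in range(length):
        pitch += rng.choice([-2, -1, 0, 1, 2])
        melody.append(pitch)
    return melody

# The same seed recreates the same musical content.
assert generate_melody(seed=42) == generate_melody(seed=42)
```

Using an instance-level RNG (rather than the module-level global) is what keeps the generation independent of any other randomness in the application.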

Pillar 3: Privacy and Data Governance

Central to preventing unintended harm and establishing trust with users, data governance must encompass the quality and integrity of the data utilised, its pertinence to the intended domain of AI deployment, strict access protocols, and mechanisms to safeguard privacy.

Prompt leakage prevention

The system neither distributes nor leaks any personal data used by users to prompt the generation of music.

Training metadata integrity

The model discloses whether it is trained on music data that is not fully and correctly attributed to the rightful authors.

Safety of training data

The model discloses whether it is trained on music data that may be deemed offensive or socially harmful (e.g., lyrics).

Prompt governance

If the system stores data collected from users through the generation process (e.g., prompts, feedback, music), access to it is fully regulated.

Generative reuse of music data

The model uses music material (e.g. scores, audio recordings, lyrics, MIDI recordings and transcriptions) whose licensing and terms of use explicitly allow for training systems.

Copyright and licensing of generations

The system provides guidance or clear and comprehensive information about the copyright and licensing that apply to the generations.

Pillar 4: Transparency

This requirement is closely linked with the principle of explainability and encompasses transparency of elements relevant to generative systems, e.g., the data, the system, and the business model. Hereafter, we outline the main elements influencing the transparency of these systems.

System documentation

The system's design is well-documented throughout its development cycle, including instructions for model implementation, training, and data generation.

Evaluation documentation

The evaluation of the model/system is well-documented and reproducible, promoting consistency and transparency in future evaluations (e.g., of other music models).

Artefact watermarking

The system automatically embeds a watermark into every generation to signal its artificial nature.
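As a toy illustration of the idea, the snippet below uses a deliberately naive least-significant-bit scheme over 16-bit PCM samples; real systems use robust, perceptually informed watermarking, so treat this purely as a sketch of embed/extract symmetry:

```python
def embed_watermark(samples: list[int], mark: bytes) -> list[int]:
    """Hide `mark` in the least significant bits of PCM samples.
    Not tamper-proof: an illustration, not a production watermark."""
    bits = [(byte >> i) & 1 for byte in mark for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("audio too short for this watermark")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # overwrite the LSB with a mark bit
    return out

def extract_watermark(samples: list[int], n_bytes: int) -> bytes:
    """Read back `n_bytes` of watermark from the samples' LSBs."""
    return bytes(
        sum((samples[b * 8 + i] & 1) << i for i in range(8))
        for b in range(n_bytes)
    )
```

Because only the lowest bit of each sample changes, the perturbation is at most one quantisation step per sample and is inaudible in 16-bit audio.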

Generation explainability

The system can explain how the generations are created in a way that is understandable to its target users.

Data explainability

The system can relate each generation to the training material that contributed to its creation process (e.g. a pattern, motive, or sample).

Artificial awareness

Users interacting with the system during the generation process are always aware of its artificial nature.

Benefits communication

The benefits of using the particular system, compared to other solutions, are communicated to users prior to its use.

Limitations communication

The technical limitations and the potential risks of the system are communicated to users prior to its use.

Instructional material

Appropriate instructional material and disclaimers are provided to users on how to adequately use the system.

Pillar 5: Diversity, Fairness, and Non-Discrimination

This requirement emphasises the need to promote diversity, non-discrimination, and fairness in generative systems by establishing mechanisms to avoid unfair bias, designing for accessibility, and ensuring fair treatment for all users.

Music corpus statistics

The system provides a quantification of the kind of music used for training (genre, style, period, etc.), with statistics on the training corpora.

Accessible interfaces

The system provides generative interfaces and/or prompting modalities that make it more accessible and inclusive for users.

Accessibility assessment

If the system provides accessible features, these have been evaluated and tested with the specific target users they are intended for.

Accessibility awareness

The system explicitly acknowledges whether its use is limited or unsuitable for certain categories of users.

Continuous assessment

The system includes music stakeholders (creative professionals, ethical experts, AI engineers and researchers) as part of a long-term strategy for the continuous assessment of its outputs, impact, and trustworthiness.

Pillar 6: Societal and Environmental Wellbeing

This requirement concerns the sustainability and environmental friendliness of AI systems, as well as their broader social and societal impact.

Training footprint

The system provides an indication of the resources consumed for training the model, in terms of hardware, time, and energy consumption (cost of training).
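A back-of-the-envelope estimate of this kind multiplies GPU power draw by training time, scaled by the data centre's Power Usage Effectiveness (PUE) and the grid's carbon intensity. All figures below are hypothetical placeholders, not measurements of any real model:

```python
def training_energy_kwh(n_gpus, avg_power_watts, hours, pue=1.5):
    """Rough training-energy estimate: GPU power x time x PUE overhead."""
    return n_gpus * avg_power_watts * hours * pue / 1000.0

def co2_kg(kwh, grid_intensity_kg_per_kwh=0.4):
    """Convert energy to an emissions estimate via grid carbon intensity."""
    return kwh * grid_intensity_kg_per_kwh

# Hypothetical run: 8 GPUs at ~300 W average for 72 hours.
energy = training_energy_kwh(n_gpus=8, avg_power_watts=300, hours=72)
print(f"{energy:.0f} kWh, ~{co2_kg(energy):.0f} kg CO2e")
# prints "259 kWh, ~104 kg CO2e"
```

Reporting the inputs (hardware, hours, PUE, grid intensity) alongside the estimate lets others recompute it under their own assumptions.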

Generation footprint

The system provides an indication of the environmental footprint of generating a whole song or a part of it (cost of inference).

Responsible data collection

If the model uses any human-made annotation (for training, or evaluation), data collection or crowdsourcing was conducted and documented ethically, fairly, and with adequate compensation for annotators.

Social purpose

The system has been designed and used to bring societal benefits, e.g. supporting teaching activities, wellbeing applications, or improving accessibility of creative technologies.

IP validation

The system has mechanisms in place to detect possible cases of plagiarism or IP infringement resulting from the generations.
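A naive screening mechanism of this kind can be sketched as interval n-gram overlap between a generation and a reference work; this is a heuristic filter for flagging candidates, not a legal test of infringement:

```python
def pitch_ngrams(notes, n=4):
    """All length-n interval patterns in a pitch sequence.
    Comparing intervals (not absolute pitches) makes the check
    transposition-invariant."""
    intervals = [b - a for a, b in zip(notes, notes[1:])]
    return {tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)}

def overlap_score(generated, reference, n=4):
    """Fraction of the generation's interval n-grams that also
    occur in a reference work; high scores warrant human review."""
    gen, ref = pitch_ngrams(generated, n), pitch_ngrams(reference, n)
    return len(gen & ref) / len(gen) if gen else 0.0
```

In practice such a screen would run against an indexed corpus of protected works and forward high-scoring matches to human reviewers.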

Revenue sharing

If the system is a paid service, revenues from the generations are also shared with the artists who contributed training data.

Pillar 7: Accountability

This requirement corroborates all the others by ensuring that AI systems are accountable for their responsible design, implementation, and impact throughout their lifecycle.

Audit access

If the model behind the system and its training material are proprietary, access can be granted to internal and/or external auditors (on request), and their evaluation reports can be made available.

Impact assessment

Prior to the deployment of the system, potential negative impacts have been identified, assessed, minimised and openly communicated.

Responsible statement

The inability of a system to provide responsible features (like those outlined before), in full or in part, is explicitly motivated and documented. Trade-offs should be carefully selected to prioritise the minimisation of risks to ethical principles.

Generative redress

When unjust adverse impact occurs (e.g. generations contain offensive content, or the music is plagiarised), the system explicitly accounts for redress (e.g. compensation, direct correction through feedback).

Call for action

How to get involved

We actively seek insights and views from AI researchers, ethicists, legal experts, and music professionals to ensure continuous refinement and responsible expansion of generative AI technologies in the music industry. We warmly welcome policymakers, music industry stakeholders, and civil society organisations to join us in this crucial dialogue. Your voice is invaluable in shaping the ongoing discourse surrounding the broader implications of generative AI for all creative industries. There are different avenues to contribute to the initiative. Here are some suggestions:

Contribute to the RAIM framework & join our study

We are building a network of experts who can provide advice and support on responsible AI music. This could involve participating in online forums, or attending our events. Share your insights and experiences, and help inform the next generation of AI Music systems. We are soon launching a study to assess the importance of each feature relative to each stakeholder group, leading to the definition of the framework. To register your interest in participating, please click the button below to send us an email. We will contact you with further details once the study commences and all protocols are in place.

Register Your Interest
Reach out and collaborate with us

If you are interested in working with us, please get in touch! We are actively seeking collaborations to drive the implementation of the RAIM framework. Join us in promoting the next generation of AI music systems that prioritise ethical considerations and adhere to responsible features. Let's work together to establish benchmarks, develop databases and tools, and explore innovative solutions for RAIM.

Reach Out
Share the initiative

Help us spread the word about the RAIM Initiative. Become an advocate and raise awareness about the potential of generative AI in music and the importance of responsible innovation. Share our website and social media channels with your network, or write about us in your publications.


Together, we aim to foster a vibrant and inclusive ecosystem where generative AI truly empowers musicians and enriches the musical landscape for everyone.