Introduction
On 12 May 2025, the European Union Intellectual Property Office (“EUIPO”) published a comprehensive report on the implications of generative AI (“GenAI”) for copyright law in the European Union (the “Report”). The Report delves into how GenAI systems interact with existing legal frameworks, with a particular focus on the use of copyright-protected works in AI training (“GenAI Input”) and the legal status of AI-generated content (“GenAI Output”).
While the Report is primarily aimed at legal and policy experts, its relevance extends beyond that audience. Technology companies, content creators, platforms, and copyright holders across sectors will all be directly affected by how the EU balances innovation against intellectual property rights. The legal and commercial questions raised by GenAI around data use, authorship, licensing, and enforcement are fast becoming operational and strategic concerns for businesses.
To support stakeholders in this evolving landscape, the EUIPO will launch a ‘Copyright Knowledge Centre’ by the end of 2025, aimed at helping copyright holders better understand and manage the use of their works in GenAI development.
This article unpacks the Report’s key takeaways, explains why they matter today, and explores what they reveal about the evolving relationship between copyright law and AI in Europe.
1. GenAI Input
The Report begins by explaining that AI models require massive amounts of data to train their algorithms. To meet this demand, developers rely heavily on automated data collection techniques such as web crawling and web scraping.
1.1 Legal Summary
These collection methods often involve accessing significant amounts of copyright-protected content. Under the EU Digital Single Market Directive (“DSMD”), text and data mining (“TDM”) is permitted in certain circumstances. The general TDM exception in Article 4 DSMD is, however, subject to a key limitation: copyright holders may reserve their exclusive rights in their works, commonly referred to as “opting out”. For content that is publicly available online, the reservation under Article 4 must be made expressly and in an appropriate manner, including by machine-readable means. Where a rights holder has opted out, AI developers must obtain authorisation, typically through a licensing agreement, before making copies of the content for training purposes.
The EU Artificial Intelligence Act (“EU AI Act”)[1] complements the DSMD by imposing additional obligations on providers of GenAI models. These providers must respect copyright holders’ TDM opt-outs and publish sufficiently detailed summaries of the content used to train their models. The EU AI Act also requires providers of GenAI systems to ensure that AI-generated content is marked in a machine-readable format and detectable as artificially generated.
1.2 Robots Exclusion Protocol
In practice, one way that website operators express their wish to limit automated access is via the ‘Robots Exclusion Protocol’ (“REP”). This protocol is a technical standard, typically implemented as a robots.txt file, that signals to web crawlers which parts of a website they may access or index. However, while REP can function as a tool for controlling data access, stakeholders widely agree that it is an imperfect and temporary solution. Its limitations include a lack of specificity, dependence on web administrators, unenforceability, and the voluntary nature of scraper compliance.
From a legal perspective, REP does not in itself satisfy the requirements of Article 4 of the DSMD, which requires copyright holders to express their opt-out “in an appropriate manner”, including by machine-readable means. While REP is machine-readable, it is not universally recognised as a valid form of rights reservation under the DSMD, particularly when used alone without more explicit rights management declarations.
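As a rough illustration of how a compliant crawler might consult REP before fetching a page, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleAIBot`, and the URLs are hypothetical and do not come from the Report. A passing check only shows that the site's REP rules do not block the crawler; it says nothing about whether a valid Article 4 reservation has been made by other means.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler entirely and keeps
# every other crawler out of /drafts/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeSearchBot"):
        for path in ("/articles/1", "/drafts/2"):
            url = "https://example.com" + path
            print(agent, path, may_fetch(ROBOTS_TXT, agent, url))
```

The example also shows the lack of specificity noted above: the rules distinguish crawlers and paths, but cannot express that a page may be indexed for search yet not used for training.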
1.3 Rights Reservation in Practice
The Report further notes that no single, standardised opt-out mechanism currently exists. Instead, copyright holders employ a combination of legal and technical measures to reserve their rights. Legal measures include unilateral declarations, licensing constraints, and website terms and conditions, and may apply to individual works or entire repertoires. Technical approaches, meanwhile, are either (i) location-based (i.e. connected to specific URLs); or (ii) asset-based (i.e. linked directly to the content itself). Each method has advantages and disadvantages, and most stakeholders report using a hybrid strategy. However, as the Report emphasises, these tools only allow rights to be expressed, not enforced. The responsibility lies with AI developers to properly configure their scraping tools and content filters to honour opt-outs and avoid infringement. The Report anticipates that standard practices will eventually emerge, tailored to sector-specific needs.
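The difference between location-based and asset-based reservations can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Asset` type, the `RESERVED_PREFIXES` register, and the `tdm_reserved` metadata key are hypothetical names, not an existing standard or anything specified in the Report.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Asset:
    url: str
    # Hypothetical embedded metadata that travels with the file itself.
    metadata: dict = field(default_factory=dict)


# Hypothetical location-based register: reserved URL path prefixes per host.
RESERVED_PREFIXES = {"example.com": ["/photos/", "/archive/"]}


def reserved_by_location(url: str) -> bool:
    """Location-based check: is the URL under a reserved path on its host?"""
    parts = urlparse(url)
    prefixes = RESERVED_PREFIXES.get(parts.hostname or "", [])
    return any(parts.path.startswith(prefix) for prefix in prefixes)


def reserved_by_asset(asset: Asset) -> bool:
    """Asset-based check: does the content itself carry a reservation flag?"""
    return asset.metadata.get("tdm_reserved") is True


def may_use_for_training(asset: Asset) -> bool:
    """Hybrid check: treat the asset as opted out if either signal is present."""
    return not (reserved_by_location(asset.url) or reserved_by_asset(asset))
```

In this sketch, a copy of a work hosted on a third-party site escapes the location-based check but is still caught if the reservation is embedded in the asset, whereas metadata stripped during re-upload defeats the asset-based check. That trade-off is one reason a hybrid strategy is common.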
2. GenAI Output
The Report also explores the legal and technical questions surrounding content generated by AI systems. It notes that the nature of AI-generated content varies depending on the architecture of the model and the type of material being produced. The Report reviews current and emerging solutions designed to identify and manage synthetic content, such as provenance tracking, generative content detection, tagging, and methods like watermarking and digital fingerprinting to help comply with transparency obligations under the EU AI Act.
2.1 Retrieval-Augmented Generation
The Report focuses on the growing use of Retrieval-Augmented Generation (“RAG”), a method that combines GenAI with real-time information retrieval. RAG reduces the need for retraining by allowing models to fetch data dynamically – often from external sources that are copyright-protected. This raises distinct legal questions around licensing, as RAG’s use of vectorised embeddings stored in databases differs from traditional training methods covered by TDM exceptions. The Report notes that while RAG improves efficiency and relevance, it complicates compliance due to uncertainties over whether such retrieval falls under copyright or database rights regimes, particularly when dynamic content is accessed via web scraping.
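To make the retrieval step concrete, the following is a minimal Python sketch of how a RAG pipeline might look up stored content at query time and place it in the prompt. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the document identifiers and texts are invented for illustration; none of this is taken from the Report.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical document store. In a real system these texts could be
# copyright-protected sources, which is where the licensing question arises.
DOCUMENTS = {
    "news-001": "The council approved the new tram line on Monday.",
    "news-002": "Researchers published a study on coral reef recovery.",
    "news-003": "The tram line budget was revised after the council vote.",
}
# Vectors are computed once and stored alongside the source identifiers.
INDEX = {doc_id: embed(text) for doc_id, text in DOCUMENTS.items()}


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Insert the retrieved source text into the prompt sent to the model."""
    context = "\n".join(f"[{d}] {DOCUMENTS[d]}" for d in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Note that the model itself is never retrained here: the source text is copied into the prompt at the moment of use, which is why the activity sits awkwardly with exceptions framed around training.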
2.2 Technical and Legal Safeguards
To reduce the legal risks of output that might replicate protected content, GenAI developers are already investing in technical safeguards. These include filters for blocking near-duplicate outputs, tools for comparing generations against training data, and prompt engineering techniques such as prompt rewriting and negative prompting. Prompt rewriting modifies user inputs to avoid triggering imitative outputs, while negative prompting explicitly instructs the model to exclude certain protected features, such as famous characters or artistic styles. To further mitigate the risk of memorisation, where models (especially LLMs and generative vision models) might reproduce training data verbatim, some developers are adopting differentially private training, a technique that limits how much any single training example can influence the model and so makes verbatim reproduction less likely. Collectively, these approaches aim to reduce the risk of plagiarism and copyright infringement in AI-generated content.
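One simple way to picture a near-duplicate output filter is as a word n-gram overlap check between a generated text and a set of reference texts. The Python sketch below takes that approach; the function names, the 5-word window, and the 0.5 threshold are illustrative assumptions, and production filters are likely to be considerably more sophisticated.

```python
import re


def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate: str, reference: str, n: int = 5) -> float:
    """Share of the candidate's n-grams that also occur in the reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference, n)) / len(cand)


def is_near_duplicate(candidate: str, references: list[str],
                      n: int = 5, threshold: float = 0.5) -> bool:
    """Flag the candidate if it overlaps heavily with any reference text."""
    return any(overlap_ratio(candidate, ref, n) >= threshold for ref in references)


if __name__ == "__main__":
    reference = ("It was the best of times, it was the worst of times, "
                 "it was the age of wisdom")
    print(is_near_duplicate("it was the best of times it was the worst of times indeed",
                            [reference]))
    print(is_near_duplicate("A short original sentence about tram budgets and councils",
                            [reference]))
```

A check of this kind catches verbatim and lightly edited copying, but not paraphrase or stylistic imitation, which is why it is usually combined with the prompt-level techniques described above.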
The Report also highlights model editing and model unlearning as advanced methods to modify or erase specific content within a model post-training and deployment. While promising, these approaches face significant technical limitations, especially for large-scale foundation models where isolating individual data points in the model’s parameters is extremely difficult. Tools like SERAC are emerging as alternatives, enabling memory-based edits without relying on costly gradient-based operations, thus allowing developers to block or correct problematic generations more efficiently. Additionally, some GenAI providers now offer legal indemnities to users, reflecting increased industry awareness of potential liability.
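The memory-based idea behind tools such as SERAC can be illustrated with a heavily simplified Python sketch: edits are stored outside the model, a scope check decides whether a prompt falls under a stored edit, and the underlying model is left untouched. This is a toy under stated assumptions and not SERAC itself. The token-overlap score stands in for a trained scope classifier, the canned `response` stands in for a separate model that generates output conditioned on the edit, and `base_model` is a hypothetical placeholder.

```python
import re
from dataclasses import dataclass


@dataclass
class Edit:
    scope_example: str  # a prompt the edit is meant to cover
    response: str       # what to return instead of the base model's output


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def in_scope_score(prompt: str, edit: Edit) -> float:
    """Toy scope check: Jaccard overlap between prompt and the edit's example."""
    a, b = tokens(prompt), tokens(edit.scope_example)
    return len(a & b) / len(a | b) if a | b else 0.0


def base_model(prompt: str) -> str:
    """Hypothetical placeholder for the unmodified, frozen model."""
    return f"<base model output for: {prompt}>"


class EditedModel:
    """Wraps the base model with an edit memory; no weights are changed."""

    def __init__(self, threshold: float = 0.5):
        self.memory: list[Edit] = []
        self.threshold = threshold

    def add_edit(self, edit: Edit) -> None:
        self.memory.append(edit)

    def generate(self, prompt: str) -> str:
        best = max(self.memory, key=lambda e: in_scope_score(prompt, e), default=None)
        if best is not None and in_scope_score(prompt, best) >= self.threshold:
            return best.response  # prompt falls within the scope of a stored edit
        return base_model(prompt)  # otherwise defer to the frozen model
```

Because an edit is just an entry in memory, it can be added or withdrawn without retraining, which is the efficiency advantage the Report attributes to this family of methods.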
2.3 The Role of Public Institutions
The Report suggests that public institutions could help align practices and expectations by promoting shared technical standards, supporting ethical prompt use, and enhancing interoperability between platforms. As AI systems grow in capability and scope, a combined approach – blending legal frameworks, technical innovation, and public oversight – will be essential to managing the downstream risks of GenAI output.
3. Licensing and Economic Implications
Lastly, the Report briefly touches on the economic and licensing implications of GenAI. It notes that a nascent market for direct licensing of content for AI training is emerging. This development is driven by both the increasing demand for high-quality, well-structured datasets and concerns over the long-term availability of suitable training material. Sectors such as journalism and scientific publishing are cited as early participants in this market.
The Report identifies several factors that will influence the evolution of licensing practices. These include the establishment of benchmark market rates, the legal basis for remuneration, and the method of calculating compensation, whether tied to the inherent value of the content, its usefulness for training, or the revenue of downstream GenAI services. The Report stresses that robust opt-out systems are a prerequisite for the development of a functioning licensing marketplace, which could, in turn, offer new revenue streams for rights holders.
Conclusion
The Report makes clear that managing the interface between copyright and GenAI will require coordinated, transparent, and forward-looking approaches. It calls for new technical standards, policy instruments, and collaborative frameworks to ensure that copyright law remains fit for purpose. Institutions such as national IP offices and the EUIPO have key roles to play, whether by hosting databases, offering guidance, or promoting awareness. The establishment of the Copyright Knowledge Centre is expected to advance these efforts, although lasting solutions will likely depend on continued engagement from copyright holders, policymakers, and GenAI developers alike, as they work to balance innovation with the protection of creative rights in the age of GenAI.
This article was co-authored by Ana-Maria Curavale (trainee solicitor).
[1] Regulation (EU) 2024/1689