Cultural Collections and AI: Towards a New Economy of Cultural Data

14 Oct 2025 France 4 min read

Since the emergence of artificial intelligence (AI) tools, threats to copyright and intellectual property have been widely discussed in the fields of music, publishing and the visual arts. By contrast, the commercial exploitation of data and works held by public and private cultural operators has received far less attention, even though AI developers are actively seeking high-quality data to train their models.

Museums, libraries, archives and audiovisual operators already work with AI, for example in conservation, handwritten text recognition, artwork identification and public engagement. Until now, AI has primarily represented a cost for these institutions, offset by productivity gains or new services. Exploiting collections as training data, a potential source of own-source revenue, remains marginal despite the obvious value of high-quality literary, historical and artistic corpora. This prospect raises significant economic and legal issues.

In the United States, a widely discussed federal court order of 23 June 2025 (Bartz v. Anthropic) held that training an AI system on copyrighted works does not in itself constitute copyright infringement where the use is “transformative” under the fair use doctrine, a concept specific to U.S. law. However, the court also ruled that the mere use of works downloaded from illegal sources constitutes an infringement giving rise to damages. Several actions have already been brought against major companies accused of having used millions of books originating from so-called “shadow libraries”. Legal certainty regarding data use is therefore central, both for rights holders and for AI companies.

Rather than pursuing litigation, some companies have chosen to enter into substantial financial agreements with rights holders, thereby recognising the economic value of data. This approach opens the door to contractual discussions but also raises daunting practical questions, in particular as regards traceability, identification of rights holders and the distribution of compensation. The creation of settlement funds could offer a solution, and collective management organisations, whose expertise in redistributing copyright and neighbouring rights revenues is well established, are likely to be key players. The institutions that hold the works could also assume an intermediary role.

In France, works in the public domain may be freely used, including by AI systems, subject to respect for moral rights (French Intellectual Property Code (IPC), Articles L. 121-1 et seq.). They may therefore be digitised, analysed, exploited and reused, including for commercial purposes, without authorisation or royalties. Law No. 2016-1321 of 7 October 2016, known as the “Digital Republic Act”, went further by requiring public bodies to make available free of charge the public information they produce, notably metadata, catalogues, descriptive records and reproductions of public-domain works. As a result, an eighteenth-century manuscript may readily be incorporated into AI training databases. The public operator may nevertheless choose to make reuse subject to a licence, in which case it may charge a fee limited to the costs of digitisation and dissemination (Code on Relations between the Public and the Administration (CRPA), Article L. 324-2). The permissible tools are defined by regulation (CRPA Articles L. 323-1 and L. 323-2; Article D. 323-2-1).

For protected works held under legal deposit or in private collections, authorisation from rights holders is essential if they are to be used for AI training. Directive (EU) 2019/790 (Article 3), transposed into French law (IPC Article L. 122-5-3), provides for a “Text and Data Mining” (TDM) exception for the benefit of research organisations and cultural heritage institutions, which is difficult for technology companies to rely upon. Moreover, rights holders may exercise an opt-out, making a licence mandatory where the use is traceable. Thus, a museum wishing to train an AI system on contemporary catalogues must verify whether the works fall within the TDM exception or negotiate a licence. This field opens up revenue opportunities for both public and private institutions that hold works on various legal bases: they could negotiate the use of their data as training resources, collect licence fees and ensure redistribution to rights holders.

These issues remain largely uncharted. Courts and legislators will struggle to keep pace with innovation, but the coming months should provide decisive answers. This highlights the need for all stakeholders to monitor developments closely in this rapidly evolving field, at a time when the European Commission has just published its AI strategy (Apply AI Strategy), particularly with regard to the cultural and creative sectors.
