Key Points:
- The Dataset Providers Alliance (DPA) is established to promote ethical data sourcing in AI training.
- Formed in response to generative AI technologies that mimic human creativity and have led to copyright disputes.
- Members must not sell web-crawled text data or audio featuring voices without consent.
- Supports legislation like the NO FAKES Act and transparency requirements in AI training data.
A coalition of content-licensing companies, known as the Dataset Providers Alliance (DPA), has emerged to advocate for ethical data sourcing in training artificial intelligence (AI) systems. Announced on Wednesday, the DPA aims to protect the intellectual property rights of content owners and safeguard the rights of individuals depicted in datasets.
The DPA’s founding members include Rightsify, a U.S.-based music dataset company; vAIsual, an image licensing service; Pixta, a Japanese stock photo provider; and Datarade, a Germany-based data marketplace. The alliance was formed in response to the growing use of generative AI technologies that replicate human creativity, which has led to backlash from content creators and numerous copyright lawsuits against major tech companies like Google, Meta, and OpenAI.
AI developers have been training models on vast amounts of content, often scraped from the internet without consent from the original creators or rights holders. While tech companies argue that this practice is legal, they are also quietly paying for access to private content collections to mitigate legal and regulatory risks. This has created a burgeoning industry of companies that package and sell content for AI system training, and some of them are now forming groups like the DPA to establish ethical standards for this trade.
The DPA emphasizes ethical practices in data transactions, requiring its members to refrain from selling text data obtained through web crawling or audio featuring people’s voices without explicit consent. A key focus for the alliance is advocating for legislation such as the NO FAKES Act, a U.S. bill introduced last year that seeks to penalize the generation of unauthorized digital replicas of people’s voices or likenesses. Alex Bestall, CEO of Rightsify and its licensing subsidiary GCX, who spearheaded the formation of the DPA, highlighted the importance of advocacy in resolving ongoing disputes over AI and copyright issues.
The DPA also supports increased transparency in training data, similar to the requirements outlined in the European Union’s AI Act and the U.S. Generative AI Copyright Disclosure Act, introduced in April. These legislative efforts aim to ensure that AI developers disclose the sources of their training data. Bestall mentioned that the DPA plans to publish a white paper in July detailing its positions and advocating for ethical standards in AI data sourcing.