A New Tool Developed at Imperial College London Can Detect Whether Copyright Holders' Work Has Been Used in AI Models
Since the rise of generative AI, content creators have expressed concerns that their work has been used in AI models without their consent. Until now, verifying whether specific text was included in training datasets has been challenging.
A team at Imperial College London has developed “copyright traps” to address this issue. These traps are pieces of hidden text that allow writers and publishers to subtly mark their work, enabling them to detect if it has been used in AI models. The concept is similar to historical tactics used by copyright holders, such as inserting fake locations on maps or fictitious words in dictionaries.
These AI copyright traps come at a crucial time. Numerous publishers and writers are currently engaged in legal battles against tech companies, alleging unauthorized use of their intellectual property in AI training datasets. One prominent case is The New York Times’ lawsuit against OpenAI.
The code for generating and detecting these traps is available on GitHub, and the team plans to develop a tool to help people create and insert copyright traps independently.
“There is a complete lack of transparency in terms of which content is used to train models, and we think this is preventing finding the right balance [between AI companies and content creators],” says Yves-Alexandre de Montjoye, associate professor of applied mathematics and computer science at Imperial College London, who led the research. The findings were presented at the International Conference on Machine Learning in Vienna this week.
To create the traps, the team used a word generator to produce thousands of long, nonsensical synthetic sentences. For instance: “When in comes times of turmoil … whats on sale and more important when, is best, this list tells your who is opening on Thrs. at night with their regular sale times and other opening time from your neighbors. You still.”
The team generated 100 trap sentences and randomly selected one to insert into a given text, repeating it between 100 and 1,000 times. The trap could be embedded in various ways, such as white text on a white background or within the article’s source code.
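For a concrete picture of those two steps, here is a minimal Python sketch: build a pool of nonsensical candidate sentences, pick one, and hide it repeatedly in a page as white-on-white text. The vocabulary, sentence length, and markup are illustrative assumptions, not the Imperial team’s actual implementation, which is available in their GitHub repository.

```python
import random

# Hypothetical sketch: generate candidate trap sentences and hide one in a page.
# Vocabulary, lengths, and markup are illustrative assumptions only.

VOCAB = (
    "when in comes times of turmoil whats on sale and more important "
    "best this list tells your who is opening at night with their regular "
    "times other from neighbors you still"
).split()


def make_trap_sentence(n_words: int = 75, seed: int | None = None) -> str:
    """Generate one long, nonsensical candidate trap sentence."""
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCAB) for _ in range(n_words))


def hide_trap_in_html(article_html: str, trap: str, repetitions: int = 1000) -> str:
    """Append the trap as invisible white-on-white text to an HTML page."""
    hidden = f'<span style="color:#fff;background:#fff;font-size:1px">{trap}</span>'
    return article_html + "\n" + "\n".join([hidden] * repetitions)


if __name__ == "__main__":
    candidates = [make_trap_sentence(seed=i) for i in range(100)]  # pool of 100 traps
    chosen = random.choice(candidates)                             # one trap per document
    marked_page = hide_trap_in_html("<p>Original article text.</p>", chosen)
    print(marked_page[:200])
```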
To detect these traps, the researchers fed a large language model the 100 synthetic sentences to see if it recognized them. If the model had encountered a trap sentence during training, it would show a lower “surprise” (or “perplexity”) score. A higher surprise score would indicate that the model was seeing the sentence for the first time, meaning the trap was likely not in its training data.
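In code, that check amounts to scoring each candidate sentence with the model and comparing perplexities. The sketch below uses the Hugging Face transformers library with GPT-2 as a stand-in model (the study itself scored CroissantLLM), and the trap and control sentences are placeholders; the team’s actual detection code is in their GitHub repository.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def perplexity(sentence: str) -> float:
    """Per-token perplexity of `sentence` under the model (lower = less surprised)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    # Passing the inputs as labels makes the model return the mean next-token
    # cross-entropy loss, whose exponential is the perplexity.
    loss = model(input_ids=ids, labels=ids).loss
    return math.exp(loss.item())


# The published trap candidates versus similar sentences never published anywhere.
trap_candidates = ["when in comes times of turmoil whats on sale and more important when"]
control_sentences = ["is best this list tells your who is opening at night with their regular"]

trap_scores = [perplexity(s) for s in trap_candidates]
control_scores = [perplexity(s) for s in control_sentences]

# A trap scoring noticeably lower than the unpublished controls suggests the
# model saw it, and therefore the marked document, during training.
print(trap_scores, control_scores)
```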
Previously, researchers suggested using a technique called a “membership inference attack” to determine whether a language model had seen specific data during training. This method works well with large, state-of-the-art models, which tend to memorize a lot of their training data.
However, smaller models, which are increasingly popular and can run on mobile devices, memorize less and are less vulnerable to membership inference attacks, making it harder to verify whether they were trained on specific copyrighted documents. “Copyright traps are a way to do membership inference attacks even on smaller models,” says Gautam Kamath, an assistant professor of computer science at the University of Waterloo, who was not involved in the research.
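In its simplest form, a membership inference attack can be a loss-ratio test: compare how surprised the model under investigation is by a sequence with how surprised an independent reference model is by the same text. The sketch below is a generic illustration of that idea, not the specific attack used in the paper; the model names and threshold are stand-ins.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic ratio-style membership inference sketch, not the paper's exact attack.
# Model names and the decision threshold are illustrative stand-ins.


def load(name: str):
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForCausalLM.from_pretrained(name).eval()
    return tok, lm


@torch.no_grad()
def nll(tok, lm, text: str) -> float:
    """Mean negative log-likelihood of `text` under the given language model."""
    ids = tok(text, return_tensors="pt").input_ids
    return lm(input_ids=ids, labels=ids).loss.item()


target_tok, target_lm = load("gpt2")    # model suspected of having trained on the text
ref_tok, ref_lm = load("distilgpt2")    # independent reference model


def likely_member(text: str, threshold: float = 0.8) -> bool:
    # A memorized sequence tends to have unusually low loss under the target
    # model relative to the reference model's loss on the same text.
    ratio = nll(target_tok, target_lm, text) / nll(ref_tok, ref_lm, text)
    return ratio < threshold


print(likely_member("when in comes times of turmoil whats on sale and more important when"))
```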
The team tested their traps on CroissantLLM, a new bilingual French-English language model with 1.3 billion parameters. The research demonstrated that it is possible to introduce traps into text data, enhancing the effectiveness of membership inference attacks even on smaller models. However, Kamath notes that much work remains to be done.
Inserting a 75-word phrase 1,000 times into a document significantly alters the original text, making it easy for those training AI models to detect and remove the trap. This also impacts the readability of the text. “A lot of companies do deduplication, [meaning] they clean up the data, and a bunch of this kind of stuff will probably get thrown out,” says Sameer Singh, professor of computer science at the University of California, Irvine, and co-founder of Spiffy AI, who was not part of the research.
Improving copyright traps could involve finding new ways to mark copyrighted content or enhancing membership inference attacks themselves, suggests Kamath. De Montjoye acknowledges that the traps are not foolproof and could be removed by a knowledgeable attacker. “Whether they can remove all of them or not is an open question, and that’s likely to be a bit of a cat-and-mouse game,” he says. The more traps that are applied, the harder it becomes to remove all of them without significant engineering resources.
“It’s important to keep in mind that copyright traps may only be a stopgap solution, or merely an inconvenience to model trainers,” says Kamath. “One cannot release a piece of content containing a trap and have any assurance that it will be an effective trap forever.”