
If AI is trained on copyright material and agencies use it to generate creative work, are agencies guilty of copyright violation?

There is a new flavour to the way agencies and clients are talking about AI, one that pushes legal copyright issues to the front of the discussion. In short, the argument is that because many AI platforms are (at least potentially) trained on copyrighted material, the work product of these systems cannot be trusted to be legally sound. In other words, there may be hidden copyright infringement in the text and images generated by AI systems, and so it is not safe to use them for commercial work.

For their part, AI providers such as Google, Adobe and Microsoft have so far committed to shouldering the legal consequences of any copyright action brought against users of their products. This may change, and it applies only to cases where copyright has been infringed accidentally, but it is a bold commitment on their part. Why are they feeling so confident?

How AI works

To understand why these big tech players are so brazen in their approach to copyright it’s worth explaining how AI models work.

Both large language models and image generation tools are statistical systems. By feeding them enormous amounts of example data (billions of pages of text; millions of images) we can train the machine-learning algorithms to learn the patterns that go into making these assets. Once an image generator has consumed millions of pictures of a car, it can, in a sense, learn what ingredients make up the concept of a car and apply those learnings to generate credible new cars.

The AI doesn’t need to store or directly use the source data to work. Instead, it uses its training to generate new assets that are similar to, yet entirely distinct from, any individual asset it was trained on.
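To make this concrete, here is a toy sketch of the idea — the function names and tiny corpus are my own invention, not any real platform's code. A word-level Markov chain records only transition statistics from its training text and then samples new sequences from those statistics. Real models are vastly larger and more sophisticated, but the principle is the same: the trained model holds learned patterns, not copies of the source.

```python
import random
from collections import defaultdict

def train(text):
    """Learn which words tend to follow which. The model stores only
    these transition statistics, not the training text itself."""
    model = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def generate(model, start, length=8, seed=0):
    """Sample a new word sequence from the learned statistics."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # no learned continuation for this word
        out.append(rng.choice(followers))
    return " ".join(out)

# A (hypothetical) miniature training corpus.
corpus = ("the car is red the car is fast "
          "the road is long the car is new")
model = train(corpus)
print(generate(model, "the"))
```

Sentences generated this way follow the statistical patterns of the corpus without being stored excerpts of it — which is precisely the distinction the copyright debate turns on.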

The role of copyright content

Because many of these systems have been trained on data available on the public internet as well as on other large databases of content (for example large book databases) it is likely that at least some of the content used in training is copyrighted. What that really means is that if that content is used for commercial purposes, there is an author or creator who has to be compensated for their intellectual property.

The question, therefore, is whether the derivative works, namely the assets generated by the AI, are subject to copyright restrictions.

Fair use

Copyright law in most parts of the world has a concept of “fair use” that applies to copyrighted content. A simple example is if you were to quote a short passage from a book in a book that you write. As long as the original is cited as a source it is “fair” for an author to quote this passage. A similar rule applies to samples used by musicians in their own music and in many other cases.

So, is AI training fair use?

This is not yet settled in law, but in my view the likely outcome of any court case is going to go in favour of the tech platform. Why?

The thing the tech platform is selling for commercial use is not the underlying copyrighted work. Consider an analogy: your child learns about science by watching National Geographic on TV, keeps learning, eventually grows up to be a scientist, writes an influential book and becomes rich when it turns into a bestseller. Do the copyright holders of National Geographic have a claim over the royalties from your now adult child’s book?

I’m sure we would all easily agree that they don’t. Why? Because although that original TV show might have had an influence, even a seminal influence, on the child’s knowledge of the field, it would be impossible to determine how much. What we would see instead is the hard work done by the new author in comprehending and processing that learning, and in adding their own unique perspective and articulation. They might express gratitude to the National Geographic folks, but that’s all anyone would reasonably expect.

Now, you could say: “Well, that child watched National Geographic on a paid TV channel, so some payment was made for the content somewhere along the line.” Even if this is true, all we are then saying about AI is that the tech owner should have based their training on a legal copy of the content. The standard, then, is that they should have bought copies of the books they trained on, or held valid subscriptions to the news sites they consumed.

This seems fair, but it is not what most legal people are telling us to worry about. They seem to be implying that if a copyrighted work was used during training, the creator of any output that drew on this material in even the smallest way should somehow pay royalties to the copyright holder.

This seems absurd, and based on a misunderstanding of whether the AI is creating a pastiche (a mash-up of original works) or authentically making something new. It is very obviously doing the latter. If you ask Dall-E to generate a green car, it applies the statistics it has learnt about cars to create a brand new car. This new car is no more made up of one millionth of a car from Shutterstock and one millionth of a car from Getty Images than the author in my story above could be said to be using one millionth of the National Geographic show. They are, in a sense; but we generally understand that learning from something is not the same as plagiarising it.

Again, this is my opinion. There are multiple live lawsuits happening which will create some legal precedent. But just thinking about it this way makes the case brought by individual authors or image rights holders seem hopeless.

Does this mean it’s a free-for-all?

Obviously, operating commercially exposes companies to risk. And when there is no settled law, this risk can seem extreme. It is thus natural for companies and agencies to warn their teams that using AI tools could expose them to future copyright action.

But let’s analyse logically what could happen here:

A. Nothing — the authors and creators lose their cases, and the tech platforms prevail.

B. Individual copyright holders win — the courts hold that using, say, Game of Thrones to train a large language model constitutes copyright infringement. The platforms are instructed to remove Game of Thrones from the training set and retrain their models. This would affect them negligibly; removing one work will make no difference at this stage of the game. Another possible outcome is that OpenAI or DeepMind pays writer George RR Martin some kind of settlement without changing anything.


C. Class action suits win — this is somewhat more serious, in that large amounts of the training material could be declared illegal. This might result in a large settlement, or a lot of training material might have to be removed before retraining. It’s conceivable that all material generated before a particular date might then be declared in breach of copyright and every user could end up liable for damages or have to delete all their assets. In this scenario it seems much more likely that the tech platform would simply pay copyright holders some kind of settlement and continue to protect their users, because the long game would almost certainly be far more valuable than any short-term settlement amount, no matter how large.

The likelihood of option C happening seems, to me, negligible. Consider the very recent decision (October 30) by Judge William Orrick in California, who is presiding over the case against Stability AI, DeviantArt and Midjourney:

“Orrick agreed with all three companies that the images the systems actually created likely did not infringe the artists’ copyrights. He allowed the claims to be amended but said he was ‘not convinced’ that allegations based on the systems’ output could survive without showing that the images were substantially similar to the artists’ work.” — Reuters.

Is going legal just fear?

Lastly, how much of this legal noise is a real worry about the poor artist or writer and how much is just a cynical attempt to slow down the march of AI towards its now obvious destination?

When the internet first gained ground, I heard many of these kinds of legal arguments against the proliferation of a new technology. Lawyers opined ad nauseam about how the public internet infringed copyright, created impossible commercial legal scenarios and would essentially never be legally viable. They were paid by owners of “old media” who thought this kind of strategy could fend off the imminent threat that digital represented.

We all saw how that story ended — and this one will end the same way. It is indeed sad and scary that human creativity can now be synthesised. And it may feel unfair that someone’s creative output was used by a machine learning system to outperform the original. But a feeling of unfairness is not the same as theft. And these feelings seem more about fear of what this means for all of us and our ability to earn a living than anything to do with right and wrong.

I am not recommending that anyone ignore the potential copyright problems that may emerge in the AI field. Doing business is a game of risk management, and if you want to avoid all risk, by all means hold back on using AI-generated content for the time being. But ask yourself whether being less adept, informed and experienced with AI is worth avoiding that risk, because in my humble opinion this genie is not going back in the bottle, and sooner or later AI will be the way content is generated.

We will all have to come to terms with our feelings of alienation and potential irrelevance regardless.

Jarred Cinman is Joint CEO of VMLY&R. He also founded South Africa’s first digital creative agency back when the internet was new and scary.

The big take-out: If you want to avoid all risk, by all means hold back on using AI-generated content for the time being. But ... this genie is not going back in the bottle and sooner or later AI will be the way content is generated.
