There was a big complaint filed on Friday in the District of Delaware—Getty Images, the well-known provider of stock images, filed suit against Stability AI over its use of Getty Images stock photos to train its image generation model, Stable Diffusion.
Stable Diffusion is one of the incredible AI-based image generators making news recently (along with others like Dall-E 2 and Midjourney). These AI models can accept a text prompt and generate a corresponding image. For example, prompted with "an elephant in roller skates," Stable Diffusion generated the following:
So Why Is Getty Coming After Them?
Broadly speaking—and as alleged in Getty Images' complaint—Stability AI created Stable Diffusion by training a machine learning model on existing images paired with text describing those images.
According to the complaint, Stability AI trained Stable Diffusion on "5 billion image-text pairs" from another company, LAION. That company got the image-text pairs by scraping (automatically downloading) "billions of pieces of content" from websites, including watermarked images from Getty Images' own site.
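The scraping step the complaint describes can be sketched in a few lines: a crawler visits pages, finds images, and keeps each image URL together with the text that accompanies it. The toy page markup and extractor below are my own illustration, not LAION's actual pipeline; this version simply pairs `<img>` sources with their `alt` text.

```python
# Illustrative sketch of collecting image-text pairs from a web page.
# The markup and extraction logic are assumptions for demonstration,
# not LAION's or Stability AI's actual code.
from html.parser import HTMLParser

class ImageTextPairExtractor(HTMLParser):
    """Collects (image URL, caption) pairs from <img> tags with alt text."""
    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            src, alt = a.get("src"), a.get("alt")
            if src and alt:  # keep only images that carry a text description
                self.pairs.append((src, alt))

# Hypothetical page fragment (not real Getty Images markup)
html = """
<div>
  <img src="https://example.com/photo1.jpg" alt="an elephant in roller skates">
  <img src="https://example.com/spacer.gif">
  <img src="https://example.com/photo2.jpg" alt="soccer players on a field">
</div>
"""

extractor = ImageTextPairExtractor()
extractor.feed(html)
print(extractor.pairs)
```

Each (URL, caption) pair collected this way could serve as one training example; repeated across billions of pages, that is roughly the kind of dataset the complaint describes.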
Something tells me they may be on to something. Here is a Stable-Diffusion-generated image that I got back with the prompt "soccer":
If you can look past the strange faces it generated, you can see an interesting rectangle on the right-hand side of the frame.
About the Claims
Copyright Infringement - Getty asserts a copyright claim. Notably, this claim is not based on the recently discovered effect where the Stable Diffusion algorithm may spit out carbon copies of its training images. Instead, it is based on the copying and editing of Getty's images in the process of creating Stable Diffusion, not on the output of the system.
Getty's copyright allegations make no mention of fair use at all. I imagine fair use will be a significant part of any response to the copyright claim.
Back in November, attorneys in California filed a class action lawsuit against Microsoft and GitHub over GitHub Copilot, a machine learning model that generates source code and that is trained on open-source code. In that case, the attorneys omitted a copyright claim entirely, perhaps to avoid a fair use argument altogether. Here, it seems that Getty thought it could support one.
Removing Watermarks and Adding Watermarks - Getty also asserts claims based both on the removal of their watermarks and on the addition of Getty Images watermarks to non-Getty Images content:
Upon information and belief, Stability AI has knowingly removed Getty Images’ watermarks from some images in the course of its copying as part of its infringing scheme. At the same time, however, as discussed above, the Stable Diffusion model frequently generates output bearing a modified version of the Getty Images watermark, even when that output is not bona fide Getty Images’ content and is well below Getty Images’ quality standards.
Lanham Act Claims - Getty also sets forth a number of Lanham Act claims, including trademark infringement, unfair competition, and trademark dilution.
State Law Claims - Getty also throws in claims for Deceptive Trade Practices and Trademark Dilution under Delaware law.
This Could Be a Huge Case for Machine Learning
To my knowledge, most of these amazing new AI/machine-learning-based content generators work in roughly similar ways: by training a machine-learning model on a huge amount of content. That's how Microsoft made GitHub Copilot, an AI that generates code; that's how OpenAI made the incredible ChatGPT chatbot; and that's how Stability AI made Stable Diffusion.
But if training a machine learning model on media results in copyright infringement—and if that training is not fair use—will development of these content-generating AIs have to stop? This is definitely a case to watch.
If you enjoyed this post, consider subscribing to receive free e-mail updates about new posts.