AI versus IP?

21-04-2023 | News

Generative AI can create serious intellectual property problems

by Gil Appel, Juliana Neelbauer and David A. Schweidel

Photo by Michael Dziedzic on Unsplash

Generative AI can feel like magic. Image generators such as Stable Diffusion, Midjourney or DALL-E 2 can produce striking images in styles ranging from aged photographs to watercolors, pencil drawings to pointillism. The results can be fascinating: both the quality and the speed of creation far exceed average human performance. New York's Museum of Modern Art hosted an AI-generated installation based on the museum's own collection, while the Mauritshuis in The Hague hung an AI variant of Vermeer's Girl with a Pearl Earring while the original was on loan.

The abilities of text generators are perhaps even more amazing: they write essays, poems and summaries and prove adept imitators of style and form (even if they can take creative license with the facts).

While it might seem like these new AI tools can conjure new material from the ether, that's not quite the case. Generative AI platforms are trained on data lakes and question snippets – billions of parameters constructed by software that processes huge archives of images and text. The platforms discover patterns and relationships in that data, use them to create rules, and then make judgments and predictions when responding to a prompt.

This process carries legal risks, including intellectual property infringement. In many cases, it also raises legal issues that are still being resolved. For example, do copyright, patent and trademark violations apply to AI creations? Is it clear who owns the content that generative AI platforms create for you or your customers? Before they can reap the benefits of Generative AI, businesses need to understand the risks and how to protect themselves.

Where does generative AI fit in the current legal landscape?

While generative AI is new to the market, existing laws have significant implications for its use, and the courts are now working out how to apply them. There are infringement and right-of-use issues, uncertainty over the ownership of AI-generated works, questions about unlicensed content in the training data, and questions about whether users should be able to prompt these tools to reference other creators' copyrighted works and trademarks directly without their permission.

These claims are already being litigated. In a lawsuit filed in late 2022, Andersen v. Stability AI et al., three artists brought a class action against several generative AI platforms on the grounds that the AI was trained on their original, unlicensed works to imitate their styles, allowing users to generate works that may not be sufficiently transformative of the artists' existing, protected works and would consequently be unauthorized derivative works. If a court finds that the AI's works are unauthorized derivatives, substantial infringement penalties may apply.

Similar cases filed in 2023 involve companies that trained AI tools on data pools containing thousands – or even millions – of unauthorized works. Getty, an image licensing service, has sued the creators of Stable Diffusion for misusing its photos, alleging violations of the copyrights and trademark rights it holds in its collection of watermarked photographs.

In each of these cases, the legal system is being asked to clarify what counts as a “derivative work” under intellectual property laws, and different courts in different jurisdictions may respond with different interpretations. The outcome is expected to hinge on the interpretation of the fair use doctrine, which allows copyrighted works to be used without the owner's permission “for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research,” and for transformative uses of the copyrighted material in ways for which it was not intended.

It's not the first time that technology and copyright law have collided. Google successfully defended itself against a lawsuit by arguing that transformative use allowed it to excerpt text from books to build its search engine, and for now that decision remains precedent. There are also non-technological cases that could affect how generative AI products are treated. A US Supreme Court case involving the Andy Warhol Foundation – brought by photographer Lynn Goldsmith, who had licensed an image of the late musician Prince – could refine US copyright law on when a work of art is sufficiently different from its source material to become unequivocally “transformative,” and whether a court may consider the meaning of the derivative work when assessing that transformation. If the court finds that Warhol's work was not fair use, that could spell trouble for AI-generated works.

All of this uncertainty presents its own set of challenges for companies using generative AI. There are risks of infringement – direct or inadvertent – under contracts that are silent on the use of generative AI by suppliers and customers. If a business user knows that training data may include unlicensed works, or that the AI may generate unauthorized derivative works not covered by fair use, the company could be liable for willful infringement, which can carry damages of up to $150,000 per instance of knowing use. There is also the risk of unintentionally sharing trade secrets or sensitive company information by feeding confidential data into generative AI tools.

Mitigate risk and build a path for the future

This new paradigm means that companies need to take new steps to protect themselves in both the short and the long term. AI developers, for example, need to ensure that they comply with the law when acquiring the data used to train their models. This should involve compensating those who own the intellectual property that developers seek to add to their training data, whether through licensing or by sharing the revenue the AI tool generates. Customers of AI tools should ask vendors whether their models were trained on protected content, review terms of service and privacy policies, and avoid generative AI tools that cannot confirm their training data is properly licensed from content creators or covered by open-source licenses that the AI companies respect.

Developers

In the long run, AI developers will need to take the initiative on how data is sourced, and investors will need to know where the data comes from. Stable Diffusion, Midjourney and others built their models on the LAION-5B dataset, which contains nearly six billion tagged images compiled through indiscriminate web scraping and famously includes a substantial number of copyrighted creations.

Stability AI, which developed Stable Diffusion, has announced that artists will be able to opt out of the next generation of the image generator. But this places the burden on content creators to actively protect their intellectual property, rather than requiring AI developers to secure the rights to a work before using it; moreover, even when artists opt out, their decision will only be reflected in the next iteration of the platform. Instead, companies should adopt creator opt-in rather than opt-out.

Developers should also work on ways to maintain the provenance of AI-generated content while increasing transparency about the works included in the training data. This includes logging the platform used to develop the content, details of the settings used, metadata tracking of the source data, and tags to facilitate AI reporting, including the generative seed and the specific prompt used to create the content. Not only would this information allow an image to be reproduced, making it easy to verify its authenticity, but it would also speak to the user's intent, protecting business users who may have to rebut claims of intellectual property infringement by demonstrating that the result was not the product of intentional copying or theft.
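As a rough illustration, such a provenance record could be as simple as a JSON "sidecar" file written next to each generated asset. The sketch below is a minimal, hypothetical schema – the field names mirror the items listed above (platform, settings, seed, prompt, source-data references) and are assumptions, not an established standard; real systems would more likely follow an emerging industry format such as C2PA's content credentials:

```python
import json
import dataclasses
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class GenerationRecord:
    """Provenance record written alongside each AI-generated asset."""
    platform: str             # tool used to develop the content
    model_version: str        # exact model iteration
    prompt: str               # the specific prompt used
    seed: int                 # generative seed, enables exact reproduction
    settings: dict            # sampler, steps, guidance scale, etc.
    training_data_refs: list  # identifiers of source-data manifests
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def write_sidecar(record: GenerationRecord, asset_path: str) -> str:
    """Persist the record as a JSON sidecar file next to the asset."""
    sidecar_path = asset_path + ".provenance.json"
    with open(sidecar_path, "w", encoding="utf-8") as f:
        json.dump(dataclasses.asdict(record), f, indent=2)
    return sidecar_path
```

Because the seed, prompt and settings are all captured, anyone holding the sidecar file can re-run the generation and confirm that the asset really was produced as claimed.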

Developing these audit trails would ensure companies are prepared if (or, more likely, when) customers start requesting them in contracts as a form of assurance that the supplier's works are not intentionally, or unintentionally, derived without permission. In the future, insurance companies may require these reports to extend traditional insurance coverages to business users whose assets include AI-generated works. The breakdown of individual artist contributions that were included in the training data to produce an image would further support efforts to compensate contributors appropriately and even incorporate the original artist's copyright into the new creation.

Creators

Both individual content creators and brands that create content should take steps to assess the risks to their intellectual property portfolios and protect them. That means proactively searching compiled datasets or large-scale data pools for their work, including visual elements such as logos and artwork and textual elements such as image tags. Of course, this can't be done manually across terabytes or petabytes of content, but existing search tools should allow the task to be automated cost-effectively. New tools may also emerge that promise to obfuscate creators' works from these algorithms.
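One common way to automate such a search is perceptual hashing: compute a compact fingerprint for each of your own images and compare it against fingerprints of the images in a dataset's published index. The sketch below implements a simple 64-bit "average hash"; it assumes each image has already been downscaled to an 8x8 grayscale grid (a real pipeline would do that with an imaging library such as Pillow), and the match threshold of 5 bits is illustrative, not a standard:

```python
def average_hash(pixels):
    """64-bit average hash of an image given as 8 rows of 8 grayscale values.

    Each bit is 1 if the corresponding pixel is at or above the mean
    brightness, 0 otherwise.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_near_duplicate(hash_a, hash_b, threshold=5):
    """Treat hashes within a few bits of each other as the same image."""
    return hamming(hash_a, hash_b) <= threshold
```

Hashes that differ by only a few bits usually indicate the same underlying image even after re-encoding or light edits, so near-matches against a dataset index can be queued for manual review rather than inspected one by one.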

Content creators should actively monitor digital and social channels for the emergence of works that may be derived from their own. For brands with value to protect, it's not just about looking for specific elements like the Nike Swoosh or Tiffany Blue. Rather, brand and trade dress monitoring may need to evolve to examine the style of derivative works, which may have arisen from training on a specific set of brand images. While critical elements such as a specific logo or color may not be present in an AI-generated image, other stylistic elements may suggest that salient elements of a brand's content have been used to produce a derivative work. Such similarities may suggest an intent to appropriate the average consumer's trust in the brand by using recognizable visual or aural elements. Imitation can be considered the sincerest form of flattery, but it can also suggest intentional abuse of a trademark.

The good news for business owners dealing with trademark infringement is that trademark attorneys already know how to put infringers on notice and enforce trademark rights – for example, by sending a cease-and-desist or demand letter, or by moving directly to filing a trademark infringement complaint – whether the unauthorized trademark was generated by an artificial intelligence platform or by a human being.

Companies

Companies should evaluate the terms of their transactions to build protections into contracts. As a starting point, they should ask generative AI platforms for terms of service that confirm proper licensing of the training data powering their AI. They should also seek broad indemnification for potential intellectual property infringement caused by an AI company's failure to properly license its data inputs or by the AI's own output.

At a minimum, companies should add clauses to contracts with suppliers and customers (for the provision of customized products and services) that address what happens if either party uses generative AI, ensuring that intellectual property rights are understood and protected on both sides of the table and specifying how each side will support the registration of authorship and ownership of those works. Supplier and customer contracts may also add an AI clause to confidentiality provisions to prevent receiving parties from entering the disclosing party's confidential information into prompts for AI tools.

Some leading companies have created Generative AI contract change checklists for their customers, which evaluate each clause for the implications of AI in order to reduce the risks of unintended use. Organizations using Generative AI, or working with vendors, should keep their legal counsel updated on the extent and nature of such use, as the law will continue to evolve rapidly.

In the future, content creators with a sufficiently large library of intellectual property may consider building their own datasets to train and mature AI platforms. The resulting generative AI models need not be trained from scratch but can be based on open-source generative AI built on legitimately sourced content. This would allow content creators to produce content in the style of their own work, with an audit trail back to their own data lake, or to license such tools to interested parties with clear ownership of both the AI's training data and its outputs. In the same spirit, content creators who have developed an online following may consider co-creating with followers as another means of sourcing training data, recognizing that these co-creators' permission to use their content should be obtained through terms of service and privacy policies that are kept up to date as the law changes.

Generative AI will change the nature of content creation, enabling many to do at high speed what until now only a few had the skills or advanced technology to accomplish. As this emerging technology develops, users must respect the rights of those who made its creation possible – the very content creators it could supersede. While generative AI is a real threat to the livelihoods of some members of the creative class, it also poses a risk to brands that have built their identities on images. At the same time, both creatives and companies have the opportunity to create portfolios of branded works and materials, to insert meta-tags, and to train generative AI platforms capable of producing licensed and proprietary goods (for a fee or royalty) as sources of immediate income.

Gil Appel is an Assistant Professor of Marketing at the GW School of Business. 

Juliana Neelbauer is a partner of Fox Rothschild LLP and a lecturer at the University of Maryland.

David A. Schweidel is a Marketing Professor at Goizueta Business School, Emory University.
