Failure to compensate human creators could stifle AI advancement

Image by: Curtis Heinzl

The age-old computer science adage “garbage in, garbage out” hasn’t lost relevance.

The saying asserts a system built using low-quality data will yield subpar results. It’s a line ChatGPT developer OpenAI ought to keep in mind as it faces mounting legal backlash from creators whose work the artificial intelligence (AI) trailblazer scraped to train its flagship generative models.

There’s little doubt generative AI, which directly competes with the authors of its training data, is becoming a faster and cheaper alternative to many types of human labour. It may seem like a tech giant’s dream to swallow creative industries whole, but if AI starves artists to the point where they drop their paintbrushes, it will spell bad news for technological advancement.

If creative industries fade, there will be less human-made material available to train AI, and the quality of the models will stagnate. With fewer artists publishing their work on the internet, synthetically generated material will gladly fill in the gaps.

Much of that synthetic material will bleed into new training sets used to build future models. This is a problem because generative AI mimics the data it’s trained on, but does so imperfectly. No matter how advanced the model, a slight amount of nuance is lost each time it ingests a photo, novel, or painting.

Some Google reviews already have that je-ne-sais-quoi “AI voice,” and some Instagram photos look too plasticky to have been taken with a real camera. It’s easy to point out AI-generated content when the person who commissioned it hasn’t gone out of their way to conceal its artificiality.

Imagine we trained a model on synthetic data, then used the output of that model to train the next model, and so on. We’d enter a cycle of deterioration that would eventually cause future models to spit out garbage.
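To make the intuition concrete, here’s a minimal, hypothetical sketch in Python. The “model” here is deliberately simplistic—not how any real generative system is trained—it only memorizes the average and spread of its training data. But it captures the feedback loop: each generation trains solely on the previous generation’s synthetic output.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Generation 0: a stand-in for rich, human-made data.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(1, 101):
    # "Train" a toy model: it only learns the mean and spread of its data.
    mu, sigma = data.mean(), data.std()

    # The next generation trains exclusively on this model's synthetic output.
    data = rng.normal(loc=mu, scale=sigma, size=50)

    if generation % 25 == 0:
        # The spread (diversity) of the data tends to shrink with each generation.
        print(f"generation {generation:3d}: spread = {data.std():.3f}")
```

Run it, and the printed spread tends to drift toward zero: each generation preserves a little less of the original variety. That shrinking diversity is a toy stand-in for the nuance real generative models lose when they end up training on one another’s output.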

OpenAI acknowledged this week that it’s impossible to develop useful generative AI models without using copyrighted material. This is no surprise: the best writers prefer to write for a paycheque, and paid, professional work is exactly the kind that’s protected by copyright.

Tech companies must understand keeping human artists in the game is essential to AI’s advancement. Compensating creators fairly for their contributions to generative models means they’ll have a reason to produce more high-quality training data.

As the fruits of generative AI flood the internet, tech firms will face the spectre of synthetic content polluting training sets and stifling innovation, a non-issue prior to the explosive virality of user-friendly models like ChatGPT and DALL-E. AI research will most certainly keep advancing, but the quality of the data must keep pace.

It’ll become increasingly vital for tech companies to license reliable pipelines of human-made content to continue training successive generative AI models.

OpenAI’s homepage states the firm’s mission is to create safe artificial general intelligence that benefits all of humanity. Achieving that mission means prioritizing AI innovation, and it’s clear human artists are an indispensable piece of the puzzle.

Curtis is a fourth-year computing student and The Journal’s Production Manager.
