
Could we see the collapse of generative AI?

Changed on 05/03/2025

What happens when a generative artificial intelligence (AI) model is trained using an increasing proportion of self-generated images? Quentin Bertrand, researcher at the Inria Lyon centre and member of the Malice project team, sought to answer this question.

Image generated using the DALL·E generative model. Prompt used: “image illustrating an AI training on images generated by other AIs”.

The International Conference on Learning Representations (ICLR) is an event that brings together professionals and researchers from across the world to discuss breakthroughs in Deep Learning. At the 2024 edition of this conference, Quentin Bertrand, Joey Bose, Alexandre Duplessis, Marco Jiralerspong and Gauthier Gidel presented a paper entitled “On the stability of iterative retraining of generative models on their own data”.

In it, the authors explore what might cause a model to ‘collapse’: a phenomenon whereby an AI’s output becomes increasingly biased, homogeneous, low in quality and inaccurate as a result of the model generating its own supply of training data. The results of this study could shed new light on an issue that is central to generative models. Let’s take a closer look.

Training generative models

Models which generate synthetic images do not create them out of nothing. Well-known examples such as MidJourney, Stable Diffusion or DALL·E learn by being supplied with massive datasets (images, captions, metadata, titles, etc.). This data is taken from a wide range of online sources, often via automated processes such as web scraping or other data acquisition methods. Once collected, the data is organised and cleaned up for AI training purposes. This is the phase in which the AI learns to recognise and distinguish between, for instance, a cat, a dog, an artistic style, a colour or a texture. When it receives a request, the AI analyses the words, relates them to what it learned during training and generates an image based on the matches it identifies.
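
As a rough illustration of the “organise and clean up” step described above, here is a minimal Python sketch that filters scraped image–caption records before training. The record fields and thresholds are hypothetical, chosen only for the example; real pipelines apply far more elaborate filtering.

```python
def clean_records(raw_records, min_size=64):
    """Keep only scraped records that have a caption and a reasonably sized image."""
    cleaned = []
    for rec in raw_records:
        caption = (rec.get("caption") or "").strip()
        if not caption:
            continue  # drop image/text pairs with no usable caption
        if rec.get("width", 0) < min_size or rec.get("height", 0) < min_size:
            continue  # drop tiny or broken images
        cleaned.append({"url": rec["url"], "caption": caption})
    return cleaned

# Toy records: only the properly captioned image survives the cleaning step.
raw = [
    {"url": "https://example.org/cat.jpg", "caption": "a grey cat", "width": 512, "height": 512},
    {"url": "https://example.org/blank.jpg", "caption": "", "width": 512, "height": 512},
]
print(clean_records(raw))
```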

When AI models are trained...using AI images

The impressive output of generative models continues to wow the public, and the internet is awash with so-called ‘synthetic’ images. But like a snake biting its own tail, AI models are now being trained using an increasing proportion of synthetic images. Detecting such content within datasets has become a complex problem in itself. Researchers working on the next generation of generative models are therefore faced with a challenging reality: because their training databases inevitably contain artificial content, they are working with mixed datasets, containing both real and self-generated data.

Does training using mixed datasets affect the performance of models? 

A number of papers have sought to answer this question. One such paper, “AI models collapse when trained on recursively generated data”, was published in the journal Nature. In it, researchers explored this phenomenon, and the conclusion they reached was alarming: if the situation isn't addressed, it could lead to the collapse of generative models. As a result of being fed synthetic data, the output of generative AI systems will become increasingly homogeneous, biased and error-prone. By way of illustration, imagine a document being photocopied over and over again, each new copy losing a little more of the original's quality.
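
The photocopy analogy can be made concrete with a deliberately simplified toy experiment (not the setup used in the Nature paper or in Bertrand et al.'s work): the “model” is just a one-dimensional Gaussian, refitted at each generation on samples drawn from the previous generation. With no real data in the loop, its variance tends to drift towards zero, i.e. the outputs become ever more homogeneous.

```python
import numpy as np

# Toy "photocopy" loop: the model is a 1-D Gaussian, refitted at each
# generation only on data it generated itself (no real data in the mix).
rng = np.random.default_rng(0)

real = rng.normal(loc=0.0, scale=1.0, size=50)     # stand-in for the real dataset
mu, sigma = real.mean(), real.std()                # "generation 0" model

for gen in range(1, 201):
    synthetic = rng.normal(mu, sigma, size=50)       # the model samples its own data...
    mu, sigma = synthetic.mean(), synthetic.std()    # ...and is retrained only on it
    if gen % 50 == 0:
        print(f"generation {gen:3d}: fitted std = {sigma:.3f}")

# Typical outcome: the fitted standard deviation shrinks over the generations,
# the toy analogue of each photocopy losing a little more of the original.
```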

Figure 1: Samples generated by the EDM model trained on the FFHQ dataset. Iterative retraining of the model exclusively on its own generated data results in image degradation (top row). In contrast, retraining on a mixture of half-real and half-synthetic data (middle row) gives quality similar to retraining on real data (bottom row).
From “On the stability of iterative retraining of generative models on their own data”, published at the ICLR 2024 conference.

Collapse not inevitable

The paper co-authored by Quentin Bertrand adds an important caveat to the conclusions of the paper published in Nature. Experiments carried out using the most commonly used generative models have shown that training can remain stable, provided the initial model is sufficiently well trained and that a sizeable proportion of the data used for training is real rather than synthetic. The key is not to eliminate each and every synthetic image, but rather to ensure there is a sufficient percentage of non-synthetic images. If the percentage of real images were to drop below a certain threshold, there would be a risk of the AI collapsing.
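
One practical reading of this result, sketched below under the assumption that real and synthetic items can be told apart, is to cap the amount of synthetic data added at each retraining round so that the share of real data never drops below a chosen floor. The 50% floor here is purely illustrative, not a threshold taken from the paper.

```python
import random

def assemble_retraining_set(real_items, synthetic_items, min_real_fraction=0.5, seed=0):
    """Mix real and synthetic data while keeping real items above a chosen floor."""
    rng = random.Random(seed)
    n_real = len(real_items)
    # Largest number of synthetic items that still leaves >= min_real_fraction real data.
    max_synthetic = int(n_real * (1.0 - min_real_fraction) / min_real_fraction)
    kept = rng.sample(synthetic_items, min(max_synthetic, len(synthetic_items)))
    return list(real_items) + kept

# Toy usage: 1,000 real images versus 10,000 freshly generated ones.
real = [f"real_{i}" for i in range(1_000)]
synthetic = [f"synthetic_{i}" for i in range(10_000)]
mixed = assemble_retraining_set(real, synthetic)
real_share = sum(item.startswith("real_") for item in mixed) / len(mixed)
print(len(mixed), f"{real_share:.0%}")   # -> 2000 items, 50% of them real
```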

In any event, human curation will be crucial to ensuring models remain stable. Most generative model interfaces produce two images for you to choose from. This human curation acts as a sort of “internal feedback” for the machine, guiding the model’s development: each time you make a choice, you are manually steering the AI while indirectly correcting any possible errors. In order to maintain performance and get around the issue of the ever-growing quantity of synthetic images, the training data of generative AI models must therefore retain a sufficient percentage of real images, and a human curation component must provide final feedback.
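
The “pick one of two images” feedback described above can be sketched as a simple best-of-two filter. The preference scores below are placeholders for a human choice; in a real interface the selection is made by the user, not computed.

```python
def human_pick(candidates):
    """Stand-in for the user clicking the image they prefer."""
    return max(candidates, key=lambda c: c["score"])

# Two candidates for the same prompt; "score" is a hypothetical human preference.
candidates = [
    {"image": "cat_v1.png", "score": 0.62},
    {"image": "cat_v2.png", "score": 0.81},
]
chosen = human_pick(candidates)
print("kept as feedback for the model:", chosen["image"])
```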

Human curation: a risk-free solution?

Unfortunately, curation also carries risks, including a reduction in cultural and ethnic diversity. If most users prefer a certain style of image (e.g. symmetrical faces with smooth, white skin), then the AI is more likely to produce that type of image. The same goes for stereotypes: when an AI is asked to depict a profession using a gender-neutral term (for example, “Imagine a CEO” or “Imagine a secretary”), it will generally depict the CEO as a man and the secretary as a woman.

The same goes for text-generating AI. If a model is retrained using answers that have already been optimised for popularity, there could be a risk of excluding certain styles of writing or less common expressions.

In conclusion, the continued rise of generative models will depend on their capacity to assimilate and reproduce real-world data. In response to the proliferation of synthetic content, however, these models must steer clear of excessive self-learning, which could result in homogenisation, a reduction in quality and, potentially, collapse.

But this worst-case scenario is certainly not inevitable. Quentin Bertrand’s research suggests that there are solutions: ensuring the proportion of real data does not fall below a certain threshold and integrating human curation as a safeguard. This will help to ensure that the content generated is diverse and relevant, while preserving the computational richness of the web. But one key ethical question remains: how can we ensure that human curation respects humanity’s inherent cultural diversity?