The Feedback Loop We Can't Exit
What we're selecting for while we pretend we're still choosing
At Oxford, researchers fed a model its own outputs until it forgot cathedrals. Medieval architecture became training data became generated text became new training data. By iteration nine, when asked about flying buttresses, it wrote about jackrabbits with the confidence of a tech CEO explaining blockchain.
Not confused jackrabbits. Not jackrabbits as architectural metaphor. Just jackrabbits, presented with the confident authority of a system that no longer remembers what architecture means but remains certain it’s doing excellent work. The original signal had degraded so completely that the machine could no longer recall what topic it was supposed to be discussing.
This is model collapse. We’re building it into the internet in real time. And what’s most revealing isn’t the technical failure: it’s what we keep choosing to preserve while everything else degrades.
Model collapse works like this: AI systems train on their own outputs, and the unusual examples disappear first. The minority cases. The interesting stuff. Then the middle. Eventually everything converges toward bland centrality before the system starts hallucinating jackrabbits with the same confidence Silicon Valley brings to pronouncing “synergy.” Researchers coined the name in 2023 and formalized the finding in a Nature paper published in July 2024.
We’ve known since the dawn of copying that copies degrade. That’s not news. What’s interesting is that we’re embedding this degradation into our information infrastructure, and we’re doing it faster than we built the infrastructure in the first place. By late 2024, researchers estimated that more than half of newly published web content originated from machine outputs. Before ChatGPT launched in November 2022, that figure was around five percent.
Two years. That’s all it took to go from mostly human-written internet to majority-synthetic. The anthropological question isn’t technical but cultural: what made us so comfortable flooding the information commons with machine outputs at a pace that would have triggered protests three years ago? No vote. No debate. Just a collective shrug while the infrastructure quietly transformed. The speed suggests we weren’t just ready. We were eager.
The implications for AI training are immediate and mathematical. AI companies train large language models on material scraped from the internet. The internet increasingly consists of content produced by large language models. Successor systems will train on a mixture of human and synthetic outputs, with the synthetic share climbing with every cycle. The recursive loop has already begun.
Researchers at Rice University and Stanford coined the term “Model Autophagy Disorder” to describe this phenomenon, deliberately invoking mad cow disease. That’s not hyperbole. Prion diseases work because misfolded proteins cause other proteins to misfold, creating a chain reaction of degradation. Model collapse works because degraded outputs corrupt the systems trained on them. Imagine the press release: “We’re calling it Model Autophagy Disorder, which sounds better than ‘we taught machines to eat themselves.’”
Without fresh human input in each generation, quality and diversity erode. The math is simple. The implications aren’t. One doomsday scenario: if left uncontrolled for many generations, model collapse could poison the data quality and diversity of the entire internet.
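The simple math fits in a dozen lines. Here is a toy sketch of my own (a plain Gaussian in Python, nothing like a real training pipeline): fit a distribution to the data, throw the data away, regenerate it from the fit, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human" data, samples from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for gen in range(1, 1001):
    # "Train" by fitting a Gaussian to the current corpus, then replace
    # the corpus with samples drawn from the fit. No fresh human input.
    mu, sigma = data.mean(), data.std()
    data = rng.normal(loc=mu, scale=sigma, size=100)
    if gen % 200 == 0:
        print(f"generation {gen:4d}: mean = {mu:+.3f}, std = {sigma:.4f}")

# The standard deviation trends toward zero. Each fit-and-resample step
# loses a little spread on average, the losses compound, and the
# distribution collapses toward a point that need not resemble the original.
```

Every pass loses a little; nothing inside the loop puts it back.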
The math explains why the industry started writing checks. In 2024, the major labs signed licensing deals with publishers at a pace that looked almost desperate. News Corp got over $250 million from OpenAI. Reddit got $60 million annually. The Financial Times, Time Magazine, Hearst, Vox Media, The Atlantic, and dozens of other publishers all inked agreements to provide their archives as training material.
The paradox, as Reddit’s CEO noted, is that as machine-generated content proliferates, “there is a rising value placed on content created by actual individuals.” Reddit’s pitch to AI companies was, in effect: “We have humans. Real ones. They argue about sandwiches at 3 AM. You can’t generate this level of authentic dysfunction.” Nearly twenty years of genuine dialogue, over a billion posts, sixteen billion comments. Human writing, human thinking, human messiness. The kind of material you can’t synthesize.
This is scarcity economics discovering that authenticity has a nine-figure price tag. We built machines that produce outputs at nearly zero marginal cost, then found ourselves paying hundreds of millions for content those machines can’t fabricate. The machines contaminate their own inputs. We are producing our own poison, then bidding against each other for the antidote. And what does it reveal about how we’ve always understood creativity that our first move was to treat it as a commodity, not an expression?
It’s not just that the infrastructure degrades. It’s what degrades first.
The Nature paper is brutally explicit: early model collapse primarily affects minority data. The unusual cases. The underrepresented perspectives. The long tails of human experience that don’t fit the algorithm’s preferred pattern. Think of it like audio compression that removes the quiet instruments until all you hear is bass and drums: technically still music, functionally just rhythm. The subtle textures disappear first. The composition follows, until everything sounds like a corporate algorithm’s idea of diversity.
A 2024 paper from the FAccT conference documented this in stark terms. When generative models train on each other’s outputs through successive iterations, they converge to the majority. The models amplify their mistakes. Little information from the original distribution remains. Their experiments showed that this process can cause “complete erasure of the minoritized group.”
The mechanism isn’t complicated. Generative models learn probability distributions. The most common tendencies are easiest to learn and most likely to be reproduced. Rare signals are harder to capture and more likely to be dropped when a model approximates the distribution and then samples from that approximation. Train one system on another system’s outputs, and you’re already starting with a simplified distribution. The rare tendencies get rarer. The common tendencies get more common. After enough iterations, the minorities vanish. This is not a failure of the technology. It’s the technology working as designed. Generative models learn to reproduce tendencies. The question is which tendencies we preserve and which we erase. The answer, mathematically, is that the tendencies we have less of disappear faster.
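The same loop, run on categories instead of a bell curve, shows how a minority disappears for good. Again, a toy sketch of my own rather than anything from the papers above: a corpus in which two percent of texts carry a minority style, re-estimated and regenerated each generation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two styles of text: a 98% majority and a 2% minority.
probs = np.array([0.98, 0.02])
corpus_size = 200

for gen in range(1, 501):
    # "Train" by estimating style frequencies from a finite sample of the
    # previous generation's output, then "generate" from that estimate.
    sample = rng.choice(len(probs), size=corpus_size, p=probs)
    counts = np.bincount(sample, minlength=len(probs))
    probs = counts / corpus_size
    if probs[1] == 0.0:
        print(f"minority style erased at generation {gen}")
        break
else:
    print(f"minority share after 500 generations: {probs[1]:.4f}")

# Once the minority's estimated probability hits zero, no later
# generation can produce it again.
```

The only way the style comes back is if someone adds it from outside the loop, which is exactly the fresh human input the researchers keep pointing to.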
What does it mean that we built infrastructure that formalizes “erase the minorities first” into its training architecture (not as bug but as statistical inevitability)? What cultural assumptions had to be in place for this to feel normal? The infrastructure doesn’t encode our biases. It clarifies them. It takes the ambient truth that some people matter less and translates it into loss functions we can optimize. When the model degrades, it’s just showing us what we already decided to preserve.
Which raises the question: if we understand the mechanism, why aren’t we stopping it? The obvious response would be labeling. The industry’s working on it. Google DeepMind released SynthID-Text. Various academic approaches have been proposed. Watermark everything AI-generated so we can filter it from training sets and preserve the human signal.
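On paper, the fix is almost embarrassingly simple. A hypothetical sketch, with looks_watermarked standing in for whatever detector a lab actually runs (no real product API is being described here):

```python
from typing import Callable, Iterable

def keep_human_text(
    docs: Iterable[str],
    looks_watermarked: Callable[[str], bool],
) -> list[str]:
    """Drop anything the detector flags as machine-generated.

    `looks_watermarked` is a placeholder, not a real API. The whole
    scheme rests on that one predicate being accurate, and on every
    generator cooperating by embedding a watermark in the first place.
    """
    return [doc for doc in docs if not looks_watermarked(doc)]

# Toy usage with a marker-string "detector":
corpus = ["a forum post about cathedrals", "[AI] generated filler text"]
print(keep_human_text(corpus, looks_watermarked=lambda d: d.startswith("[AI]")))
# -> ['a forum post about cathedrals']
```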
The problem is that watermarking doesn’t work reliably. Research from ETH Zürich demonstrated success rates of roughly eighty percent for spoofing watermarks and eighty-five percent for stripping them entirely.
More fundamentally, watermarking assumes everyone will cooperate to make detection possible. This is like assuming everyone will drive slower if we just make speed limit signs more visible. Technically true. Practically delusional. Profitable nonetheless.
Picture the industry consortium press conference: “We’re pleased to announce voluntary labeling standards that every AI company has agreed to implement, unless it affects their metrics, in which case they’ll implement it differently, or not at all, but in a way that looks like cooperation to regulators who aren’t technical enough to notice the difference. We expect full adoption by next quarter, or the quarter after that, or possibly never, but the important thing is we’re all committed to the principle of eventually doing something that resembles what we’re describing today.”
Companies that watermark their outputs create disadvantages for their own products. Platforms that filter watermarked material reduce their traffic. Users who care about authenticity are outnumbered by users who care about convenience. The incentive structures don’t align. Brookings’s assessment: watermarking might help identify the majority of AI-generated content, but it won’t solve the problem in high-stakes settings. It will, at best, slow the erosion. Which means we’re watching it happen, understanding why it’s happening, and choosing not to intervene.
We built machines that write, and now we’re discovering that human writing is irreplaceable. We built machines that generate content at scale, and now we’re paying premium prices for content they can’t generate. We built systems that learn from data, and now those systems produce the data that will teach future systems, and the signal is eroding.
The ouroboros metaphor is apt. The snake eating its own tail. But the classical image was a symbol of eternal return, of cycles that renew themselves. Ours doesn’t renew. It diminishes. Each cycle produces less diversity, fewer unusual cases, weaker signal. The snake doesn’t grow back what it consumes. It just gets smaller, hungrier, more certain that everything still tastes right.
Some researchers remain optimistic. If we preserve enough human-generated material and mix it carefully with synthetic outputs, we might avoid the worst outcomes. If we develop better methods for identifying and filtering AI-generated content, we might maintain information quality. If we slow down the race to train ever-larger models on ever-more-contaminated substrate, we might buy time. We’re not doing any of those things at scale. We’re doing the opposite. The licensing deals happened not because companies decided to prioritize data quality, but because they ran out of uncontaminated material to scrape. The research into model collapse happens alongside, not instead of, the training runs that will cause it. Companies ignore watermarking coordination while content generation scales exponentially.
The question that keeps returning is what we’re selecting for. If early deterioration erases minorities first, and late deterioration erases coherence, what survives the longest?
The blandly common. The aggressively average. Whatever appeared most often to begin with. The center of the distribution.
This isn’t just a technical problem about AI training. It’s a mirror. The infrastructure we’re building selects for the things we’ve already seen the most of. It amplifies the dominant frequencies and erases the marginal ones. It learns from what exists and reproduces more of it, while the unusual, the rare, the genuinely new becomes harder to find in the signal. When we call this “collapse,” we’re describing the machinery. When we ask what it reveals about us, we’re asking which humans we decided matter.
We could build it differently. We could maintain curated datasets of human-generated material, preserved as training baselines. We could develop reliable provenance tracking so synthetic content can be identified at scale. We could coordinate across labs and platforms and nations to prevent contamination before it spreads. We could value diversity in training inputs as much as we value scale.
We could do those things. The fact that we’re not doing them is the answer to a question we haven’t asked yet: What kind of future are we building when we formalize erasure into mathematics and call it optimization?
The feedback loop is already running. The internet is already majority-synthetic. The training material for successor models is already contaminated. Whether we call it model collapse or model autophagy or the ouroboros effect, the pattern is visible now. And visibility doesn’t stop it. Seeing what’s happening doesn’t change the incentives.
We’re not building toward the jackrabbits. They’re already here. The models grow more confident about a narrower range of outputs. The minorities are disappearing from the training distributions. The signal is eroding while the volume increases. And we’re training successor systems on this contaminated substrate because the economics reward speed and the competition rewards scale and the platforms reward engagement.
The question isn’t whether we can exit the feedback loop. The question is what we’re selecting for while pretending we’re still choosing. When the jackrabbits arrive (confident, coherent, disconnected from the architecture they were supposed to describe), we’ll probably train the next model on them too. Because by then, we won’t remember what cathedrals looked like, and we’ll have forgotten why we should care. The technology will have done exactly what we asked: optimized away everything we claimed to value but weren’t willing to protect.
Research Notes: The Feedback Loop We Can't Exit
Started researching AI model degradation expecting a future threat. Found something worse.







