The Safety Paradox: How AI's Defenders and Critics Both Keep It Unregulated
When everyone agrees something is dangerous, disagreeing about which danger turns out to be very convenient
When humans face threats they can’t quite name, we schedule meetings about them. This isn’t cynicism. It’s anthropology in real time. Watch what we do when confronted with technology that might reshape power, eliminate jobs, amplify discrimination, or (the popular framing) extinguish humanity. We gather representatives from twenty-eight countries at historically significant locations. We produce declarations. We commit to continued dialogue. We call this progress. We’re wrong.
Bletchley Park has now hosted two gatherings about dangerous technology. The first time, in the 1940s, they broke the Enigma codes and shortened World War II. The second time, in November 2023, twenty-eight countries agreed that AI poses urgent risks and committed to continued dialogue about them. You can see why they chose the same location. Both involved codes. Both involved international coordination. Both addressed existential threats. Only one produced actionable outcomes.
What makes AI safety debates fascinating isn't the policy. It's what our approach reveals about collective threat processing. We've evolved a sophisticated mechanism for performing concern while preventing action: a political immune system that attacks meaningful regulation and tolerates meaningless declarations. Both existential-risk advocates and current-harms activists are completely right. Both positions produce the exact same outcome: nothing happens. This isn't an accident. It's design. And what's remarkable is how perfectly it reveals what we're willing to confront versus what we'd prefer to defer.
What Dual-Track Deferral Reveals About Acceptable Futures
Here’s the mechanism. Regulators focus on current harms: surveillance tools misidentifying defendants, hiring algorithms screening out disabled applicants, credit systems encoding historical redlining. The response: “But what about existential risk? What about the threat that could eliminate humanity?” The frame shifts. Present to future. Documented to hypothetical. Regulatable to requiring more research.
When regulators focus on existential risk, the response flips: “We need these systems to solve bias, address climate change, cure disease. Restricting development abandons billions who could benefit.” The frame shifts again. Future to present. Hypothetical to humanitarian. Restriction to acceleration.
Both moves work. Both prevent action. Systems continue.
This reveals something uncomfortable. Distant apocalypse from superintelligence is politically palatable in ways that confronting algorithmic discrimination is not. Existential risk affects everyone equally. Billionaire and street sweeper face the same hypothetical doom. The apocalypse doesn’t challenge the social order.
Confronting algorithmic discrimination requires someone losing status. Someone surrenders accumulated advantage. It requires admitting that systems sorting humans into worthy and unworthy work exactly as intended. And that the people debating fairness are disproportionately the ones who benefited.
Tech companies have become extraordinarily sophisticated at this game. They fund longtermist think tanks concerned about superintelligence risk. Meanwhile they deploy current systems that amplify discrimination. They champion AI ethics boards focused on future alignment. Meanwhile they fight present-day regulation. They warn about existential threats at summit podiums. Meanwhile they lobby against transparency requirements for systems operating today.
This isn’t hypocrisy. This is strategy. Both positions address different timelines. They never conflict. You can be deeply concerned about AGI risk in 2035 while unconcerned about facial recognition accuracy in 2025.
We’d rather imagine our extinction than imagine wealth redistribution.
The Category Error as Avoidance Mechanism
We use the word “bias” like a Rorschach test: everyone sees what they need to see. The confusion isn’t accidental. It’s load-bearing for our collective ability to avoid choosing.
Mathematical bias is what algorithms do: find patterns in data, optimize toward outcomes. An algorithm trained on historical data will reproduce those patterns. This isn’t error. This is function.
Social bias is discriminatory effects we want to prevent. Algorithms produce outcomes that disadvantage marginalized groups. We call this bias. We demand it be eliminated.
The problem: these are often the same thing. An algorithm accurately predicts risk based on historical patterns. It’s mathematically optimized. It produces discriminatory outcomes. Both are true simultaneously. We’ve created a category error. Accuracy we find uncomfortable gets labeled as error requiring correction.
Take insurance redlining. An algorithm predicts higher risk for disinvested neighborhoods, because infrastructure and disinvestment really do correlate with risk. The pattern is real because of historical discrimination. The zip code encodes decades of racist housing policy. Using it perpetuates that discrimination, even when the prediction is technically accurate.
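A minimal sketch makes the mechanism concrete. Everything below is invented for illustration (the group labels, zip codes, and default rates are synthetic, not drawn from any real lender): the protected attribute is withheld from the model entirely, and a zip-code proxy that encodes historical segregation still lets an accurate model reproduce the disparity.

```python
# Minimal sketch with synthetic data: a proxy variable reproduces historical
# discrimination even when the protected attribute is never given to the model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical setup: group membership determines where people live
# (historical segregation), but is never a model input.
group = rng.integers(0, 2, n)                # 0 = advantaged, 1 = disadvantaged
zip_code = np.where(group == 1,
                    rng.integers(0, 5, n),   # historically redlined zip codes
                    rng.integers(5, 10, n))  # historically invested zip codes

# Past outcomes reflect past disinvestment, so observed "risk" differs by zip.
default = rng.random(n) < np.where(group == 1, 0.25, 0.10)

# Train only on the proxy. The protected attribute has been "removed".
X = zip_code.reshape(-1, 1).astype(float)
model = LogisticRegression().fit(X, default)

approve = model.predict_proba(X)[:, 1] < 0.15  # approve when predicted risk is low
for g in (0, 1):
    print(f"group {g}: approval rate = {approve[group == g].mean():.0%}")
```

The model never sees the group label. It doesn't need to; the zip code carries the history for it, and the predictions are accurate precisely because the history is real.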
This reveals what we’re actually avoiding. The unfairness isn’t just in the algorithm. It’s in the reality the algorithm accurately describes. We’ve constructed an impossible regulatory standard. We want algorithms to be accurate but not too accurate. Find patterns, but not those patterns. Optimize outcomes, but don’t produce outcomes we dislike. Addressing unfair reality means redistribution. Addressing unfair algorithms means more research funding. One threatens everything. One threatens nothing.
This is why the category confusion persists. It’s extraordinarily useful for preventing regulation. Any proposed bias standard becomes technically unfeasible. Or socially impossible. Or philosophically incoherent. The regulatory conversation shifts from “should this system exist?” to “how do we make this system fair?” The fairness question proves unanswerable. Deployment continues pending the solution to an unsolvable problem.
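The unanswerable part isn’t rhetorical flourish. When two groups have different base rates, a classifier cannot simultaneously equalize calibration, false positive rates, and false negative rates across them (Chouldechova, 2017). A few lines of arithmetic show the bind; the precision, miss rate, and base rates below are illustrative numbers, not measurements of any real system.

```python
# Illustrative arithmetic: with unequal base rates, matching precision (PPV)
# and miss rate (FNR) across groups forces unequal false positive rates.
def forced_fpr(base_rate, ppv, fnr):
    # Identity that holds for any binary classifier:
    # FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), with p the base rate.
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

ppv, fnr = 0.7, 0.2  # hold these "fairness" properties equal for both groups
for name, base_rate in [("group A", 0.10), ("group B", 0.30)]:
    print(f"{name}: base rate {base_rate:.0%} -> "
          f"false positive rate {forced_fpr(base_rate, ppv, fnr):.1%}")
```

Satisfy any two definitions of fairness and the third becomes the grounds for objection.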
We’d rather debug the algorithm than debug the society. Mathematics feels solvable. Resource redistribution feels impossible. This isn’t policy failure. This is psychological preference made regulatory structure.
Summits as Collective Ritual
The Bletchley Declaration isn’t a failure. It’s working exactly as designed, as a performance of concern that enables inaction.
Consider what actually got accomplished. Twenty-eight countries agreed AI poses risks. They committed to information sharing and continued dialogue. They established no binding regulations. No enforcement mechanisms. No restrictions on current deployment. The declaration focuses on “frontier AI”, systems that don’t quite exist yet, rather than systems operating today.
We’ve evolved summits as our collective response to existential-scale questions. We’ve gotten remarkably good at institutional performance that signals seriousness while ensuring nothing changes. Entire categories of ritual that demonstrate we’re addressing this without actually constraining anyone.
This mirrors climate summits. Decades of declarations and commitments have produced continued dialogue about the urgent need for action. The summits absorb political pressure by transforming it into communiqués. The communiqués signal concern without compelling change. Even regulatory “success” produces performative compliance. GDPR threatened meaningful penalties. It produced consent-theater cookie banners that change nothing about data collection.
We’ve optimized for the appearance of governance rather than for governance itself. The consultation process becomes the answer. The framework for governance becomes the governance. The working group discussing whether we should is the decision that we did.
The Diversity Frame as Structural Avoidance
One specific claim appeared in every summit discussion: tech companies lack diversity, and this contributes to AI harms. This is true. Completely insufficient.
Treating diversity as the solution reveals which explanations we find comfortable. The frame implies current AI harms are unintentional: blind spots rather than working-as-designed. This erases the possibility that algorithmic discrimination serves interests by enabling profitable sorting. That surveillance tools are deployed precisely because they enable control. That automated decision-making is valuable specifically because it obscures accountability.
Adding diverse voices to teams building surveillance tools doesn’t make surveillance less harmful. It makes surveillance better. More inclusive. It identifies everyone with equal precision. A diverse team catches that facial recognition fails on darker skin tones. The solution: improve the algorithm so it tracks everyone fairly. Problem solved. Progress achieved. The surveillance remains. It’s just more equitable surveillance now.
Companies embrace diversity commitments while fighting regulation. Workshops change nothing about the business model. Regulation would. The diversity frame makes symptoms feel like causes. Comfortable. Actionable. No need to ask structural questions we don’t want to answer.
The Incentive Architecture of Perpetual Study
So whose interests are served by keeping AI safety debates locked in this paradox? Everyone’s. That’s the problem.
Companies benefit from dual-track deferral. Regulation stays perpetually pending while deployment continues. First-mover advantage compounds while we debate frameworks. By the time we agree on governance, the systems are embedded infrastructure. The question shifts from “should these exist?” to “how do we manage these?”
Governments signal responsiveness without requiring politically costly choices. Politicians point to summits. They don’t constrain industry. They don’t risk innovation flight. The appearance of action substitutes for action.
Academic researchers depend on open questions requiring study. If you ban facial recognition in policing, you eliminate funding for research into making it fairer. The perpetual research agenda requires never resolving whether these systems should exist.
Civil society organizations need problems to advocate about. Not problems solved and unfundable. The incentive structure favors improving systems over questioning whether systems should exist.
And the whole arrangement serves the idea that technology problems have technology solutions. That sufficient research produces adequate governance. It suggests problems are solvable through expertise rather than requiring political struggle over resource distribution.
The incentive alignment is almost perfect. Every participant benefits from keeping AI safety in eternal debate. This isn’t conspiracy. This is collectively evolved behavior. Individual incentives align toward the same outcome. No one has to coordinate.
We’ve mastered performing concern while preventing action.
The Language of Inevitability as Accountability Erasure
Here’s what’s missing from AI safety summits: any discussion of whether these systems should exist at all.
Should algorithmic hiring tools that automate discrimination at scale exist? Should predictive policing systems that amplify existing enforcement patterns? Should facial recognition that enables surveillance infrastructure? These questions don’t get asked.
They threaten the premise. The frame is always “how do we make AI safe?” not “should we deploy this capability?” Safety implies the thing is happening and needs safeguards. It forecloses the question of whether the thing should happen.
AI development gets treated as an inevitable force requiring management rather than a series of choices requiring justification. “Progress is accelerating” becomes description of nature. Not description of investment decisions. The language of inevitability serves those making choices by suggesting no choices are being made.
The investor decks are illuminating. “AI adoption is accelerating across industries.” Not “we’re selling this to every industry.” “Organizations that fail to adapt will be left behind.” Not “we’re creating the conditions where you have no choice.” “The future is already here.” Not “we’re building it and didn’t ask permission.” The passive voice does extraordinary work. Things are happening. Progress is being made. The future is arriving. Nobody’s driving. Nobody chose this. It’s just... occurring.
But every AI system represents decisions. Decisions about what problems are worth solving with automation. Decisions about whose interests matter. Decisions about what metrics define success. These are political decisions dressed as mathematics.
When AI gets treated as inevitable, those decisions go unexamined. Arguments emerge about bias in hiring algorithms without asking whether hiring should be algorithmic. Debates about fairness in risk assessment proceed without asking whether risk should be assessed this way.
We’ve evolved language that erases agency from deployment decisions. We’d rather have algorithms deny someone’s loan application than have a loan officer explain why. We’d rather automate injustice than admit we’re choosing it. The algorithm didn’t decide. It optimized based on patterns. No one’s responsible. Everyone’s hands are clean.
The safety framing keeps everyone perpetually downstream of the deployment decision. Arguing about how to make existing systems better rather than whether they should exist. Very convenient for those building and deploying them.
What We’ve Mastered
Here’s what an intellectually honest AI safety summit would acknowledge:
We know how to regulate algorithmic systems causing harm today. We have legal frameworks. Precedent. Technical methods. What we lack is political will. Regulation conflicts with corporate interests, governmental convenience, academic funding.
We don’t know how to prevent hypothetical existential risk from AGI. We don’t know if AGI is possible, on what timeline, with what properties. Focusing on this distant uncertain threat over present certain harms is a choice that serves specific interests.
Both framings create impossible regulatory standards. The existential framing, because we can’t regulate what doesn’t exist. The current-harms framing, because we can’t solve social problems through mathematical intervention. Both benefit those deploying AI now by deferring action indefinitely.
The summit format itself is safety theater. Performance of concern that absorbs political pressure and produces communiqués rather than constraints.
That’s the honest version. It didn’t appear in the Bletchley Declaration, and everything about the process ensures it never will: the summit structure, the working-group format, the research agenda that requires perpetual study, the diversity conversation that treats symptoms as causes, the language of inevitability that erases agency, the dual-track deferral that keeps everyone arguing about which timeline matters while deployment continues on all of them.
What does our collective willingness to accept this arrangement reveal? When technology serves powerful interests, complexity suddenly appears. More research becomes necessary. We’ve optimized for consultation theater because it provides something democracy can’t: the appearance of participation without the risk that participants might actually decide something we don’t want.
The AI safety debate isn’t really about safety. It’s about power. Who has it, who wields it, who benefits from systems that automate and scale it, and who bears the costs. Framing it as a technical challenge requiring expertise obscures that this is a political question requiring democratic accountability. But treating it as political would require acknowledging we’re making choices. Treating it as technical allows us to pretend we’re discovering necessities.
The paradox isn’t a bug. It’s the system working. It’s us, revealed.
What the Mechanism Produces
The pattern doesn’t require conspiracy. It requires aligned incentives that make the same outcome profitable for everyone involved.
The summit schedule extends through 2029. Singapore will emphasize coordinated research pathways in 2026. Mexico City will declare progress on frameworks in 2027. Johannesburg will celebrate a decade of international cooperation in 2029, where representatives will reflect on productive dialogue and commit to continued coordination. The 2035 summit, tentatively scheduled for Dubai, will focus on emerging governance challenges from AI systems that will by then have been embedded infrastructure for a decade. The agenda will include robust discussion of whether anything should have been done earlier, with breakout sessions on documentation best practices for future retrospective analysis.
Meanwhile, algorithmic systems continue making decisions about credit, employment, housing, criminal sentencing, and healthcare access exactly as they did before. The summits change nothing about their deployment. Everyone can now point to the frameworks when pressed about accountability.
Someone keeps organizing these summits. Someone keeps funding the research agendas that recommend more study. Someone keeps deploying the systems while the frameworks are developed. Someone benefits from the arrangement where the consultation process substitutes for constraint.
This is what accountability looks like when nobody can afford to be held accountable.
This is what governance looks like when the governed have no mechanism to withdraw consent. The summits continue not because anyone decided this outcome, but because everyone’s individual choices compound toward it. The next one is already scheduled. The systems being discussed are already deciding. And we keep showing up to meetings where the question of whether this should be happening has been quietly removed from the agenda.
The future we’re building doesn’t require our consent. It just requires us to keep attending consultations about it until the question expires from exhaustion. We’re not participants in this process. We’re alibis.