The Confidence Gap
What AI Code Generation Reveals About Expertise
The Stack Overflow question arrived like a ghost in the machine: how do I debug code I never wrote? Not “how do I debug this function I inherited” but “how do I debug what the machine generated when I whispered instructions into its ear?” Eighty-three responses offered technical solutions: breakpoints, logging, binary search through the call stack. A chorus of solutions to the wrong problem. The real question: what happens when “I wrote this code” becomes “I prompted this code,” when authorship transforms from creation to curation?
This isn’t hypothetical. 83% of organizations now use AI to generate code. Security leaders are explicit about the risk: 92% express concern about the incidents that code could cause. Then they head back to their companies, where engineers are prompting models to output functions they’ll review briefly before merging to main.
The gap between concern and behavior isn’t hypocrisy. It’s a collective decision to treat expertise as dead weight and hope the consequences land elsewhere: on the security team, the scanning tools, the vendor, the insurance company, the next person to touch this codebase. Programmers work in environments where rapid deployment beats deep understanding, where the test suite passing counts as certainty, where “the AI wrote it” is becoming an acceptable answer to “how does this work?”
The relationship between competence and confidence is decoupling in ways that reshape what it means to call yourself a developer.
The Automated Confidence Machine
A 2023 study found developers using AI assistants wrote less secure code but believed it was more secure. Less skill, more confidence, a feedback loop where the system manufactures expertise without the substance. The pattern resembles the Dunning-Kruger effect (though that research has replication issues, the phenomenon it describes is real enough). We’re witnessing the industrialization of incompetence, where machines produce the feeling of mastery faster than humans can develop the actual thing. The confidence gap is no longer a cognitive bias; it’s a product roadmap.
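What that gap looks like in practice is rarely dramatic. Here is a minimal, hypothetical sketch (the function, the table, and the test are mine, not the study’s): a generated-looking lookup that reads cleanly, passes its happy-path test, and carries a textbook injection flaw the whole time.

```python
import sqlite3

def find_user(db: sqlite3.Connection, username: str):
    # Reads fine at a glance, but the query is built by interpolating
    # user input directly into the SQL string.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return db.execute(query).fetchone()

# The happy-path check that makes it feel finished.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
db.execute("INSERT INTO users (username) VALUES ('alice')")
assert find_user(db, "alice") == (1, "alice")  # passes; confidence goes up

# The input nobody tried: a classic injection payload that widens the
# WHERE clause to match every row. Nothing crashes, so nothing registers.
print(find_user(db, "' OR '1'='1"))
```

The fix is a parameterized query, one line of ceremony away. The interpolated version ships anyway, because it works, and working is what the feeling of mastery is calibrated against.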
These platforms don’t just produce software. They manufacture feelings about software. They create the subjective experience of mastery independent from its presence. A prompt goes in, the system responds, tests pass, dopamine hits. The gap between “I feel like I grasp this” and “I actually understand this” widens, but the feeling is what registers. The feeling is what gets shipped.
Professional identity reveals something troubling when competence and the feeling of competence separate so cleanly. An entire industry optimizes for the sensation of mastery while expertise quietly atrophies. Self-worth is being redesigned: you’re valuable because you ship fast, not because you understand deeply. The market has spoken. Understanding is overhead.
The trick is that the person buying the confidence machine (the CTO) and the person experiencing the cognitive shift (the engineer) are different people with different incentive structures. One wants speed metrics. The other wants to remain employed. The machine serves both needs perfectly.
In April 2023, news broke that Samsung employees had pasted proprietary semiconductor code into ChatGPT on three separate occasions within weeks. Samsung responded by banning generative AI tools on company devices.
Consider what had to become normalized first. The idea that “let me check this with the internet chatbot” and “let me review my own work” occupy the same category of professional behavior. That a publicly accessible model trained on scraped internet data by a company with completely different incentives than yours becomes functionally equivalent to your own expertise.
They were taking the path of least resistance.
The question is what happened to professional identity such that this became that path. External validation from a language model became indistinguishable from internal comprehension. The boundary dissolved without anyone noticing it had gone.
The Theater of Due Diligence
The Samsung incident reveals a broader pattern visible in any organization adopting these platforms. The meeting room becomes a stage where corporate Kabuki theater unfolds. The engineering lead performs confidence, the security engineer performs caution, the CTO performs oversight. Everyone knows their lines, everyone knows the performance is the point. This isn’t about preventing problems; it’s about distributing responsibility so thinly that when something breaks, no single person can be blamed. The goal isn’t safety. It’s defensible failure.
“We’re adopting AI coding assistants to increase velocity,” says the engineering lead.
“What about security?” asks the CTO, because someone has to ask.
“We’ll treat output as untrusted data,” says the security engineer. Everyone nods seriously, as if this solves something.
Picture this formalized six months from now: “Chief Untrustworthy Code Officer” becomes an actual job title. The CUCO reviews implementations explicitly categorized as suspicious. Their entire function is applying human judgment to code the company doesn’t trust but ships anyway.
Nobody laughs. Nobody points out they just invented a framework where the tool they trust to write software produces output they’ve formally categorized as untrustworthy. But the meeting ends and the tools get deployed because everyone in the room knows the actual function of this meeting isn’t to prevent risk.
It’s to distribute it.
The security team establishes that they warned everyone. Engineering hits speed targets. The CTO makes a liability decision.
Six months from now when something breaks, someone will pull up these meeting notes. The security team will say “we flagged this risk.” Engineering will say “we implemented the recommended controls.” The CTO will say “we followed standard procedures.” And they’ll all be correct. That’s what makes it theater rather than hypocrisy: it’s a collaborative performance where everyone knows their lines.
The security engineer who writes the “untrusted code” framework is doing the job they’ve been given.
Make this decision defensible, not safe.
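For concreteness, “treat output as untrusted data” usually compiles down to something like the sketch below. This is hypothetical, a composite of the genre rather than any vendor’s tool or any company’s real policy: a handful of regexes plus an attestation that a human looked.

```python
import re

# Hypothetical merge gate for "untrusted" AI-generated changes.
RISKY_PATTERNS = [
    r"eval\(",                     # dynamic evaluation of strings
    r"subprocess\..*shell=True",   # shell-injection surface
    r"verify=False",               # TLS verification disabled
    r"password\s*=\s*['\"]",       # hard-coded credential
]

def review_gate(diff: str, reviewer_checked_box: bool) -> bool:
    """Return True if the change is allowed to merge."""
    flagged = [p for p in RISKY_PATTERNS if re.search(p, diff)]
    if flagged:
        print(f"blocked: matched {flagged}")
        return False
    # Anything the patterns don't recognize sails through, provided a
    # human clicked "I reviewed this."
    return reviewer_checked_box

# A string-built SQL query clears the gate untouched: no eval, no shell,
# no hard-coded password, nothing to flag.
print(review_gate("query = f\"SELECT * FROM users WHERE name = '{name}'\"", True))
```

Everything the patterns miss merges. That’s the control the meeting notes will cite.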
The Shopify Exception That Proves the Rule
This performance isn’t universal. When people talk about “doing AI coding right,” they point to Shopify. The company integrated these tools while implementing comprehensive frameworks: DevSecOps practices, manual review, continuous monitoring, threat modeling, behavioral analysis.
This is presented as the solution. It’s actually revealing the problem.
Shopify’s approach requires significant resources. Dedicated security teams. Custom infrastructure. Continuous human oversight. Sophisticated monitoring systems.
Most firms don’t have these resources. So when industry leaders hold up Shopify as “how to do it safely,” they’re inadvertently revealing that doing it safely is expensive enough to be a luxury good.
Safe automated code generation becomes a class marker. Like organic groceries or private school: something people with resources can purchase and people without resources can watch others purchase while making different tradeoffs. Well-resourced enterprises can afford to do it carefully. Everyone else is rolling out machine-generated implementations with whatever scanning tools came bundled with their IDE. They hope automated analysis catches what human review won’t, because human review takes time nobody’s allocating.
The Shopify case study doesn’t demonstrate that AI coding can be done safely at scale. It demonstrates that the resource threshold for doing it safely is high enough that most organizations are, by definition, doing it unsafely.
And they know it.
They’re doing it anyway because the alternative means explaining to the board why pace is lower, costs are higher, and the engineering team needs more headcount to deliver the same features. Not adopting these platforms while competitors do means explaining these gaps in the Q3 board meeting when the CFO is examining why the company missed growth targets.
The economics create a specific bind: the question of whether you can afford to be safe becomes irrelevant when being safe means you can’t afford to compete.
For most, that’s not really a choice.
What “Writing Code” Means Now
This economic reality reshapes daily practice. The constraints don’t just change behavior, they reconstruct what “developer” signifies as professional identity.
Let’s talk about the dream. Not the conference version where everyone performs competence, but the 3am version that wakes you in a cold sweat.
Not everyone has it, but enough do that a pattern emerges. The details vary; the structure stays the same. A developer is writing software. The system suggests a completion. They accept it. It works. Another suggestion. Accept. It works. They’re moving faster than they’ve ever moved. They’re productive. They’re valuable. They’re necessary.
Then they look at what they’ve built and don’t recognize it.
They wrote it (their fingers accepted each completion) but can’t explain how it works. They’re the author but not the architect. Responsible but not knowledgeable. And everyone around them is doing the same thing. Nobody comprehends any of it. But it’s all running. The tests pass. Everything works. Everything ships. The velocity metrics look fantastic.
And they wake up unsure which part was the nightmare: that nobody understands the software, or that it doesn’t matter because everything still functions perfectly and the business is thriving and their performance review was excellent.
The definition of authorship has split. “I wrote this code” used to mean: I grasped the problem, I architected a solution, I implemented it, I can explain how it works, I can debug it when it fails.
“I wrote this code” increasingly means: I described what I wanted, a model synthesized an implementation, I reviewed it briefly, the tests passed, I shipped it.
These are not the same activity. But they produce the same output (applications that run) so they get treated as equivalent. The professional identity of “developer” can stretch to accommodate both because the visible result looks identical even though the knowledge state and capability are fundamentally different.
This isn’t something happening to us. It’s something we’re participating in, often willingly, because the alternatives are worse. Insisting on fully understanding everything before release means watching productivity metrics decline. Refusing to use these platforms means explaining to a manager why a feature requires three weeks when a colleague delivered it in two days.
Or: prompt the system, review the output quickly, ship it, and collect the same salary while building different skills.
Skills that might not transfer to the next paradigm but are extremely valuable right now.
Accepting machine-generated implementations without full comprehension produces a specific vertigo. Scan it. It looks reasonable. The tests pass. Merge it. And there’s this moment (brief, barely noticeable) where you decide feeling uncertain is fine. It’s professional dissociation, like watching yourself work from outside your body. The part that deploys software and the part that grasps software are becoming separate people.
Daily work now requires splitting yourself in two. The identity of “developer” no longer correlates with understanding development. Professional self-worth derives from shipping velocity while actual mastery slowly erodes. This is the psychological architecture of an entire profession reshaping itself in real time.
The tools can’t explain their reasoning. Brief review might catch issues. And you’re hoping it will. You’re practicing faith in automation while performing competence for management. The gap between who you present as and what you actually know widens every sprint.
These aren’t problems to solve. They’re conditions we’re already living in.
What This Reveals About Value
The revolution in automated coding isn’t interesting because machines can produce software. It’s interesting because of what the adoption pattern reveals about what organizations actually optimize for.
Engineering teams optimize for deployment over comprehension, for speed over expertise, for the feeling of productivity over the development of skill.
This isn’t secret. “I prompted a model and rolled out what it synthesized” is becoming an acceptable answer to “how does this work?” because understanding takes time that could be spent delivering features.
The confidence gap (developers feeling more skilled while becoming less so) isn’t an unfortunate side effect.
It’s instrumentally useful. It erodes expertise while making that erosion feel like progress.
Tools that make people feel productive without developing underlying mastery solve a business problem: they reduce the cost of software development by reducing the expertise required to deliver.
Training engineers is expensive. Learning takes time. Deep comprehension slows velocity. Automated coding platforms offer a different trade: faster release cycles, lower costs, less training investment, more “productivity” (measured by features delivered, not by sustainable velocity or technical depth).
The security implications are real.
They’re also externalities.
Someone else’s problem. Tomorrow’s problem. The insurance company’s problem.
Organizations know this. Security leaders can articulate the risks clearly. What they can’t do is change the incentive structure that makes those risks acceptable. They can document concerns. They can implement scanning tools. They can write policies. What they can’t do is make deployment slower or more expensive when competitors are shipping faster. Changing that structure would require something rare: organizations choosing long-term resilience over quarterly velocity, even when that choice looks like competitive disadvantage.
This creates a predictable pattern. Companies adopt automated coding assistants knowing the risks. They implement enough security theater to demonstrate reasonable care. Then they optimize for velocity until something breaks badly enough that security becomes briefly prioritized.
Then the cycle repeats.
That pattern is stable because the incentive structure is stable. And the incentive structure is stable because it’s working for the organizations making adoption decisions.
The Part Where You Can’t Look Away
Open the IDE tomorrow. Look at the automated assistant integrated into the environment: GitHub Copilot, or Cursor, or whatever platform the organization adopted to increase throughput.
Watch what happens.
Start typing. It suggests completions. Sometimes a line. Sometimes entire functions. Grayed out, waiting. Ghosted suggestions haunting the cursor. Tab to accept. The implementation appears. Tests pass. Ship it.
This is the job now. These tools didn’t create this pattern. They made it efficient enough to become the default.
That Stack Overflow developer asking how to debug software they didn’t write? They got their answer. Eighty-three responses taught them how to debug without understanding. How to add breakpoints to logic they didn’t design. How to log functions they couldn’t have written.
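The answers are easy to picture. A minimal sketch of the genre, assuming a Python codebase and a generated function the asker never read (reconcile and its wrapper are hypothetical stand-ins, not anything from the actual thread): log what goes in, log what comes out, break when the output surprises you.

```python
import logging
import pdb

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("generated")

def reconcile(entries):
    # Stand-in for machine-generated logic the developer didn't design.
    totals = {}
    for account, amount in entries:
        totals[account] = totals.get(account, 0) + amount
    return {account: total for account, total in totals.items() if total != 0}

def reconcile_instrumented(entries):
    # The Stack Overflow advice, applied: wrap it, watch the boundaries,
    # and drop a breakpoint when the result looks wrong.
    log.debug("input: %r", entries)
    result = reconcile(entries)
    log.debug("output: %r", result)
    if not result:
        pdb.set_trace()  # step through logic you never wrote
    return result

reconcile_instrumented([("a", 5), ("a", -5), ("b", 3)])
```

It works. It finds bugs. At no point does it require knowing why the function is shaped the way it is.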
The responses were helpful. They solved the immediate problem. Nobody mentioned that they were collectively normalizing a profession where “I can debug what I don’t grasp” has replaced “I grasp what I built” as the core competence.
Because that’s already normal. That Stack Overflow question wasn’t asking “is this okay?” It was asking “how do I do this better?”
The debate about whether we should ship implementations we don’t fully understand is over. It ended without anyone noticing it had started.
The IDE is showing what work looks like when expertise becomes optional. When deep knowledge becomes overhead. When the job is evaluating and deploying, not understanding and building.
Nobody planned this. It emerged from everyone making locally rational choices. The tools on the screen are an X-ray making visible what was always there: the gap between what professionals tell themselves about their work and what their work actually is.
Tomorrow morning, IDEs across the world will light up with grayed-out suggestions, ghost code haunting the cursor. We’ll tab-accept because the alternative is explaining why we chose understanding over velocity. The system is working exactly as designed, not just the AI assistant, but the entire apparatus that made expertise optional. We’re debugging our relationship with knowledge itself.