I use AI tools for most of my work. As CTO of a YC-backed startup, I’ve spent the past two years building with Cursor, Claude, and Copilot. I’ve written about how AI SRE tools fail at causal reasoning, how AI in healthcare doesn’t improve diagnostic accuracy, and how AI coding tools make experienced developers slower. Those three articles were supposed to be about different domains. They turned out to be about the same thing.
In every case, the AI was individually capable. The collaboration was where it broke down. Doctors don’t improve with AI even though the AI alone scores higher. Developers think they’re faster when they’re measurably slower. SREs lose the instinct they need most when automation fails. The pattern kept repeating, and I wanted to understand why.
I spent the last few weeks pulling together cognitive science research, controlled experiments, neuroscience studies, and first-hand accounts from developers and other knowledge workers. What I found is that the dominant model of human-AI interaction — AI generates, human reviews — is fundamentally misdesigned for human cognition. Not because AI is bad at generating. Because humans are bad at reviewing without creating, and they lose capability in the process.
This article lays out the evidence, explains why it happens, and proposes a different model.
Your brain on AI
An MIT Media Lab study used EEG monitoring to measure neural engagement during writing tasks. ChatGPT users showed the lowest engagement across all 32 measured brain regions compared to people using Google or writing unaided. By the third session, most LLM users were copy-pasting rather than thinking through the material. 83% couldn’t accurately recall key points from their own essays.
That finding alone should give us pause. But it’s consistent with everything else coming out of the research labs.
A Wharton study published in PNAS ran a randomized controlled trial with roughly 1,000 Turkish high school students. Students given unrestricted ChatGPT-4 access improved their practice scores by 48%. When the AI was taken away, they performed 17% worse than students who never had access at all. The researchers called this “cognitive debt” — a term borrowed from the concept of technical debt. You get something fast now and pay for it later.
The most interesting part of that study: a version of ChatGPT designed to give hints instead of answers eliminated the cognitive debt entirely. The tutor group improved 127% during practice and still scored the same as the control group on the exam. The learning transferred because the difficulty was preserved. The students had to do the thinking. The AI just pointed them in the right direction.
Anthropic ran a similar experiment with 52 software engineers learning an unfamiliar Python library. The AI group finished about two minutes faster — not statistically significant. On a comprehension quiz, they scored 50% versus 67% for the manual coders. The largest gap was in debugging questions. The people who relied most on AI delegation scored below 40%. The people who used AI only for conceptual questions scored 65% or higher.
The debugging gap is the part that kept me up at night. Debugging is exactly the skill you need to oversee AI-generated code. The tool that’s supposed to make you faster is eroding the specific ability you need to use it safely.
Microsoft and Carnegie Mellon studied 319 workers and found that greater trust in AI correlated with reduced critical thinking and less diverse solutions. A Swiss Business School study found a significant negative correlation between AI usage and critical thinking scores, with the strongest effect among people aged 17-25 — the ones building their cognitive foundations.
The brain fry problem
In March 2026, BCG and UC Riverside published a study of 1,488 full-time U.S. workers that introduced the term “AI brain fry.” About 14% of workers reported it: mental fog, difficulty focusing, slower decision-making, headaches. One worker described it as “a dozen browser tabs open in my head, all fighting for attention.”
The performance numbers were worse than I expected. Workers with brain fry were 39% more likely to make major mistakes and 11% more likely to make minor errors. 34% of them intended to quit, compared to 25% of workers without the condition. The optimal number of AI tools was two; beyond three, productivity scores dropped. Using four or more led to 14% more mental effort, 12% more fatigue, and 19% more information overload.
The researchers described a cognitive shift from “carpentry” to “air traffic control.” Instead of building things, you’re monitoring multiple AI outputs, evaluating suggestions, context-switching between tools, and maintaining vigilance for errors you didn’t create and may not fully understand. That kind of sustained monitoring is, as far as cognitive science can tell, one of the things humans are worst at. Vigilance research going back decades shows that monitoring performance degrades after about 20 minutes. The entire safety case for AI in coding, medicine, and operations assumes humans can do it indefinitely.
The work got harder, not easier
If AI is making us more productive, we should have more free time. A UC Berkeley team tracked 200 employees at a tech company for eight months and found the opposite. Workers “worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day, often without being asked to do so.” By month six, burnout spiked. The actual time savings from AI? 3%.
One worker put it plainly: “You had thought that maybe you save some time, you can work less. But then really, you don’t work less. You just work the same amount or even more.”
The Harness report from March 2026 quantified this among developers. 96% of very frequent AI coding tool users work evenings or weekends multiple times a month, compared to 66% of occasional users. The heaviest AI users also reported longer incident recovery times: 7.6 hours versus 6.3 hours. TechCrunch ran a piece with the headline: “The first signs of burnout are coming from the people who embrace AI the most.”
HBR titled their analysis “AI Doesn’t Reduce Work — It Intensifies It.” They identified three mechanisms: task expansion (workers absorb roles from other departments), erosion of natural breaks (AI eliminates pauses between tasks), and the multitasking trap (managing multiple AI-enabled workflows increases context-switching). None of this was mandated by management. Workers did it to themselves because the tools made it possible.
I wrote about this same dynamic in my AI SRE article. The Catchpoint SRE Report found median reported toil jumped from 20% to 34% in 2026. AI didn’t eliminate toil. It redistributed it. The new categories: maintaining AI tools, reviewing AI suggestions, tuning prompts, checking AI actions, explaining to others what the AI did.
AI is eating the work that makes work meaningful
This is the finding that reframed everything else for me.
A 2026 study in Scientific Reports ran an experiment with 269 participants plus a follow-up survey of 270 workers. They tested three conditions: no AI, passive AI use (copying AI-generated content), and active collaboration (human writes a draft, then AI helps refine it).
Passive AI use undermined three things: self-efficacy, psychological ownership, and work meaningfulness. These effects persisted even after participants returned to manual work. The initial boost in enjoyment and satisfaction from passive AI use reversed once people went back to doing things themselves. Active collaboration — where the human creates first and the AI refines — preserved all three psychological factors. The outcomes were comparable to working without AI at all.
The order matters. Human first, then AI: fine. AI first, human edits: damaging. This is the opposite of how most AI tools are designed.
A separate study surveyed workers on 10,131 computer-assisted tasks across U.S. occupations and found that tasks associated with a sense of agency and happiness are disproportionately exposed to AI automation. The paper’s title was “Are We Automating the Joy Out of Work?” The answer, based on the data, appears to be yes.
I keep thinking about what Business Insider reported: top engineers at Spotify haven’t written code since December 2024. The article framed this as an identity crisis. Engineers who spent years developing mastery through struggle are now watching AI do the part they found meaningful. One question kept surfacing: “If I’m no longer the person who writes the code, who am I?”
There’s a useful distinction from Self-Determination Theory here. Humans have three basic psychological needs at work: autonomy (control over what you do), competence (the feeling of mastery), and relatedness (connection to others through shared creation). AI as currently implemented threatens all three. You shift from author to reviewer. Mastery requires struggle, which AI bypasses. And it’s hard to feel connected to work you didn’t create.
Hannah Arendt drew a distinction in 1958 between “labor” (cyclical, repetitive, immediately consumed) and “work” (creating something durable that outlasts the maker). A 2025 analysis in the Journal of Business Ethics applied her framework to AI and concluded that cognitive automation is reducing “work” to “labor” — from building to monitoring. When AI generates and humans review, humans do labor, not work. Arendt warned of “a society of laborers without labor.” What we’re building is closer to a society of workers forced into labor.
The deskilling evidence is now cross-domain
In 1983, cognitive psychologist Lisanne Bainbridge published “Ironies of Automation,” arguing that when you automate most tasks, the remaining hard tasks are exactly the ones operators are now worse at because they never practice. Forty years later, this prediction is playing out simultaneously in medicine, aviation, software engineering, and education.
Endoscopists who used AI for colonoscopies showed a 6-percentage-point drop in adenoma detection rate when the AI was removed. Anthropic’s study showed a 17-percentage-point comprehension gap. The Wharton study showed 17% worse performance after AI was removed. METR found 30-50% of developers refused to work without AI access even at $150 an hour. Air France 447 crashed in 2009 because the pilots couldn’t fly manually when the autopilot disconnected. 228 people died.
A mixed-method review in Artificial Intelligence Review introduced the term “second singularity” — the point where repeated delegation to AI leads to irreversible loss of professional expertise. Not just individual practitioners losing skills, but entire organizations becoming brittle when critical capabilities are collectively forgotten. The review calls this “system embrittlement.”
The pipeline problem makes this worse. Junior engineering hiring is down 30%. CS enrollment dropped for the first time in 20 years. Today’s staff engineers got there by spending years writing bad code and debugging at 2 AM. If AI handles those training tasks, the apprenticeship that produces senior engineers breaks. We need more experienced people, but the path that creates them is closing.
Why the current design fails
Every piece of evidence above connects to the same root cause: the “AI generates, human reviews” paradigm violates how human cognition actually works.
The generation effect. Cognitive psychology has known for decades that actively generating information produces better memory and understanding than passively receiving it. When AI writes and you review, you don’t get the neural encoding that comes from creating. This explains why MIT found reduced brain engagement and why Anthropic found reduced comprehension.
Flow state disruption. Csikszentmihalyi’s flow state requires a balance between challenge and skill. AI collapses this: either the AI handles the challenge (too easy, boredom) or you review unfamiliar AI output (wrong kind of challenge, anxiety). The prompt-wait-review cycle also breaks the 10-15 minutes of uninterrupted focus that flow requires to begin.
Vigilance limitation. Humans can sustain focused monitoring for about 20 minutes before performance degrades. The entire safety model for AI in every domain assumes humans can maintain vigilance indefinitely. This assumption has been falsified in aviation, nuclear power, and medical monitoring. We’re building the same assumption into AI tools and expecting different results.
Self-determination collapse. When AI generates and you review, you lose autonomy (you’re not choosing what to create), competence (mastery requires struggle), and meaning (ownership comes from making). The Scientific Reports study measured this directly.
What actually works
The evidence also points to what doesn’t break. Several models preserve both productivity and human capability.
Hints, not answers. The Wharton study’s most important finding: a ChatGPT configured as a tutor that gave hints instead of direct answers eliminated cognitive debt entirely. Students learned just as much while still getting 127% better performance during practice. The difficulty was preserved, so the thinking transferred.
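To make the hints-only configuration concrete, here’s a minimal sketch of what it could look like against the standard OpenAI chat API. The prompt wording and model name are my own placeholders, not the study’s actual setup.

```python
# A sketch of a hints-only tutor. The prompt text and model name are
# illustrative assumptions; the Wharton study's exact configuration
# hasn't been published as code.
from openai import OpenAI

client = OpenAI()

TUTOR_SYSTEM_PROMPT = """You are a tutor. Never give the final answer or a
complete solution. Instead, name the concept that applies, ask one guiding
question, or point out where the student's attempt goes wrong. The student
must do the remaining work."""

def tutor_hint(question: str, attempt: str) -> str:
    """Return a hint that preserves the difficulty of the problem."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": TUTOR_SYSTEM_PROMPT},
            {"role": "user", "content": f"Question: {question}\nMy attempt: {attempt}"},
        ],
    )
    return response.choices[0].message.content
```

Notice that nothing here requires a different model, just a different contract with it. The difficulty stays with the student.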
Human first, then AI. The Scientific Reports study showed that when humans draft first and AI helps refine, self-efficacy, ownership, and meaning are preserved. The opposite order damages them. The sequence determines the psychological outcome.
AI as second opinion, not first. A randomized trial of 70 clinicians tested workflows where both clinician and AI assess independently, then the AI generates a synthesis highlighting where they agree and disagree. Diagnostic accuracy improved from 75% to 82-85%. Alarm burden dropped 80%. The AI made the doctor’s thinking more visible, not less necessary.
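The shape of that workflow is worth sketching, because the key property is structural: the clinician’s read is captured before the model’s output is revealed, so anchoring on the AI is impossible by construction. The field names and wording below are illustrative, not the trial’s protocol.

```python
# A sketch of the independent-assessment-then-synthesis protocol.
# The Assessment fields and example values are hypothetical.
from dataclasses import dataclass

@dataclass
class Assessment:
    diagnosis: str
    reasoning: str

def synthesize(human: Assessment, ai: Assessment) -> str:
    """Surface agreement and disagreement instead of replacing the human's call."""
    if human.diagnosis == ai.diagnosis:
        return (f"Agreement on '{human.diagnosis}'. Reached independently via: "
                f"({human.reasoning}) and ({ai.reasoning}).")
    return ("Disagreement to resolve before deciding:\n"
            f"- Clinician: {human.diagnosis} ({human.reasoning})\n"
            f"- Model: {ai.diagnosis} ({ai.reasoning})")

# The human assessment is produced blind, before the model's output is shown.
print(synthesize(
    Assessment("pneumonia", "infiltrate on X-ray, fever"),
    Assessment("pulmonary embolism", "elevated D-dimer, tachycardia"),
))
```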
Humans encode judgment, machines execute volume. Meta’s DrP platform runs 50,000 automated root cause analyses per day across 300 teams. MTTR dropped 20-80%. But it isn’t autonomous. Engineers codify their investigation logic into analyzers. The machine executes their thinking at scale. This has worked in production for five years.
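Meta hasn’t published DrP’s internals beyond high-level descriptions, so here’s a generic sketch of the pattern rather than their implementation. Each analyzer is one engineer’s investigation heuristic, written down once and then executed on every incident.

```python
# A generic sketch of "humans encode judgment, machines execute volume."
# The Incident shape and analyzer logic are hypothetical, not DrP's.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Incident:
    service: str
    recent_deploys: list[str] = field(default_factory=list)
    config_changes: list[str] = field(default_factory=list)

def deploy_analyzer(incident: Incident) -> Optional[str]:
    # One engineer's heuristic: recent deploys are the usual suspect.
    if incident.recent_deploys:
        return f"Suspect deploy: {incident.recent_deploys[-1]}"
    return None

def config_analyzer(incident: Incident) -> Optional[str]:
    # Another engineer's heuristic: config changes break things quietly.
    if incident.config_changes:
        return f"Suspect config change: {incident.config_changes[-1]}"
    return None

ANALYZERS: list[Callable[[Incident], Optional[str]]] = [
    deploy_analyzer,
    config_analyzer,
]

def root_cause_candidates(incident: Incident) -> list[str]:
    """Run every codified heuristic; the judgment is human, the scale is not."""
    return [f for a in ANALYZERS if (f := a(incident)) is not None]
```

The judgment stays human and stays practiced, because engineers keep writing and maintaining the analyzers. Only the execution is automated.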
Social accountability. A CHI 2026 study on “triadic programming” — two humans and one AI — found that when another person is present, developers reduce their dependence on AI-generated code. Social responsibility was more effective at preventing cognitive offloading than any technical design. The implication: pair programming with AI might be better than solo vibe coding not because of the code, but because of the human.
A different model
Based on all of this, I think AI tools need to operate in four modes, shifting dynamically based on context:
Generate. AI leads, human reviews. Use for routine, mechanical, low-stakes tasks where understanding doesn’t matter much. Boilerplate code, data formatting, scheduling. This is where current tools already work.
Scaffold. AI provides hints, structure, partial solutions. Human completes the work. Use when understanding matters — debugging, learning new systems, skill-building. Based on the Wharton tutor finding and the desirable difficulties literature. The generation effect is preserved because the human does the cognitive work.
Challenge. AI acts as critic, adversary, stress-tester. Use for high-stakes decisions, novel situations, creative work. “Here’s why your architecture might fail.” “This diagnosis has a 30% chance of being wrong because…” Based on the task-driven framework by Afroogh et al. that proposes autonomous, assistive, and adversarial roles for AI.
Step back. AI deliberately does nothing. Use when the human is in flow, building skills that need scar tissue, or doing creative work where voice matters. The first 20 minutes of a coding session. Critical debugging. Writing that needs a point of view. This is the hardest mode to implement because it produces no visible output, and every incentive in the industry pushes toward more AI, not less.
In practice, this might look like: a code editor that stays quiet for the first stretch of a session while you build a mental model. That generates boilerplate when you’re doing mechanical work. That shows you alternative approaches instead of complete solutions when you’re writing feature logic. That asks “have you considered this failure mode?” when you’re making architecture decisions. And that occasionally forces you to work without it, the way aviation mandates manual flying hours.
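As a sketch, the dispatch logic might look something like this. Every signal and threshold here is an assumption for illustration; a real tool would have to infer stakes, routineness, and learning intent rather than take them as inputs.

```python
# A sketch of the four-mode policy. The Context signals and thresholds
# are illustrative assumptions, not a tested design.
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    GENERATE = "generate"    # AI leads, human reviews
    SCAFFOLD = "scaffold"    # hints and partial solutions only
    CHALLENGE = "challenge"  # AI critiques the human's work
    STEP_BACK = "step_back"  # AI stays silent

@dataclass
class Context:
    stakes: str                # "low" or "high"
    routine: bool              # boilerplate vs. novel work
    minutes_into_session: int
    learning_goal: bool        # is the human trying to build this skill?

def choose_mode(ctx: Context) -> Mode:
    if ctx.minutes_into_session < 20:
        return Mode.STEP_BACK   # protect the stretch where a mental model forms
    if ctx.stakes == "high":
        return Mode.CHALLENGE   # stress-test the human's decision, don't make it
    if ctx.learning_goal:
        return Mode.SCAFFOLD    # preserve the generation effect
    if ctx.routine:
        return Mode.GENERATE    # mechanical work: let the AI lead
    return Mode.SCAFFOLD        # default to keeping the human engaged

print(choose_mode(Context("low", True, 45, False)))  # -> Mode.GENERATE
```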
Ben Shneiderman’s Human-Centered AI framework argues that high automation and high human control aren’t opposites — you can have both simultaneously. I think that’s right, but it’s incomplete. You also need high human capability, and that requires the system to sometimes get out of the way.
Where this leaves me
I still use AI tools every day. I’m not arguing against them. I’m arguing that the way we’ve designed the interaction is optimized for the wrong thing. Current tools maximize task completion. They should maximize the performance of the human-AI system over time, which means preserving human judgment, engagement, and capability alongside output.
The METR finding — that developers are 19% slower with AI but believe they’re 20% faster — is the clearest symptom. We’ve built tools that feel productive while making us less capable. The BCG brain fry data shows it’s not just a perception problem; it’s a cognitive health problem. The Scientific Reports study shows it’s not just a cognitive problem; it’s a meaning problem. And the deskilling evidence from medicine, aviation, and software shows these effects compound over time.
The companies and tool builders that get this right will treat AI the way good educators treat scaffolding: support that’s calibrated to make the human stronger, not dependent. The ones that get it wrong will produce faster output and weaker people. I’ve spent two years watching both happen at the same time, and the research is starting to explain why.
Sources: All claims link to primary sources inline. Key studies: BCG “AI Brain Fry” (2026) · Wharton/PNAS cognitive debt · Anthropic skill formation (2026) · Scientific Reports passive vs active AI (2026) · UC Berkeley work intensification (2026) · “Automating the Joy Out of Work” (2026) · METR developer study (2025) · Bainbridge “Ironies of Automation” (1983) · DORA 2025 · Endoscopy deskilling (2025)