AI is 53,000x cheaper than a human. Until you count the real costs.

McKinsey says AI will generate $3-5 trillion in value. Goldman Sachs says 7% of global GDP. Acemoglu says 0.55% TFP growth over a decade. The spread between serious researchers is roughly 6x. None of these numbers are derived from the actual cost of AI performing actual tasks. They are top-down estimates: “AI could affect X% of tasks in Y% of jobs, multiplied by Z trillion dollars of wages.” The inputs to these models are judgment calls about AI capability, not measurements of AI economics.

This article tries to do the measurement. For each domain, the question is simple: how much does it cost a human to do a specific task, how much does it cost AI to do the same task, and — the part most analyses skip — how much does it cost to verify AI did it correctly and to fix it when it didn’t?

The answers are less dramatic than the headlines suggest, more nuanced than either optimists or pessimists claim, and surprisingly consistent across domains.

The model

The economics of AI replacing human work breaks down into one formula:

Real AI cost = generation cost + verification cost + error cost × error rate

Generation cost is what the AI charges you — API fees, compute, infrastructure. This is the number companies advertise. It is falling roughly 10x per year.

Verification cost is what humans charge you to check whether the AI got it right. This is the number nobody advertises. It is not falling, because it is bottlenecked by human cognitive bandwidth.

Error cost is what you pay when the AI got it wrong and nobody caught it. In code, this is a production incident. In medicine, this is a misdiagnosis. In law, this is a contract that doesn’t say what you thought it said. This cost ranges from trivial (reformatting a document) to catastrophic (patient harm, regulatory violation).

Most AI cost comparisons report only the generation cost. This is like comparing cars on sticker price alone while ignoring insurance, maintenance, fuel, and the cost of accidents.
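The formula is simple enough to write as a function. What follows is a minimal sketch: the function name, parameter names, and sample inputs are illustrative placeholders, not measurements from any of the studies cited below.

```python
def real_ai_cost(generation: float, verification: float,
                 error_cost: float, error_rate: float) -> float:
    """Real AI cost per unit of work: generation + verification + expected error cost."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability between 0 and 1")
    return generation + verification + error_cost * error_rate


# Hypothetical inputs, for illustration only: a task that costs $0.01 to
# generate, $2.00 to check, and $500 when an undetected error slips through,
# which happens 1% of the time.
example = real_ai_cost(generation=0.01, verification=2.00,
                       error_cost=500.0, error_rate=0.01)
print(f"${example:.2f} per task")
```

Even with made-up inputs, the shape of the result is the point: the generation term is a rounding error next to the other two.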

Software development: the best-studied case

This is the domain with the most data, so let me walk through the full calculation.

What a human costs. A senior software engineer in the US earns a median base salary of $145,000. After benefits (health insurance, 401k, PTO), payroll taxes, equipment, office space, management overhead, and amortized recruiting and training costs, the fully loaded annual cost is $230,000-260,000 — roughly 1.6-1.8x the base salary.

But not all of those hours are productive output. Microsoft/DX research found engineers spend 11% of their workday actually writing code — about 52 minutes per day. The rest is meetings, email, waiting, context switching. Using a broader definition of productive work (design, debugging, code review), Worklytics benchmarks put “deep focus time” at 4.2 hours per day.

This gives us the real cost per unit of output:

At 52 minutes of coding per day × 250 working days = 217 coding hours per year → $1,198 per hour of actual coding on a $260K fully loaded cost.
At 4.2 hours of deep work per day × 250 days = 1,050 hours per year → $248 per hour of deep work.

If a senior engineer produces roughly 50 effective lines of code per hour (a common industry estimate, though it varies enormously), that’s $5-24 per line of code, depending on which definition of productive time you use.
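The arithmetic behind those figures can be reproduced in a few lines. This sketch uses the upper-end $260K fully loaded cost and the 50 lines/hour estimate from the text; the constant and function names are my own. Note that computing from exactly 52 minutes gives about $1,200 per coding hour, while rounding to 217 hours first gives the $1,198 quoted above.

```python
FULLY_LOADED_COST = 260_000   # USD per year, upper end of the range above
WORKING_DAYS = 250            # working days per year
LINES_PER_HOUR = 50           # rough industry estimate quoted above


def cost_per_hour(productive_hours_per_day: float) -> float:
    """Fully loaded cost divided by productive hours in a year."""
    return FULLY_LOADED_COST / (productive_hours_per_day * WORKING_DAYS)


coding_only = cost_per_hour(52 / 60)   # 52 minutes of coding per day
deep_work = cost_per_hour(4.2)         # 4.2 hours of deep focus per day

for label, hourly in [("coding only", coding_only), ("deep work", deep_work)]:
    print(f"{label}: ${hourly:,.0f}/hour, ${hourly / LINES_PER_HOUR:.2f}/line")
```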

What AI costs to generate. Claude Sonnet 4.6 charges $3 per million input tokens and $15 per million output tokens. A typical line of code is about 10 output tokens with ~100 tokens of context. Cost per line: about $0.00045. Using DeepSeek V3.2 at $0.28/$1.10 per million tokens: about $0.00004 per line.

The ratio: AI generates code somewhere between 11,000x and 615,000x cheaper than a human, depending on model and how you measure human productivity. This is the number that makes headlines.
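Here is the same generation-cost calculation as a sketch, using the per-million-token prices and the token counts assumed above (10 output tokens per line, ~100 tokens of context). The function name and defaults are illustrative.

```python
def cost_per_line(input_price: float, output_price: float,
                  context_tokens: int = 100, output_tokens: int = 10) -> float:
    """USD per generated line; prices are USD per million tokens."""
    total = context_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


sonnet = cost_per_line(3.00, 15.00)     # prices as quoted in the text
deepseek = cost_per_line(0.28, 1.10)    # prices as quoted in the text

# Human cost per line ranges from $5 (deep-work basis) to $24 (coding-only basis).
low_ratio = 5 / sonnet
high_ratio = 24 / deepseek

print(f"Sonnet: ${sonnet:.5f}/line, DeepSeek: ${deepseek:.6f}/line")
print(f"raw advantage: {low_ratio:,.0f}x to {high_ratio:,.0f}x")
```

The 53,000x in the headline is the $24 coding-only figure divided by the $0.00045 Sonnet figure.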

What verification costs. Someone has to review AI code. GitClear’s 2025 analysis of 211 million lines of code found that teams with high AI adoption saw a 91% increase in code review time as they generated more code. Code churn (code rewritten within two weeks of being written) rose from 3.1% to 5.7%.

CodeRabbit’s study of 470 pull requests found AI-generated PRs waited 4.6x longer in review queues, with a 32.7% acceptance rate versus 84.4% for human-written PRs. That means roughly two-thirds of AI-generated PRs get rejected after consuming reviewer time.

Estimating conservatively: if reviewing an AI PR takes ~57 minutes (30 minutes baseline + 91% increase) at $248/hour senior reviewer time, that’s ~$236 per PR. Spread over ~200 lines per PR: $1.18 per line in verification costs.
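The verification estimate, spelled out as a sketch. The 30-minute baseline and the 200 lines per PR are the assumptions stated above, not measured values, and the constant names are illustrative.

```python
BASELINE_REVIEW_MIN = 30      # assumed baseline review time per PR, in minutes
REVIEW_TIME_INCREASE = 0.91   # 91% increase reported for high-AI-adoption teams
REVIEWER_RATE = 248           # USD per hour of senior deep-work time
LINES_PER_PR = 200            # assumed PR size

review_minutes = BASELINE_REVIEW_MIN * (1 + REVIEW_TIME_INCREASE)
cost_per_pr = review_minutes / 60 * REVIEWER_RATE
verification_per_line = cost_per_pr / LINES_PER_PR

print(f"{review_minutes:.1f} min/PR, ${cost_per_pr:.0f}/PR, "
      f"${verification_per_line:.2f}/line")
```

Doubling the assumed PR size halves the per-line figure, so this term is sensitive to how large AI-generated PRs actually are.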

What errors cost. CodeRabbit found AI code has 1.7x as many defects as human code: 75% more logic errors, up to 2x the security vulnerabilities, and 8x more excessive I/O issues. If human production defect density is roughly 1 bug per thousand lines, AI is at 1.7 per thousand. At an average production bug cost of $5,000 (conservative, given enterprise downtime averages $5,600 per minute), that is about $8.50 per line in expected error costs for AI code versus $5.00 for human code, an extra ~$3.50 per line.

The real comparison:

Component	Human (per line)	AI + human review (per line)
Generation	$5-24	$0.00045
Verification	(included — self-review)	$1.18
Error cost	$5 (at 1 bug/KLOC × $5K)	$8.50 (at 1.7 bugs/KLOC × $5K)
Total	$10-29	$9.68

AI is somewhere between roughly equivalent and 3x cheaper, once you account for the full cost. Not 53,000x. Not even 10x. Single digits.
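Putting the three terms together reproduces the table. This is a sketch using only the figures already derived in this section; the variable and function names are mine.

```python
BUG_COST = 5_000  # USD per production bug, the conservative figure used above


def error_cost_per_line(bugs_per_kloc: float) -> float:
    """Expected error cost per line from defect density (bugs per 1,000 lines)."""
    return bugs_per_kloc * BUG_COST / 1_000


human_error = error_cost_per_line(1.0)   # human defect density
ai_error = error_cost_per_line(1.7)      # AI defect density

human_low = 5 + human_error              # deep-work basis
human_high = 24 + human_error            # coding-only basis
ai_total = 0.00045 + 1.18 + ai_error     # generation + verification + error

print(f"human: ${human_low:.2f}-{human_high:.2f} per line")
print(f"AI + review: ${ai_total:.2f} per line")
print(f"advantage: {human_low / ai_total:.1f}x to {human_high / ai_total:.1f}x")
```

The error term dominates both columns, which means the result depends heavily on the assumed $5,000 bug cost and the 1 bug/KLOC baseline.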

This is consistent with the METR finding that experienced developers on familiar codebases were 19% slower with AI tools. It is also consistent with DORA 2025, which found that AI adoption correlates with faster shipping but worse stability. The raw speed is real. The costs that eat the speed are also real. Most organizations measure the first and ignore the second.

There is a human interaction dimension here that this article won’t fully explore but is worth noting: the verification burden falls on the people who are most experienced and most expensive. AI generates; seniors review. This reshapes the nature of senior engineering work from creating to checking — a shift that matters for skill development, job satisfaction, and the long-term talent pipeline. I’ll come back to this in a separate piece on human-AI interaction design.

The same calculation, seven more domains

Customer service — $5-15/ticket vs $0.50-2.37/ticket

Human: A US customer service agent costs $60,000-80,000/year fully loaded, handles 50-80 tickets/day, takes 4-10 weeks to train, and has a 30-45% annual turnover rate. Replacement cost: 40-70% of annual salary. Cost per ticket: $5-15. Response time: 2-15 minutes.

AI: Cost per resolved ticket: $0.50-2.37, converging toward ~$1.00 per conversation across platforms (Gorgias, Intercom, HubSpot). Response time: under 3 seconds. Resolution rate without human escalation: 40% average (Gartner), up to 66% for Intercom Fin.

Raw ratio: 2-30x cheaper.

But: Empathy scores drop to 60-75% for fully automated interactions versus 90-98% for humans. And here is the Klarna lesson.

Klarna case study: the cost of ignoring quality costs. Klarna deployed AI customer service in early 2024. Results looked spectacular: 2.3 million conversations handled in month one, 700 agents’ worth of work, response time from 11 minutes to under 2, $60 million in projected savings. Then CEO Sebastian Siemiatkowski admitted failure: “Cost was too predominant an evaluation factor.” Repeat contacts surged 25% as customers whose issues weren’t actually resolved the first time came back. Klarna posted a $152 million loss in H1 2025 versus $31 million in H1 2024. They are rehiring human agents.

The $60 million in direct savings was real. The costs it created — repeat contacts, customer churn, brand damage — were larger. This is our formula in action: the generation cost was 10x lower, but the error cost (unresolved issues leading to repeat contacts, lost customers) more than ate the savings.

Real advantage: ~2-5x for the easy tickets that AI resolves fully. Negative for complex or emotional issues. Gartner predicts $80 billion in global contact center labor cost reduction — but that projection was made in 2022 before the Klarna reversal.

Legal document review — $5-25/doc vs $0.11-0.50/doc

Human: Paralegals earn $26-28/hour, associates bill $300-600/hour. Manual due diligence review takes 6-8 weeks. Human recall rate on document relevance: 60-75%. Human reviewers disagree with each other ~30% of the time. Human contract drafting reliability (first draft): 56.7% (LegalBenchmarks.ai, September 2025). Top individual human lawyer: 70%.

AI: Cost per document: $0.11-0.50. Due diligence in under 4 hours versus 6-8 weeks. Recall rate: 90%+. Contract drafting reliability: 73.3% (Gemini 2.5 Pro) — beating the top human lawyer’s 70%.

Raw ratio: 10-50x cheaper, 250x faster, and, unusually, AI is more accurate than humans on recall and first-draft reliability. Alongside mechanical data entry in accounting, this is one of the few places where AI quality exceeds human quality on measurable metrics.

But: 69.7% of AI legal outputs need editing or rework before use. Specialized legal AI tools flag 83% of high-risk outputs, but general-purpose AI only catches 55%. Human lawyers? They flagged 0% of high-risk outputs — raised no warnings at all. This is a case where AI and human weaknesses are complementary rather than overlapping.

Real advantage: ~3-10x for document review (high volume, pattern recognition). The error cost question is unresolved — no major AI legal malpractice case has been litigated yet. When it happens, the liability framework will reshape the economics.

Translation — $120-250/1K words vs $0.0005-0.007/1K words

Human: Professional translators charge $120-250 per thousand words and produce 2,000-2,500 words/day. Quality rating: 4.6/5.0. Training: 4-6 years.

AI (raw): LLM translation costs $0.0005-0.007 per thousand words depending on model. Quality: 93% of human overall, with GPT-4 at 4.3/5.0 and Claude 3.5 at 4.2/5.0. Speed: millions of words per day.

Raw ratio: 17,000x-500,000x cheaper. The largest gap of any domain.

But nobody uses raw AI translation for serious work. The industry standard is MTPE — Machine Translation Post-Editing, where a human editor revises the AI draft. MTPE costs $40-80 per thousand words and achieves roughly 98%+ of human quality. Post-editors process 3,000-5,000 words/day versus 2,000-2,500 for from-scratch translation. MTPE adoption has grown from 26% of language service providers in 2022 to 46% in 2024, with 62.6% using it for over 30% of projects.

Real advantage: ~2-6x (MTPE vs pure human). The 500,000x raw advantage compresses to single digits once you add the human verification layer. For legal, medical, and regulatory translation, raw AI is not usable — error consequences are too high. For marketing content and general communication, MTPE is becoming the default.
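The compression from raw ratio to real advantage is easy to check. This sketch computes the widest possible ratio range from two price ranges (cheapest human against priciest alternative, and vice versa), using the per-thousand-word prices quoted above. The helper name is illustrative.

```python
def ratio_range(human: tuple[float, float],
                alternative: tuple[float, float]) -> tuple[float, float]:
    """(smallest, largest) cost ratio given (low, high) price ranges."""
    return human[0] / alternative[1], human[1] / alternative[0]


HUMAN = (120, 250)          # USD per 1K words, professional translator
RAW_AI = (0.0005, 0.007)    # USD per 1K words, raw LLM output
MTPE = (40, 80)             # USD per 1K words, machine translation + post-editing

raw_low, raw_high = ratio_range(HUMAN, RAW_AI)
mtpe_low, mtpe_high = ratio_range(HUMAN, MTPE)

print(f"raw AI: {raw_low:,.0f}x to {raw_high:,.0f}x cheaper")
print(f"MTPE:   {mtpe_low:.1f}x to {mtpe_high:.1f}x cheaper")
```

On these inputs the low end of the MTPE range comes out at 1.5x, so the ~2-6x figure is if anything slightly generous at the bottom.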

Content creation — $250-880/article vs $0.001-0.50/article

Human: A freelance writer charges $250-399 per 1,500-word article at $0.20-0.24/word. Including hidden costs (sourcing, screening, revisions, management), total cost per article reaches $300-880.

AI (raw): API cost per 2,000-word article: $0.001-0.50 depending on model (budget models at $0.001, GPT-5 Pro at $0.33, Claude Opus at $0.07).

Raw ratio: 500x-880,000x cheaper across the ranges above.

But quality-adjusted economics flip dramatically. A study of 1,000+ articles found human-written content generates 5.44x more organic traffic than AI content. In Meta/Facebook ad tests, human copy won on conversions 68% of the time (though click-through rates were nearly identical at 1.08% vs 1.07%). For high-stakes sales copy, human writers outperform by 40-200% on conversion.

The hybrid approach (~$12-18 per thousand words — AI draft + human edit) won on 7 of 8 quality metrics versus both pure AI and pure human.

Real advantage: ~5-15x for hybrid (quality close to human, cost fraction). But if you measure cost per conversion rather than cost per article, the advantage shrinks further because human content converts better. The metric you choose determines the answer.

Data analysis — $34-50/hour vs seconds of compute

Human: Data analysts earn $70,000-105,000/year, roughly $34-50/hour. They work sequentially, processing one analysis at a time.

AI: Processes queries in seconds versus hours or days. Mathematical/syntactic accuracy: 94.4%. Best NL-to-SQL tools achieve 85-95% accuracy on standard queries, dropping to 60-70% for complex or ambiguous ones. Can automate 30-40% of typical analyst tasks.

The split is unusually clean here. AI handles: SQL queries, data cleaning, standard visualizations, report summarization, anomaly detection, pattern recognition. Humans handle: strategic/creative analysis, ethical oversight, contextual nuance, stakeholder communication, causal reasoning.

Real advantage: For the 30-40% of tasks AI can automate, the advantage is 100x+ in time and essentially free versus hourly analyst cost. For the 60-70% requiring judgment, AI is a tool that makes the analyst faster but doesn’t replace them. The overall impact on the role: analysts spend less time querying and cleaning, more time interpreting and communicating. Whether this makes the job better or worse depends on which parts the analyst found meaningful — a question for the human-AI interaction discussion.

Accounting — $15-40/invoice vs $2-5/invoice

Human: Bookkeeping clerks earn $17.24/hour, full-charge bookkeepers $25.61/hour. Manual accounts payable invoice processing: $15-40 per invoice. Human data entry error rate: 1-4% per field, with 18% error rate specific to bookkeeping. 65% of audits find discrepancies. 44% of small businesses incur fines from bookkeeping errors.

AI: Automated invoice processing: $2-5 per invoice. OCR accuracy: 95-99%. Data entry error rate: 0.01-0.04% — roughly 100x better than humans on mechanical accuracy.

Raw ratio: 3-20x cheaper with far fewer mechanical errors.

But: The best AI accounting system (GPT-5.4) achieves only 77.3% accuracy across 101 real accounting tasks. The second-best (Gemini 3.1 Pro) hits 66%. Older models (GPT-4): 19.8%. Even the best model fails about 23% of complex real-world accounting tasks (month-end close, multi-entity consolidation, tax strategy), and every other model fails a third or more.

Real advantage: ~2-5x for high-volume mechanical tasks (data entry, categorization, reconciliation). Unreliable for complex judgment tasks. The error cost dimension is high: accounting errors have regulatory and legal consequences. The 77.3% accuracy rate means roughly 1 in 4 tasks is wrong — in a field where errors compound (one wrong categorization throws off downstream reports, tax filings, audit results).
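To see why compounding matters, consider a chain of dependent tasks. The sketch below assumes each task fails independently at the best model's 77.3% accuracy. That independence is my simplifying assumption, not something the benchmark measures, and real errors are likely correlated.

```python
ACCURACY = 0.773  # best model's task accuracy quoted above


def p_any_error(n_tasks: int, accuracy: float = ACCURACY) -> float:
    """Probability that at least one of n independent tasks is wrong."""
    return 1 - accuracy ** n_tasks


for n in (1, 3, 5, 10):
    print(f"{n:>2} chained tasks: {p_any_error(n):.0%} chance of at least one error")
```

Under that assumption, a workflow of five chained tasks is more likely than not to contain an error somewhere, which is why per-task accuracy understates the verification burden.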

Radiology — $12-99/study vs ???

Human: Radiologists earn $370,000-550,000+/year, read ~50 studies/day, and train for 13+ years at a cost of $250,000-400,000+ in medical school debt alone. Teleradiology rates: $12 per X-ray, $40 per CT, $60 per MRI, $99 per PET/CT.

AI: 1,104 FDA-authorized radiology AI devices as of early 2026, up from ~500 in 2023. Enterprise pricing: typically $25,000-100,000/year subscription. No reliable per-scan cost data exists, because vendor pricing is opaque and bundled into enterprise contracts.

The quality picture is mixed and important. On routine screening (lung nodule detection, standard chest X-ray triage), AI is non-inferior or slightly superior to average radiologists. But the LungIMPACT trial — a real-world randomized study — found AI prioritization did not significantly improve diagnostic speed. On mammography, AI has lower sensitivity than radiologists, especially in dense breasts, but better specificity in non-dense cases. On the RadLE benchmark of difficult cases, the best AI model (GPT-5) scored 30%. Radiologists scored 83%.

The economics are genuinely unclear. Only 21 out of 1,879 studies (1.1%) on AI in radiology actually quantified economic outcomes. AI lung screening can save up to $242 per patient, but mammography CAD increases costs by up to $19 per patient. There is no malpractice case law for AI diagnostic errors. Physicians remain primarily liable, but not using AI may eventually create liability too.

Real advantage: Cannot be calculated from available data. This is the domain where the gap between hype (“AI will replace radiologists”) and evidence (1.1% of studies measure economic outcomes, AI scores 30% on hard cases) is widest.

What holds across all eight

Three patterns survive the domain-by-domain analysis.

The “easy 80% / hard 20%” split is universal. AI handles routine, well-defined tasks at 85-95% accuracy in every domain studied. On complex, judgment-heavy tasks, accuracy drops to 30-77%. This is not a matter of “current models will improve” — the split reflects the structural difference between pattern-matching (where AI excels) and contextual judgment under uncertainty (where it doesn’t). Whether future models close this gap is an empirical question, but the gap has been consistent across two years of improvement in model capabilities.

Hybrid outperforms both, everywhere. Human-AI collaboration produces better quality-adjusted results than either alone in every domain with comparison data. MTPE in translation. Hybrid content creation. AI-assisted legal review. AI-augmented radiology. The pattern is that AI handles volume and consistency while humans handle exceptions and judgment. This is the strongest empirical finding in the data and it has direct implications for how AI tools should be designed — augmentation architectures will outperform replacement architectures economically, not just humanistically.

Optimizing for generation cost alone backfires. Klarna is the canonical case, but the pattern appears everywhere: AI code that ships fast but breaks in production. AI translations that look correct but miss legal nuance. AI accounting that processes invoices but miscategorizes 1 in 4 complex transactions. The generation cost is always the most visible number. The verification and error costs are always the most consequential. Any organization evaluating AI should be measuring the total formula — generation + verification + (error rate × error cost) — not just the first term.

The Jevons number

One more piece of data that complicates the picture. AI inference costs have fallen roughly 1,000x between 2023 and 2026. Meanwhile, enterprise spending on generative AI rose 3.2x (an increase of about 220%), from $11.5 billion in 2024 to $37 billion in 2025. Average monthly AI budgets increased 36% year over year. The share of organizations spending over $100,000 per month doubled to 45%.

If the price drop and the spending growth are treated as covering the same window, total token consumption increased approximately 3,200x: the per-unit cost dropped 1,000x while total spending roughly tripled. William Stanley Jevons observed this in 1865 with coal: efficiency gains don’t reduce consumption, they increase it, because cheaper inputs unlock uses that weren’t economical before.
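The implied consumption figure follows from one identity: spending equals price times quantity, so the quantity multiple is the spending multiple times the price drop. A sketch, with the caveat that the price figure spans 2023-2026 while the spending figures span 2024-2025, so treating them as one window is an approximation:

```python
PRICE_DROP = 1_000     # per-unit inference cost fell roughly 1,000x
SPEND_2024 = 11.5      # enterprise generative AI spending, USD billions
SPEND_2025 = 37.0      # enterprise generative AI spending, USD billions

spending_multiple = SPEND_2025 / SPEND_2024
# quantity = spending / price, so the multiples combine multiplicatively
consumption_multiple = spending_multiple * PRICE_DROP

print(f"spending: {spending_multiple:.1f}x")
print(f"implied token consumption: {consumption_multiple:,.0f}x")
```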

For AI, this means the cost savings at the task level — the 1-5x advantage calculated above — may not translate into reduced overall spending on the function. When code becomes cheaper to write, organizations write more code. When customer service becomes cheaper, companies deploy it to interactions that previously went unserviced. When translation becomes nearly free, content gets translated into languages that weren’t worth the cost before.

Whether this is good (more software, more service, more access) or bad (more complexity, more maintenance burden, more quality problems) depends on what you’re measuring. GitHub reports PRs merged up 23% year over year, commits up 25%, new iOS apps up 50%. DORA reports mean time to recovery getting worse every year since 2021. Both are true simultaneously.

What this means

The first-principles calculation suggests AI’s real economic advantage, after accounting for verification and error costs, is in the range of 1-5x for most knowledge work — far below the 100x-100,000x raw generation advantage, far above zero, and highly sensitive to how the human-AI interaction is designed.

The domains where AI has the clearest advantage are those with low verification costs and low error consequences: formatting, translation of non-critical content, first-draft generation, routine data processing. The domains where the advantage shrinks or disappears are those with high verification costs and high error consequences: medical diagnosis, legal judgment, architectural decisions in software, strategic analysis.

Two implications follow.

For the “AI will replace X million jobs” predictions: The task-level economics suggest wholesale replacement is rational only for roles where most tasks fall into the “easy 80%” category AND error consequences are low. For roles that require judgment on the “hard 20%,” AI is a productivity tool, not a replacement — and the productivity gain is modest (1-5x), not transformative (100x+). The BLS projection of 15-18% growth in developer employment through 2034, despite AI, is consistent with this math. Jevons suggests more work, not less.

For how we design AI tools: The verification cost is the binding constraint. It is what compresses AI’s 53,000x raw advantage into a 1-5x real advantage. Any improvement in AI that reduces the need for human verification — better reliability, better uncertainty calibration (knowing when it’s wrong), better explanations of its reasoning — has more economic impact than improvements in raw speed or capability. This is a design insight, not just an economic one, and it has direct implications for how AI tools should be built and evaluated. But that’s a conversation about human-AI interaction that deserves its own treatment.

Sources

Data for this analysis is compiled in:

Software engineering cost data — 730 lines, 20+ specific data points with sources
Multi-domain cost data — 510 lines, 8 domains with head-to-head comparisons
First-principles model — full framework with formulas and cross-domain summary