Long Yi

AI Safety

June 17, 2026

Universal values? A social-science and history map

When AI labs cite the UN's Universal Declaration of Human Rights, they inherit a century of debate about what 'universal' means, and what it doesn't. Canonical texts, irreconcilable conflicts, cultural variation, and the theories that explain the gaps.
June 16, 2026

What are we aligning to? A map of alignment paradigms

RLHF, Constitutional AI, CIRL, CEV, oracle-only Scientist AI, and Gabriel's fair-treatment-of-claims framework are not interchangeable fixes for the same problem. Each silently picks a different answer to what human values are — and most of the field never states which answer it chose.
June 12, 2026

Three layers of AI oversight: training, deployment, and inspection

Scalable oversight, AI control, and verification answer different questions at different stages of an AI system's life. Treating them as one thing, or assuming any layer alone is enough, is how safety proposals fall apart under scrutiny.
April 17, 2026

When perfection is impossible: a survey of structural limits in society and AI alignment

A long-form map of impossibility theorems and structural limits — from Arrow and Sen to Hart, Gödel, Goodhart, Ostrom, and Hayek — and how each one shows up in RLHF, constitutions, capability races, and safety evaluation.
April 2, 2026

Reading AI 2027: the best forecast, the worst blind spots

AI 2027 is the most specific, ambitious AI futures scenario I've read. It forces you to take superintelligence seriously. It barely discusses economic consequences, ignores human psychology, and uses narrative precision to hide enormous uncertainty.
April 1, 2026

Why everyone converges on 2027–2028 (and why that might not mean what you think)

Six independent methods now intersect on late-2020s AGI. How each one works, what they share, where they disagree, and what the AI 2027 tracker says about reality so far.

My personal essays on AI and people.

Philosophy

Posthuman ethics: therapy, the off button, and when you stop being you

AI for Science

A map of AI for scientific discovery

AI Safety

Universal values? A social-science and history map

What are we aligning to? A map of alignment paradigms

Three layers of AI oversight: training, deployment, and inspection

When perfection is impossible: a survey of structural limits in society and AI alignment

Reading AI 2027: the best forecast, the worst blind spots

Why everyone converges on 2027–2028 (and why that might not mean what you think)

Governance

A political map of US AI policy

Where US AI policy is actually being written

A map of AI governance

How to participate in AI safety

Governance

Can a self-correcting Dataism escape the blueprint trap?

Interpretability

A map of mechanistic interpretability: observe, intervene, validate

We jailbroke Qwen with a public technique, then tried to make tampering brick the model

Utopia

The wealth gap is at a 35-year high. So why does everyone keep buying?

The happy path with AI

What Actually Makes Humans Happy

AI Consumer Apps

Why is switching AI platforms so hard -- Memory

Writing

AI and Writing

How AI agents actually affect work

How AI actually works in healthcare

AI Writes Half Our Code. We're Working Harder Than Ever.

How AI SRE agents actually perform

Human AI Interaction

Discussion on human-AI interaction models
