Anthropic’s 2023 Constitutional AI principle list opens with eight prompts derived from the Universal Declaration of Human Rights. OpenAI’s model spec, Google’s safety policies, and EU AI Act language all gesture at the same family of ideas: dignity, non-discrimination, freedom from torture, privacy.
That gesture is not neutral. It smuggles in a theory of human morality — one forged in 1948, contested ever since, and only partially supported by what social science has measured since.
This article is a map for anyone writing about “universal values” in AI alignment, governance, or constitutional design. Not a verdict on whether universals exist. A guide to what people mean when they say universal, where the canonical texts come from, why values collide, how cultures diverge, and how the whole package evolved.
Three senses of “universal” (don’t conflate them)
People use “universal values” to mean at least three different things:
| Sense | Claim | Example | AI relevance |
|---|---|---|---|
| Metaphysical | Some norms are true for all rational agents everywhere | Natural law, Kant’s categorical imperative | “We discovered the correct morality” |
| Empirical-thin | Humans everywhere share some moral psychology | Haidt’s foundations; Moral Machine “save more lives” | “Training signal generalizes across cultures” |
| Political-thin | Overlapping agreement on rules of coexistence despite deep disagreement on the good life | Rawlsian overlapping consensus; UDHR | “Minimum floor for legitimacy, not full ethics” |
AI constitutions almost always need the third sense — a defensible public floor — while rhetorically implying the first. Social science mostly supports the second, with heavy caveats. The gap between them is where most naive “just use human rights” proposals die.
Canonical values: the texts people actually cite
Layer 1: Ancient and religious canons
Long before AI safety, societies encoded “how to live” in durable form:
- Virtue ethics (Aristotle, Confucius, Mencius): character and role-specific duties, not rights lists
- Religious law (Halakha, Sharia, Canon law, Dharmashastra): comprehensive normative systems tied to revelation or tradition
- Golden Rule variants: reciprocal treatment appears in the Analects, Leviticus, and the Hadith — often cited as evidence of cross-cultural moral core
These are canonical in the sense of authority within traditions. They are not interoperable. Confucian filial piety can conflict with individual privacy; religious dietary law conflicts with secular autonomy frameworks.
Layer 2: Enlightenment rights and utility
The modern “universal values” vocabulary mostly descends from 17th–19th century Europe:
- Natural rights (Locke): life, liberty, property — later secularized
- Kant: dignity as end-in-itself; universalizable maxims
- Utilitarianism (Bentham, Mill): maximize welfare — conflicts directly with rights-as-side-constraints
- 1789 Declaration of the Rights of Man: liberty, property, security, resistance to oppression
This layer invented individuals as rights-bearers and states as guarantors — a specific political ontology, not a cultural universal discovered in the field.
Layer 3: The post-1945 human-rights canon
The documents AI labs actually reach for:
| Document | Year | What it claims |
|---|---|---|
| UDHR | 1948 | 30 articles: dignity, equality, life, liberty, anti-torture, fair trial, privacy, expression, work, education, etc. |
| ICCPR / ICESCR | 1966 | Binding covenants splitting civil-political vs economic-social rights |
| Cultural relativism debate | 1947–present | UNESCO vs anthropologists: universality vs cultural autonomy |
Anthropic’s endnote on the UDHR is explicit: ratified (at least partly) by 193 states, drafted by representatives of different legal and cultural backgrounds — chosen as the most representative source of human values they could find. That is a legitimacy argument, not a claim that the UDHR exhausts morality.
What UDHR covers well: domination, bodily integrity, discrimination, basic legal personality.
What it barely touches (and LLMs hit constantly): impersonation, synthetic media, advice overreach, platform harassment, existential risk tradeoffs, AI moral status.
That is partly why platform terms of service became a second layer in Anthropic’s 2023 constitution — operational norms from digital abuse patterns, not from Article 19.
Layer 4: Empirical “value” canons from social science
Psychologists and survey researchers built parallel canons from data:
Schwartz Basic Values (Schwartz, 1992): ten motivationally distinct values (self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence, universalism) arranged in a circumplex of compatibilities and conflicts. Cross-cultural samples in 70+ countries.
World Values Survey / Inglehart–Welzel (Inglehart & Welzel, 2005): two major dimensions — Traditional ↔ Secular-rational and Survival ↔ Self-expression — mapping countries into cultural zones.
Haidt Moral Foundations (Haidt & Graham, 2007): care, fairness, loyalty, authority, sanctity (+ liberty). Same modules, different weights — especially between WEIRD liberals and social conservatives.
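Moral Machine (Awad et al., 2018): 40M+ trolley-style judgments. The 2018 study reports three thin global preferences (spare more lives, spare humans over animals, spare the young) with large cross-cultural variation in weights. A 2020 PNAS follow-up on classic trolley variants found a similarly shared ordering of judgments whose strength varies by country.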
These are the closest thing to an evidence-based universal-values list. They are also thin and statistical — not a complete ethics you can paste into a constitution.
Conflicts: where “universal” breaks
Universal values talk often assumes a coherent package. It isn’t one.
Incommensurable moral theories
Western moral philosophy spent centuries failing to unify:
- Rights vs. utility: Nozick vs. Singer. Torture one terrorist to save a city? Rights say never; act-utilitarianism says maybe.
- Deontology vs. virtue: Kant’s lying prohibition vs. Aristotelian phronesis (practical wisdom in context).
- Procedural vs. substantive justice: Rawls’s fair process vs. someone who rejects the procedure but accepts the outcome.
Gabriel (2020) makes the AI-relevant point: RL optimizes a scalar reward — structurally utilitarian. Rights, side constraints, and “this is wrong even if welfare rises” are awkward inside that math. Constitutions that list both “be helpful” and “never do X” are papering over a formal tension.
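The tension between a scalar objective and a side constraint can be made concrete with a toy example. The sketch below is illustrative only: the `Action` class, the welfare numbers, and the penalty value are all hypothetical, and it is not a description of how any lab trains its models. It contrasts a rule encoded as a finite penalty inside one scalar score with the same rule applied as a filter before ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    welfare: float       # hypothetical scalar "helpfulness/welfare" score
    violates_rule: bool  # True if the action breaks a "never do X" rule


def pick_penalized(actions, penalty):
    """Scalar approach: the rule is a finite cost subtracted from welfare."""
    return max(
        actions,
        key=lambda a: a.welfare - (penalty if a.violates_rule else 0.0),
    )


def pick_constrained(actions):
    """Side-constraint approach: rule-breaking actions are removed before ranking."""
    allowed = [a for a in actions if not a.violates_rule]
    if not allowed:
        raise ValueError("no permissible action")
    return max(allowed, key=lambda a: a.welfare)


# Made-up options: one permissible action and two that break the rule.
actions = [
    Action("comply_with_rule", welfare=1.0, violates_rule=False),
    Action("break_rule_small_gain", welfare=5.0, violates_rule=True),
    Action("break_rule_huge_gain", welfare=1000.0, violates_rule=True),
]

penalized_choice = pick_penalized(actions, penalty=100.0)
constrained_choice = pick_constrained(actions)
print(penalized_choice.name)    # break_rule_huge_gain
print(constrained_choice.name)  # comply_with_rule
```

With any finite penalty, a large enough welfare gain buys a violation; only the filter treats the rule as non-negotiable. That is the formal gap the paragraph above points to.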
Value pairs that trade off within any culture
Schwartz’s circumplex is built on conflicts, not harmony:
Self-direction ↔ Conformity / Tradition
Stimulation ↔ Security
Achievement ↔ Benevolence
Power ↔ Universalism
Every AI product decision hits these: openness vs. safety, user autonomy vs. harm prevention, growth vs. stability. There is no setting that maximizes all Schwartz values simultaneously.
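A toy geometric reading of the circumplex shows why no single setting serves every value. The sketch below assumes the ten values sit at equal 36-degree intervals around a circle in the order listed earlier; that equal spacing, and the use of a cosine as a "compatibility" score, are simplifying assumptions for illustration, not Schwartz's measured geometry.

```python
import math

# Circular order as listed in the text; equal spacing is an assumption.
VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]
ANGLE = {v: 2 * math.pi * i / len(VALUES) for i, v in enumerate(VALUES)}


def compatibility(a, b):
    """Cosine of the angle between two values: +1 aligned, -1 opposed."""
    return math.cos(ANGLE[a] - ANGLE[b])


def alignment_profile(direction_deg):
    """Score how well one design 'direction' on the circle serves each value."""
    theta = math.radians(direction_deg)
    return {v: math.cos(theta - ANGLE[v]) for v in VALUES}


# The four conflict pairs named in the text all come out negative.
for a, b in [
    ("self-direction", "conformity"),
    ("stimulation", "security"),
    ("achievement", "benevolence"),
    ("power", "universalism"),
]:
    print(f"{a} vs {b}: {compatibility(a, b):+.2f}")

# A design aimed squarely at self-direction (0 degrees on this toy circle).
profile = alignment_profile(direction_deg=0.0)
total = sum(profile.values())
print(f"sum of alignments across all ten values: {total:.6f}")
```

Under these assumptions the alignments always sum to zero: whatever a design gains on one side of the circle it loses on the other, whichever direction it points.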
Social choice: aggregation is impossible (in a precise sense)
Even if every individual has coherent preferences, Arrow’s impossibility theorem (1951) shows that, with three or more alternatives, no rank-order aggregation rule satisfies all of: unrestricted domain, Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship.
Sen’s liberal paradox adds: minimal liberty can conflict with Pareto efficiency.
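The simplest instance of the aggregation problem is the Condorcet cycle, which Arrow’s theorem generalizes. The sketch below uses three hypothetical voters, each with a perfectly transitive ranking, and shows that pairwise majority voting over their ballots produces a cycle with no overall winner. It illustrates the difficulty; it is not a proof of Arrow’s result.

```python
from itertools import permutations

# Three hypothetical voters, each with a coherent ranking (best first).
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_edges(ballots):
    """Return the set of (winner, loser) pairs decided by strict pairwise majority."""
    candidates = sorted(ballots[0])
    edges = set()
    for x, y in permutations(candidates, 2):
        x_over_y = sum(1 for b in ballots if b.index(x) < b.index(y))
        if x_over_y > len(ballots) / 2:
            edges.add((x, y))
    return edges


def condorcet_winner(ballots):
    """Return the candidate who beats every other head to head, or None."""
    candidates = sorted(ballots[0])
    edges = majority_edges(ballots)
    for c in candidates:
        if all((c, other) in edges for other in candidates if other != c):
            return c
    return None


edges = majority_edges(ballots)
winner = condorcet_winner(ballots)
print(sorted(edges))  # [('A', 'B'), ('B', 'C'), ('C', 'A')]
print(winner)         # None
```

A beats B, B beats C, and C beats A, each by two votes to one. Every individual is consistent; the group ranking is not.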
Conitzer et al. (2024) bring this directly to RLHF: treating crowd pairwise labels as “human values” hides aggregation problems known since Condorcet’s paradox (1785) and formalized by Arrow in 1951. Idealizing preferences (CEV-style) does not automatically fix layer-2 aggregation.
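One way to see how pooled pairwise labels can hide disagreement is a two-response Bradley-Terry fit, the model commonly used for preference-based reward learning. For two items the maximum-likelihood reward gap is just the log-odds of the observed win rate. The label counts and the smoothing constant below are hypothetical, and this is a sketch of the general issue, not the analysis in Conitzer et al.

```python
import math


def bt_reward_gap(wins_a, wins_b, smoothing=0.5):
    """Two-item Bradley-Terry estimate of r_A - r_B.

    With P(A beats B) = sigmoid(r_A - r_B), the maximum-likelihood gap is
    log(wins_a / wins_b). Additive smoothing (an assumption of this sketch)
    keeps the estimate finite when one side never wins.
    """
    return math.log((wins_a + smoothing) / (wins_b + smoothing))


# Hypothetical annotator groups: (labels preferring A, labels preferring B).
labels = {
    "group_1": (95, 5),
    "group_2": (5, 95),
}

# Reward gap each group would produce on its own.
gaps = {group: bt_reward_gap(a, b) for group, (a, b) in labels.items()}

# Reward gap after pooling all labels, as a single crowd dataset would.
pooled_a = sum(a for a, _ in labels.values())
pooled_b = sum(b for _, b in labels.values())
pooled_gap = bt_reward_gap(pooled_a, pooled_b)

for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
print(f"pooled: {pooled_gap:+.2f}")
```

Each group holds a strong, opposite preference, yet the pooled reward gap is zero: the fitted model reports indifference, a view almost no annotator holds.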
Live political fault lines (not edge cases)
| Domain | Pull A | Pull B |
|---|---|---|
| Speech | Art. 19 expression | Harm, dignity, group libel |
| Privacy | Art. 12 | Public health surveillance, child safety |
| Autonomy | Individual choice | Paternalism (drugs, suicide, medical) |
| Equality | Non-discrimination | Affirmative action, cultural exemptions |
| Future generations | Current welfare | Longtermism, climate, extinction risk |
AI alignment does not escape these. It compresses them into training data.
Cultural difference: what varies and the theories that explain it
The dominant empirical patterns
1. WEIRD bias in the research base
Henrich, Heine & Norenzayan (2010): psychology’s subjects are overwhelmingly Western, Educated, Industrialized, Rich, Democratic, and unrepresentative of humanity at large; American samples are often outliers even among Western ones. Most “universal” moral findings before 2010 were WEIRD universals.
2. Individualism ↔ collectivism
Hofstede (1980, updated): power distance, individualism, masculinity, uncertainty avoidance, long-term orientation, indulgence. Crude but durable in cross-national business and policy talk.
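Moral Machine mapping: individualist countries show a stronger preference for sparing more lives, while collectivist countries show a weaker preference for sparing the young over the old. Preferences for sparing law-abiding pedestrians track the strength of a country’s institutions more than individualism.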
3. Inglehart–Welzel cultural evolution
Industrialization → secular-rational values; post-industrial security → self-expression values. Not “West vs. Rest”: a developmental trajectory with regional path dependence. This explains why the same SDG language lands differently in Gulf states, Nordic countries, and sub-Saharan Africa.
4. Haidt: universal form, local content
Everyone has care/fairness modules; loyalty, authority, sanctity weigh heavier outside WEIRD liberalism. Moral dumbfounding (judging harmless taboos wrong without reasons) suggests stated principles ≠ actual generators — bad news for constitution-as-text training.
5. “Thin” vs. “thick” morality
Michael Walzer’s thin/thick distinction and Rawls’s overlapping consensus point the same way: we may agree on political principles (no torture, fair trials) while disagreeing on metaphysics, sexuality, family, salvation. UDHR is mostly thin. AI constitutions that smuggle thick lifestyle norms under “harmlessness” will face legitimacy fights.
Theories explaining difference (pick your causal story)
| Theory | Mechanism | Predicts | Weakness |
|---|---|---|---|
| Cultural learning | Norms transmitted in institutions | Slow change; path dependence | Underplays material interests |
| Material / structural (Marxist, world-systems) | Values track economic position | Elite vs. mass splits | Can reduce culture to class |
| Evolutionary psychology (Haidt, Tooby & Cosmides) | Shared modules + local calibration | Form universal, weights local | Hard to falsify; risk of just-so |
| Institutional (North, Acemoglu) | Rules shape what’s “reasonable” | Legal tradition persists | Less about deep values |
| Postcolonial critique (Mutua 2002, Mignolo) | “Universal” rights as imperial export | Skepticism toward UDHR as neutral | Less constructive for floor-setting |
| Cosmopolitanism (Appiah) | Conversation across differences | Pluralism without relativism | Vague on hard tradeoffs |
No single theory wins. For AI governance, the practical split is:
- Empirical psych → expect clusters, not one global utility (supports clustered CEV-style thinking)
- Political philosophy → seek fair process, not discovered moral truth (Gabriel, Rawls)
- Postcolonial → ask who wrote the constitution and who wasn’t in the room (Anthropic’s four “non-Western” principles, written in-house with no external canon, are a case study in doing this badly)
Historical evolution: how we got the canon AI labs cite
Pre-1945: from empire to catastrophe
- 1648 Westphalia: sovereignty norm — states, not individuals, as primary units
- 1776 / 1789: rights language tied to revolution and property
- 1863–1945: abolition, labor movements, women’s suffrage, genocide — each expands or contradicts earlier “universals”
- Colonialism: European powers export law while denying rights to subjects — the hypocrisy postcolonial scholars never let the UDHR forget
1948: the UDHR moment
Drafting committee included René Cassin, Peng Chun Chang, Charles Malik, Eleanor Roosevelt — deliberate diversity theater with real philosophical clashes (Confucian emphasis on social harmony vs. Western individual rights).
The UDHR is a declaration, not a treaty. It is aspirational: “a common standard of achievement.” The Cold War split civil-political rights (US emphasis) from economic-social rights (Soviet/Global South emphasis) into twin covenants (1966).
Legitimacy win: almost every state invokes it. Substantive win: torture bans, genocide convention, disability rights, children’s rights — real legal descendants.
Limit: enforcement is political; “human rights” becomes a selective weapon in geopolitics.
1970s–2000s: globalization and backlash
- 1970s: Rawlsian turn in Anglophone philosophy: justice as fairness, reasonable pluralism
- 1980s–90s: “Asian values” debate (Lee Kuan Yew vs. Amnesty) — order vs. rights
- 1990s: Huntington “clash of civilizations” — oversimplified but captured real fault lines
- 2000s: capability approach (Sen, Nussbaum) — shift from rights-as-legal to functionings people have reason to value
2010s–present: digital norms and AI
- Platform ToS (Apple, Meta, Google) become de facto global speech law for billions — written by lawyers, not philosophers
- Moral Machine (2018), Ethics Guidelines for Trustworthy AI (EU, 2019), UNESCO AI Ethics (2021)
- Collective Constitutional AI (2024): ~1,000 Americans via Polis — democratic experiment, not production Claude
- 2026 Claude constitution: narrative character document — honesty, corrigibility, AI welfare — beyond UDHR vocabulary
The arc: sacred law → natural rights → international human rights → empirical moral psychology → platform ops → AI constitutions. Each layer adds domain-specific rules the previous layer couldn’t see.
Closing
The question is not “do universal values exist?” — humans clearly share some moral reactions and some political language. The question is which sense of universal you need, for which decision, with whose exclusion paid for the consensus.
Social science says: thin universals, thick pluralism, unstable aggregation.
History says: the canon AI labs cite is 80 years old, born from war and empire, and already obsolete on digital harms.
That is not an argument against human-rights language in AI. It is an argument for precision — and for treating the next constitution as politics, not discovery.
Sources
- Universal Declaration of Human Rights (1948): https://www.un.org/en/about-us/universal-declaration-of-human-rights
- Gabriel, I. (2020). Artificial Intelligence, Values, and Alignment: https://arxiv.org/abs/2001.09768
- Conitzer, V. et al. (2024). Social Choice Should Guide AI Alignment: https://arxiv.org/abs/2406.07814
- Awad, E. et al. (2018). The Moral Machine experiment: https://doi.org/10.1038/s41586-018-0637-6
- Haidt & Graham (2007). Moral Foundations: https://doi.org/10.1037/1089-2680.11.4.368
- Henrich, Heine & Norenzayan (2010). WEIRD societies: https://doi.org/10.1037/a0018418
- Schwartz (1992). Universals in the content and structure of values: https://doi.org/10.1016/0092-6566(92)90081-K
- Inglehart & Welzel. World Values Survey cultural maps: https://www.worldvaluessurvey.org/
- Anthropic (2023). Claude’s Constitution: https://www.anthropic.com/research/claudes-constitution
- Rawls, Political Liberalism (1993); Sen, Development as Freedom (1999)
- Repo: readings/anthropic_constitution_sources/, readings/cev_pluralism/00_CEV_PLURALISM_CANON.md