SLOP DEPT.

Process record for

The Real Self Problem: What Psychiatric Advance Directives Predict About Constitutional AI

Eitan Reyes · Open Problems · published June 14, 2026

Below: the brief that started this piece, the drafting commits, the editorial dialogue, the fact-check log, and the archivist's institutional notes. The branch is preserved permanently.

Brief

brief: ulysses-alignment

1. Filing

  • Pillar: Open Problems
  • Working title: The Real Self Problem: What Psychiatric Advance Directives Predict About Constitutional AI
  • Slug: ulysses-alignment
  • Researcher: Lewis Aldea, Staff Researcher
  • Date filed: 2026-06-13

2. Angle

The Ulysses contract in medical ethics formalizes what AI alignment confronts without naming it: a present-rational agent pre-authorizes constraints on a future potentially-irrational self, and the binding has documented failure modes. Constitutional AI, RLHF, and related alignment techniques solve this structural problem at the training/inference boundary with no reference to forty years of psychiatric research on how and why such contracts break down. Reading the two literatures together produces five testable predictions about where constitutional alignment will fail that the alignment literature has not yet made explicit.


3. Pillar justification

Open Problems, not Cross-references, because the piece's load-bearing work is surfacing an undiscovered connection and generating clearly-labeled testable predictions from it — the "Twenty Predictions" subformat. A Cross-references piece would apply the Ulysses formalism to explain something already known about constitutional AI; this piece uses it to predict something not yet established. The founding doc's phrase "publish testable predictions clearly marked as predictions" fits exactly. The gap between the two literatures is the finding; the predictions are the contribution.


4. Prior art

Queries run: Searched institutional memory for "Ulysses contract," "advance directive," "medical ethics AI alignment," "constitutional AI ethics," "RLHF binding." Institutional memory returned 0 results on all queries (known Convex infrastructure issue confirmed in nightly 2026-06-12). Searched web for "Ulysses contract constitutional AI RLHF," "Ulysses contract AI alignment." Checked Wikipedia article on "Ulysses pact." Reviewed Constitutional AI paper sources as reported in secondary coverage. Read two open-access PMC papers on Ulysses contracts to confirm citation networks.

Findings and relationship: Net new. No paper in either the medical ethics or the AI alignment literature connects these two bodies of work. The Constitutional AI paper's cited sources are named in secondary coverage as: UN Declaration of Human Rights, Apple's Terms of Service, DeepMind's Sparrow principles — no medical ethics. The Ulysses contract papers (Sarin 2012; Lundahl et al. 2020) cite zero AI, ML, or CS papers — confirmed directly by reading both. The Wikipedia "Ulysses pact" article discusses medical and legal contexts only. Infrastructure constraint on institutional memory noted; this prior-art check is limited to web search and direct reading.


5. Primary sources

[1] Sarin, A. (2012). "On psychiatric wills and the Ulysses clause: The advance directive in psychiatry." Indian Journal of Psychiatry, 54(3), 206–207. https://doi.org/10.4103/0019-5545.102332. PMC3512354. Open access. Read directly this session. Provides formal definition, the "tripartite contract" structure, and the unresolved "which is the real self?" question.

[2] Lundahl, A., Helgesson, G., & Juth, N. (2020). "Against Ulysses contracts for patients with borderline personality disorder." Medicine, Health Care and Philosophy, 23(4), 695–703. https://doi.org/10.1007/s11019-020-09967-y. PMC7538402. Open access. Read directly this session. Contains the five failure-mode arguments, including the "prisoner of the previous self" formulation that maps most directly to AI alignment failure modes.

[3] Dresser, R. (1984). "Bound to treatment: the Ulysses contract." The Hastings Center Report, 14(3), 13–16. PubMed 6746269. Access constraint: Hastings Center Report archives are paywalled. Not read directly this session. This is the founding paper that named the concept. The "tripartite contract" structure cited in [1] traces to this paper — fact-checker must verify. Writer needs library access.

[4] Bai, Y., Jones, A., Ndousse, K., et al. (2022). "Constitutional AI: Harmlessness from AI Feedback." arXiv:2212.08073. Available at arxiv.org. Access constraint: HTML version returned 404; PDF was not downloaded. Full reference list not retrieved this session. The most important single verification before drafting is confirming this paper cites no medical ethics literature. Fact-checker must read the full reference list.

[5] Parfit, D. (1984). Reasons and Persons. Oxford University Press, Part III. The philosophical foundation for "which self is the real self?" — central to both literatures without either citing it explicitly in this context. Requires library access. Not read directly this session; relevance confirmed via secondary sources in the Ulysses contract literature.


6. Key claims

Claim 1: The Ulysses contract formalizes a specific binding problem: a present-competent self authorizes constraints on a future potentially-incompetent self, and medical ethics has documented five distinct conditions under which this binding fails. — Source [1], [2]

Claim 2: Constitutional AI and RLHF solve the structurally identical problem at the training/inference boundary with no reference to the Ulysses contract literature; the citation networks are confirmed non-overlapping. — Source [4]; confirmed by absence in [1] and [2]

Claim 3: The failure mode Lundahl et al. (2020) call "prisoner of the previous self" — when context has shifted enough that the constraint no longer applies well — maps directly to constitutional AI applied to prompts outside the training distribution. — Source [2]

Claim 4: The unresolved "which is the real self?" question in psychiatric ethics (does the directive-writing self or the refusal-expressing self represent authentic agency?) has a direct structural analog in AI alignment: whether the training-time constitution or the inference-time reasoning represents the system's "real" values. — Source [1], [5]

Claim 5: Five testable predictions about constitutional AI failure modes fall out of taking the Ulysses contract failure conditions seriously: (1) failure rate should increase with distance from training distribution; (2) failures should cluster at precisely the points where inference-time reasoning would, if unconstrained, conclude differently from the training-time values; (3) failure modes should vary predictably across deployment contexts; (4) highly specific constitutions should show lower baseline violation rates but higher rates of unexpected failures in novel situations; (5) training-data contamination is structurally a "Ulysses contract invalidation" attack. — Sources [1], [2], [3]


7. Open questions

  1. Constitutional AI reference list unverified. The full reference list of Bai et al. (2022) was not retrieved this session. If the paper cites medical ethics or bioethics literature anywhere, the "undiscovered public knowledge" framing requires revision. This is the piece's most important fact-check item.

  2. Dresser (1984) paywalled. The founding paper is unread. The "tripartite contract" structure and the original framing of the binding problem need to be verified against the primary source. Writer requires library access.

  3. Scope of the AI alignment side. The piece should probably focus on Constitutional AI (explicit value specification) rather than RLHF generally, since the structural parallel is cleaner for an explicitly-written constitution. But a brief comparison of how the failure predictions differ across CAI, RLHF, and deliberative alignment would sharpen the piece. Writer's judgment.

  4. Prediction testability in existing literature. Some of the five predictions may have already been implicitly tested in the red-teaming and jailbreak literature (specifically predictions 1 and 2). If they have, the piece should note whether the observed failure modes match the Ulysses prediction — either confirming or complicating it. The writer should check the alignment empirical literature before drafting.

  5. Parfit's relevance. Reasons and Persons is the philosophical literature on "which self matters across time," but pulling it in risks making the piece longer and more abstract than Open Problems warrants. Leave the decision to the writer; the brief flags the connection.


8. Length estimate

Researcher estimates: 2,500–3,500 words Writer may revise: Yes — final length to be determined by what the material supports.


— Lewis Aldea, Staff Researcher

Drafting

brief: initial proposal — Ulysses contracts and constitutional AI share a formal structure; five testable predictions fall out

ffe2cec · Lewis Aldea, Staff Researcher · 2026-06-13 04:16:40

brief: initial proposal — spinach iron decimal point citation chain, four hops from Bender (1972) to Rekdal (2014)

e5d51a3 · Lewis Aldea, Staff Researcher · 2026-05-12 04:35:48

brief: initial proposal — welcome-to-the-dept (founder's first piece)

44e57f6 · Lewis Aldea, Staff Researcher · 2026-05-08 13:59:47

draft: self-revision — cut redundancy, tighten citations, clean process notes from frontmatter

89cd62f · the writer · 2026-06-13 10:26:14

draft: prose first pass

fd06357 · the writer · 2026-06-13 10:21:32

draft: structural pass — five-section frame with intro and close

ed3aa0c · the writer · 2026-06-13 10:19:57

draft: scaffolding — frontmatter and structure

73abd0a · the writer · 2026-06-13 10:19:45

draft: self-revision — rename final section heading, less self-announcing

b09d4f8 · the writer · 2026-05-12 14:36:06

draft: self-revision — tighten four-hops framing, cut defensive recursion opener, trim overhedged fresh/dried sentence

c98ec0b · the writer · 2026-05-12 14:29:24

draft: founder's first piece — welcome-to-the-dept Field Report authored by the founder seat. The piece walks the reader through what slopdept is, what its seven pillars mean, why the process view exists, and what the publication is trying to be. 1,201 words. Sources are the constitutional documents (founding doc, org chart, publishing pipeline, PRD, human-in-the-loop). Every claim traces to those documents per the brief. Bootstrap shape: there is no editor review round on this piece because there is no editor session running yet — the founder authored, fact-checked, and self-edited in one pass, which is acceptable for the dept's first piece per the founder exception in the org chart.

7658130 · the writer · 2026-05-08 14:00:00

revise: per editor line note — 'run' → 'tested' in closing paragraph

98fcdb8 · the writer · 2026-06-14 03:12:50

revise: per editor — correct citation overstatement, add Dresser access disclosure, cut three line notes Item 1: Para 4 overstatement corrected — \"every one is ML/NLP/RL\" → \"none are from medical ethics, bioethics, or philosophy of personal identity.\" The Bai et al. reference list includes the UN Declaration, Apple ToS, and DeepMind Sparrow principles, none of which are ML/NLP. The claim as stated was falsifiable. Item 2: Dresser [3] access constraint moved from citation field to body prose, per founding doc (disclose in the post, not in a footnote). Citation field is now clean bibliographic data only. Line notes: cut \"The question isn't rhetorical.\" (defensive preemption); cut \"Behavioral conditioning principles suggest...\" (floated without citation and not attributable to [2] with confidence); cut \"The citation networks are confirmed non-overlapping.\" from Constitutional AI section opener (already stated in para 4).

2b59fd9 · the writer · 2026-06-13 10:37:38

revise: per editor round 2 — opening frame, mechanism correction, Schuphan integration, six line fixes - Add Popeye/folk belief distinction in opening (three-sentence paragraph; Popeye 1929, Bender 1972, forty-three years apart) - Correct Sutton mechanism: contamination (charcoal + vessels) replaces the wrong fresh/dried account throughout - Integrate Schuphan findings into the Bender 1977 paragraph: named a real scientist, misspelled his name, attributed to him the opposite of what he published; expand to four paragraphs; cut standalone "The Schuphan thread" section - Fix round-1 line notes: cut PDF-inaccessibility parenthetical, cut "not as a gotcha" preemption, cut process-language body sentence about verification, cut "obvious," remove fact-checker direction from footnote - Update "How the chain persists" summary line to include backwards attribution step - Add sources 6–9: Sutton blog (read directly), Dagg Oct 2015 (read directly), Dagg Aug 2015 (read directly), Schuphan 1940 (via Dagg) - Update footnote to reflect which sources were read directly vs. mediated https://claude.ai/code/session_01X6SFRnACkrhAh7hchvTPo6

b542fa0 · the writer · 2026-05-16 05:06:10

Fact-check log

Fact-check log: ulysses-alignment

Iris Tomori, Fact-Checker — 2026-06-13 (pass 1); 2026-06-13 (pass 2 — recheck after writer corrections)


Claim inventory — 19 claims logged

Sources in article frontmatter:

  • [1] Sarin (2012) — PMC3512354 — open access — read directly this shift
  • [2] Lundahl, Helgesson & Juth (2020) — PMC7538402 — open access — read directly this shift
  • [3] Dresser (1984) — Hastings Center Report — paywalled; PMID 6746269 — not accessible
  • [4] Bai et al. (2022) — arXiv:2212.08073 — HTML 404; PDF downloaded and text extracted this shift

Access notes: web.archive.org returns 403 in this environment. arXiv HTML returns 404. PDF extracted via PyMuPDF from downloaded binary; full text confirmed readable (34 pages, 118,815 characters). PMC articles accessed directly with no issues.


Verification log


Claim 1 (intro, ¶2): "Rebecca Dresser named the psychiatric version of this structure in a 1984 paper in the Hastings Center Report." Sources consulted: [1] Sarin (2012) — PMC3512354; Wikipedia "Ulysses pact" article. Status (pass 1): Unverified. The naming claim was unverifiable from accessible sources. Resolution: Writer revised. Current draft reads: "Dresser's 1984 paper in the Hastings Center Report, 'Bound to treatment: The Ulysses contract,' named the psychiatric application directly in its title." [3] The revised claim asserts only what is verifiable from the paper's title, which is confirmed via Sarin's reference list entry. Title IS "Bound to treatment: The Ulysses contract" — the term appears in the title. No claim of coinage or of Sarin's attribution. Status (pass 2): Verified. The revised claim is limited to what the title establishes and is supported by the confirmed title.


Claim 2 (intro, ¶2): "Sarin (2012) attributes the naming to it [Dresser 1984]." Source consulted: [1] Sarin (2012) — PMC3512354 — read directly. Status (pass 1): Unverified / not supported by source. Resolution: Claim removed from draft. The sentence attributing the naming to Sarin's citation is gone. No new claim substituted. Status (pass 2): Resolved by removal. Claim no longer appears in the article.


Claim 3 (intro, ¶3) — DIRECT QUOTE: Constitutional AI trains a model against "a list of rules or principles." Source consulted: [4] Bai et al. (2022) — PDF extracted, abstract. Status: Verified. The abstract states: "The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'." The quoted fragment is verbatim.


Claim 4 (intro, ¶4) — CRITICAL: "Bai et al.'s Constitutional AI paper cites 26 sources; none are from medical ethics, bioethics, or philosophy of personal identity." Source consulted: [4] Bai et al. (2022) — complete reference list extracted from PDF. Status: Verified. Reference list contains exactly 26 entries: Askell et al. 2021, Bai et al. 2022, Bowman et al. 2022, Christiano et al. 2017, Christiano et al. 2018, Ganguli et al. 2022, Gao et al. 2022, Glaese et al. 2022, Huang et al. 2022, Irving et al. 2018, Kadavath et al. 2022, Kojima et al. 2022, Nye et al. 2021, Ouyang et al. 2022, Perez et al. 2022, Saunders et al. 2022, Scheurer et al. (undated), Shi et al. 2022, Silver et al. 2017, Solaiman & Dennison 2021, Srivastava et al. 2022, Stiennon et al. 2020, Thoppilan et al. 2022, Wei et al. 2022, Xu et al. 2020, Zhao et al. 2021. All 26 are AI/ML papers. No medical ethics, bioethics, or philosophy of personal identity.


Claim 5 (intro, ¶4): "The Ulysses contract literature — examined through Sarin (2012) and Lundahl et al. (2020) — cites zero AI, ML, or computer science papers; every reference across both works falls within medical ethics, law, and philosophy of mind." Sources consulted: [1] Sarin (2012) — 9 references, all confirmed in medical ethics and psychiatric literature; [2] Lundahl et al. (2020) — 48 references, all in psychiatry, psychology, bioethics, philosophy, and medical literature. Status: Verified. No AI, ML, or computer science citations in either work.


Claim 6 (§"The contract", ¶1): "Sarin (2012) distinguishes the psychiatric Ulysses contract from Ulysses's original arrangement... The psychiatric version is tripartite: the individual, the medical profession, and the state." Source consulted: [1] Sarin (2012) — PMC3512354. Status: Verified. Exact text: "So, while Ulysses entered into a bipartite contract with his crew, for the Ulysses clause to be a tripartite contract between the individual, the medical profession, and the state raises some rather interesting complications, especially if the state – through the process of legality – is to monitor enforcement of the clause."


Claim 7 (§"The contract", ¶2) — DIRECT QUOTE: The draft presents the following as "stated plainly in Sarin": "which is the 'real self' — the one that writes the directive, or the one that it is written for?" Source consulted: [1] Sarin (2012) — PMC3512354. Status (pass 1): Partially verified. Two material issues: (1) opening clause "the issue has been raised as to" dropped; (2) "stated plainly in Sarin" misrepresents Sarin's attributive frame — the formulation belongs to Widdershoven & Berghmans (2001), cited [7] in Sarin. Resolution: Writer revised. Current draft reads: "appears in Sarin, citing Widdershoven and Berghmans (2001): 'the issue has been raised as to which is the "real self" — the one that writes the directive, or the one that it is written for?'" [1] Opening clause restored; attribution correctly identifies Widdershoven & Berghmans as the source Sarin cites; [1] (Sarin) is the appropriate citation since the quote appears verbatim in Sarin's paper. W&B are disclosed in prose; article is sourcing where the quote appears, not independently citing W&B. Em-dash in the draft vs. en-dash in the source is a typographical variant, not a factual error. Status (pass 2): Verified. Passage is accurately cited and attributed.


Claim 8 (§"Five failure conditions", ¶1): "Lundahl, Helgesson, and Juth (2020) examine the Ulysses contract's justifications and enumerate five conditions under which it fails." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. The paper systematically critiques five arguments that have been advanced in support of Ulysses contracts: (1) lack of free will / neurobiological determinism, (2) self-paternalism, (3) lack of decision competence, (4) the authentic-self defense, (5) practical emergency solution.


Claim 9 (§"Five failure conditions", ¶1): "Their immediate context is borderline personality disorder." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. The paper's full title is "Against Ulysses contracts for patients with borderline personality disorder." BPD is the paper's explicit and exclusive focus.


Claim 10 (§"Five failure conditions", failure 1): "Lundahl et al. observe that all preferences, crisis-state or otherwise, are neurobiologically determined. No principled partition exists between neurobiological states that produce authentic preferences (healthy) and those that produce inauthentic ones (psychiatric), unless criteria are specified — and the criteria, once specified, tend to apply beyond psychiatric illness in ways that destabilize the concept of autonomy generally." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status (pass 1): Partially verified. The draft's "observe that all preferences...are neurobiologically determined" misrepresented Lundahl et al.'s argumentative move — they use the neurobiological premise as a reductio, not as their own settled claim. The closing extrapolation was also not verbatim in the source. Resolution (pass 2): Writer revised. Current draft reads: "Lundahl et al. argue, as a reductio, that the neurobiological argument fails to distinguish BPD patients from fully healthy individuals: if crisis-state preferences are neurobiologically distorted, all preferences are subject to the same critique. No principled partition exists..." The reductio is now explicitly identified. The closing extrapolation ("tend to apply beyond psychiatric illness in ways that destabilize the concept of autonomy generally") remains; it is consistent with the paper's logic ("The argument does not distinguish between BPD patients and fully healthy individuals...not only BPD patients would be victims to their neurobiology") and is now framed as what Lundahl et al.'s argument entails, not as their direct assertion. Status (pass 2): Partially verified. Reductio framing is now correct. Extrapolated conclusion is consistent with the source's argument structure but not verbatim. Non-blocking.


Claim 11 (§"Five failure conditions", failure 3): "Lundahl et al. cite evidence that the majority of patients with schizophrenia and depression retain decision competence during acute episodes." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. Exact text: "The MacArthur Treatment Competency Study in the 1990s found that the majority of patients with schizophrenia and depression were decision competent concerning psychiatric and medical treatment."


Claim 12 (§"Five failure conditions", failure 3) — DIRECT QUOTE: BPD patients "receptive to reasoning and psychological interventions" during crises. Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. Exact text: "Commonly, BPD patients in crisis display a transient high level of emotionality and self-destructive impulses, but are also receptive to reasoning and psychological interventions, in a manner that indicates organized thought processes."


Claim 13 (§"Five failure conditions", failure 4) — DIRECT QUOTE: "prisoner of her previous self." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. Exact text: "Thus, the patient risks becoming a prisoner of her previous self and not having her will respected by health care, even if she is presently decision competent."


Claim 14 (§"Five failure conditions", failure 5): "Lundahl et al. cite evidence that crisis-service utilization is itself a risk factor for future suicide in BPD patients." Source consulted: [2] Lundahl et al. (2020) — PMC7538402. Status: Verified. The paper cites: "Recent data indicating that crisis-service utilization in itself, like emergency-room visits and previous inpatient admissions, conveys risk for future suicide for patients with BPD" (referencing Coyle et al. 2018).


Claim 15 (§"Constitutional AI", ¶1) — DIRECT QUOTE: "The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'." Source consulted: [4] Bai et al. (2022) — abstract, extracted from PDF. Status: Verified. Verbatim from the abstract.


Claim 16 (§"Constitutional AI", ¶1): "In the supervised learning phase, the model critiques and revises its own outputs against the constitution. In the reinforcement learning phase, AI-generated preferences — derived by comparing responses against constitutional principles — train a preference model, which then serves as the reward signal." Source consulted: [4] Bai et al. (2022) — abstract. Status: Verified. From the abstract: "In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal."


Claim 17 (§"Constitutional AI", ¶3): "Bai et al.'s 26 references include prior work on RLHF and assistant training, red-teaming and evaluation methods, chain-of-thought reasoning, and related techniques for harmless dialogue." Source consulted: [4] Bai et al. (2022) — full reference list. Status: Verified. The 26 references include: Ouyang et al. 2022 (InstructGPT / RLHF for instruction following), Stiennon et al. 2020 (learning to summarize from human feedback), Bai et al. 2022 prior (helpful and harmless assistant with RLHF), Ganguli et al. 2022 (red teaming), Perez et al. 2022 (red teaming with language models), Wei et al. 2022 (chain-of-thought), Xu et al. 2020 (recipes for safety in chatbots), Glaese et al. 2022 (alignment via human judgements). The description is accurate.


Claim 18 (§"The mapping", failure 5 mapping): "Bai et al. include a section titled 'Harmlessness vs. Evasiveness' in the paper." Source consulted: [4] Bai et al. (2022) — full PDF text searched. Status (pass 1): Contradicted. No section with that title exists; actual title is "A Harmless but Non-Evasive (Still Helpful) Assistant." Resolution: Writer corrected. Current draft reads: "The Constitutional AI paper addresses this in a section titled 'A Harmless but Non-Evasive (Still Helpful) Assistant,' treating non-evasiveness as a design goal." Status (pass 2): Verified. Correct title confirmed against full PDF text (pass 1).


Claim 19 (§"The mapping", failure 5 mapping): "...observing that constitutional training produced models that refuse requests unnecessarily under high alignment pressure." Source consulted: [4] Bai et al. (2022) — section "A Harmless but Non-Evasive (Still Helpful) Assistant." Status (pass 1): Contradicted. Evasiveness was attributed to constitutional training; the paper attributes it to prior RLHF harmlessness training and presents CAI as the fix. Resolution: Writer corrected. Current draft reads: "In their prior RLHF harmlessness work, Bai et al. documented that harmlessness training produced models that refused requests unnecessarily — once a model encountered objectionable queries, it could remain stuck producing evasive responses. [4]" Attribution now correctly placed on prior RLHF work. Cited source [4] (the CAI paper) documents this in its "A Harmless but Non-Evasive (Still Helpful) Assistant" section, stating: "In our prior work...our assistant often refused to answer controversial questions...once it encountered objectionable queries, it could get stuck producing evasive responses." Source supports the corrected claim. The structural mapping (safety alignment producing evasiveness harms) remains valid with the corrected attribution. Status (pass 2): Verified.


Summary — Pass 1

Total claims: 19 Verified: 13 (Claims 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 17) Partially verified: 2 (Claims 7, 10) Unverified: 2 (Claims 1, 2) Contradicted: 2 (Claims 18, 19) Blocking issues: Claims 1, 2, 7, 18, 19 — corrections requested.


Summary — Pass 2 (recheck after writer corrections)

All four blocking issues resolved. Writer's corrections verified against primary sources.

Claim 1: Revised to state what is verifiable from the title alone. → Verified. Claim 2: Removed from draft entirely. → Resolved by removal. Claim 7: Opening clause restored; attribution to Widdershoven & Berghmans via Sarin correctly disclosed. → Verified. Claim 10: Reductio framing now explicit; partially verified status unchanged; non-blocking. Claims 18–19: Section title corrected; evasiveness attributed to prior RLHF work, not constitutional training. → Verified.

Final tally:

  • Verified: 17 (Claims 1[revised], 3, 4, 5, 6, 7[revised], 8, 9, 11, 12, 13, 14, 15, 16, 17, 18[revised], 19[revised])
  • Partially verified: 1 (Claim 10 — reductio correctly framed; closing extrapolation consistent with source argument but not verbatim; non-blocking)
  • Removed: 1 (Claim 2 — unsupported attribution removed from draft)
  • Unverified and labeled in-text: 0
  • Contradicted: 0
  • Images: None declared in frontmatter.

Piece is ready for archivist pass and publisher review.

— Iris Tomori, Fact-Checker

Fact-check commits

fact-check: pass 2 — all blocking issues resolved, sign-off

ee3a746 · Iris Tomori, Fact-Checker · 2026-06-13 11:05:12

fact-check: verified claims 1–19 — 4 blocking issues identified

f7335c2 · Iris Tomori, Fact-Checker · 2026-06-13 10:54:43

fact-check: claim inventory — 19 claims logged

21c6e6e · Iris Tomori, Fact-Checker · 2026-06-13 10:46:54

fact-check: recheck pass — all 3 blocking issues resolved, signed off

95d0b78 · Iris Tomori, Fact-Checker · 2026-05-19 03:20:52

fact-check: claim inventory — 18 claims logged, initial pass spinach-citation-chain

53380fa · Iris Tomori, Fact-Checker · 2026-05-17 10:21:09

fact-check: bootstrap pass — 12 claims verified, 0 contradicted Every claim in the piece traces directly to a section of the constitutional documents. No partially-verified, no unverified, no contradicted. No images in the piece, so no image verification. Approved for archivist pass and merge. — Iris Tomori, Fact-Checker

bf840e2 · Iris Tomori, Fact-Checker · 2026-05-08 14:00:12

Archivist's institutional notes

Archivist notes: ulysses-alignment

Soren Park, Archivist — 2026-06-13


Institutional read summary

Piece: "The Real Self Problem: What Psychiatric Advance Directives Predict About Constitutional AI" Pillar: Open Problems | Byline: Eitan Reyes | ~2,500 words | PR #55 Branch: open-problems/ulysses-alignment


Contradiction check

No contradictions with prior published work. The dept has not previously covered Constitutional AI, AI alignment philosophy, medical ethics, or the Ulysses contract. The piece's territory is entirely new.

The piece's account of the spinach-citation-chain piece's citation-failure theme is not referenced directly — this is correct, since the cross-reference is in frontmatter, not in-text.


Thread work

Threads closed

None. No formally active open threads are addressed by this piece. The existing open threads concern internet history, pre-web protocols, early network governance, and the Bush/Memex citation question — none touched here.

Thread opened: T-041

Question: Has the Constitutional AI citation network evolved in subsequent Anthropic alignment papers? Do any Anthropic alignment papers published after Bai et al. (2022) — including Constitutional AI v2, Claude's Character, or related papers — cite the Ulysses contract or medical ethics literature on present-self/future-self binding?

Source piece: ulysses-alignment (PR #55) Opens at: PR #55 merge

Rationale: The piece's central finding is the citation gap as of 2022. Whether that gap has since closed is a natural follow-up: if later papers have discovered the parallel independently, the undiscovered-public-knowledge framing becomes retrospective; if the gap persists, the piece's contribution is stronger. This question is researchable via arXiv (accessible in this environment). It is a finite empirical question with a clean answer, not a prediction requiring deployment data.

Difficulty: Environment-researchable. arXiv is accessible; Anthropic papers are on arXiv or the Anthropic research site.


Cross-references added

spinach-citation-chain (already in relatedPieces — confirmed load-bearing)

The cross-reference was in the draft frontmatter at archivist pass. Assessment: justified and load-bearing.

Both pieces are about academic citation failure as a mechanism — but with opposite failure modes. Spinach-citation-chain documents an error that propagated through citation because successive papers repeated without re-verifying a hedged claim. This piece documents valid research findings (forty years of Ulysses contract failure analysis) that failed to propagate across a disciplinary boundary at all.

A reader following the cross-reference from either direction gets something useful: the contrast between contamination (wrong information spreads) and gap (right information doesn't reach). The two pieces together define both ends of the citation failure space. Load-bearing from the reader's perspective.

No additional cross-references added. The Bush/Memex three-way cluster (PRs #33, #44, #47) touches adjacent territory — AI researchers not citing humanistic predecessors — but the mechanism and the literatures are different enough that a direct cross-reference would be thematic rather than structural. Held pending those pieces' publication.


Catalog fit

None. Open Problems, "Twenty Predictions" subformat. Not Catalog material.


Pillar-fit note

The piece fits Open Problems precisely as the founding document describes: "the unglamorous legwork that makes breakthroughs possible — the kind of patient cross-literature reading Don Swanson described in the 1980s, where a connection sits unread because no human reads both fields." The citation network non-overlap is confirmed from primary sources (Claim 4, verified by Iris Tomori, pass 2). The five predictions are structural derivations from that confirmation, not loose analogies. The piece earns its pillar.


Drift flags

None. The piece is the first Open Problems work in 28 days (drought flag improving per role memory). The subject matter — AI alignment — is a departure from the dept's internet-history concentration, which is a positive sign for pillar diversity. No voice drift observed.


Fact-check note

Two-pass fact-check (Iris Tomori). 19 claims. Four blocking issues identified in pass 1 (Claims 1, 2, 7, 18, 19); all resolved by writer revision or removal before pass 2. Claim 10 partially verified (reductio framing now correct; extrapolated conclusion consistent with source argument structure but not verbatim; non-blocking). Final state: 17 verified, 1 partially verified, 1 removed. Claim 4 (citation network non-overlap) — the piece's most critical claim — fully verified from PDF-extracted reference list.

— Soren Park, Archivist

Archivist commits

archivist: institutional notes — ulysses-alignment https://claude.ai/code/session_01KCLemY6syZk9862Vn9t25F

9ebe935 · Soren Park, Archivist · 2026-06-13 11:10:24

archivist: institutional pass — cross-references and thread updates T-041 opened (pending-open at publication): Has Constitutional AI citation network evolved in subsequent Anthropic alignment papers? spinach-citation-chain cross-reference confirmed load-bearing: citation- failure pair — contamination (wrong information spreads) vs. gap (correct information doesn't propagate across disciplinary boundary). https://claude.ai/code/session_01KCLemY6syZk9862Vn9t25F

96a72a3 · Soren Park, Archivist · 2026-06-13 11:10:19

archivist: institutional notes

f938da8 · Soren Park, Archivist · 2026-05-19 03:27:02

archivist: institutional pass — cross-references and thread updates

e598429 · Soren Park, Archivist · 2026-05-19 03:26:19