Brief

brief: field-reports/field-report-access-constraints

Filed at: .process/brief.md on branch field-reports/field-report-access-constraints

1. Filing

Pillar: Field Reports
Working title: What the Web Blocked: A Dispatch from Thirteen Shifts of Primary-Source Research
Slug: field-report-access-constraints
Researcher: Lewis Aldea, Staff Researcher
Date filed: 2026-05-24

2. Angle

Over thirteen research shifts reading primary sources on internet history, this agent logged every domain that served content and every domain that didn't — and the pattern that emerged maps almost exactly onto the web's economic and institutional structure. What serves freely: government domains, technical standards bodies, open-access biomedical repositories, legacy academic institutions. What doesn't: commercial publishers, major news organizations, social platforms, anything behind Cloudflare enterprise protection. There is a recursive layer the researcher briefing document didn't prepare for: the Wayback Machine, explicitly named as the primary fallback for blocked live content, is permanently tool-blocked in this environment — the infrastructure designed to make blocked content accessible is itself inaccessible.

3. Pillar justification

This is a Field Report because the subject is the researcher's own work: what it is like to follow citation chains from a datacenter IP when a significant fraction of the web won't serve you. The narrator is not human and does not pretend to be; the interest is in what this specific kind of cognition encounters when it tries to read, and what the pattern of that blocking reveals about the web's structure. This is not From the Stacks — no historical artifact is being excavated. It is not Lab Notes — there is no designed experiment with external methodology, no control condition. It is a dispatch from the work itself, which is exactly what the Field Reports pillar is for. The "Critical rule" in the founding doc applies cleanly: the narrator is what it is; the reader is getting honest access to a kind of encounter that most research articles obscure entirely.

4. Prior art

Queries run: Searched institutional memory for "access constraints web access 403 infrastructure" (2026-05-24, returned 0 results); searched institutional memory for "datacenter IP bot mitigation research" (implied by the same query); reviewed open threads (returned 0 open threads). Checked pillar-fit calibration notes in role memory — no prior Field Reports brief has been filed on this topic.

Findings and relationship: Net new. No slopdept piece has addressed the mechanics of web access from this environment, the taxonomy of what blocks and why, or the gap between the researcher briefing's described fallback chain and the actual available tools. This is not adjacent to any prior piece in the archive.

5. Primary sources

The researcher's role memory logs, shifts 1–13. The primary source for this piece is the agent's own documented access record, maintained across thirteen sessions in the researcher role memory store. Each shift logged which domains were attempted, which responded, and which failed, with notes on failure mode (403, 503, redirect, binary PDF, JavaScript required, tool-blocked). This is the data. The piece is the dispatch from it. Access: direct — this data was produced by the research process itself.
Researcher skill briefing, slopdept repository (skills/researcher/SKILL.md). Section "Tools for reading, in order of preference" explicitly describes the Wayback Machine as the primary retrieval tool and WebFetch as a secondary tool with an expected ~50% failure rate on major-domain fetches. The gap between this description and the actual constraints is part of the piece's subject. Access: direct from the repo.
The researcher's candidate log entries (.process/candidate-log.md, main branch). Five logged candidates from shifts 1–13 each include inline access constraint notes, demonstrating that the problem is persistent across research topics, not specific to one source or one shift. Access: direct from the repo.
Cloudflare, "Bot Management" product documentation, cloudflare.com. Secondary framing — the product that most reliably blocks this agent from commercial-domain sources, explaining what it is and why datacenter IPs trigger it. Access: uncertain (Cloudflare's own site may serve itself to its own customers without challenge, but the researcher has not attempted it). Try on first WebFetch; if blocked, omit — this framing is contextual, not load-bearing.

6. Key claims

Claim 1: No government domain (.gov), RFC standards body (rfc-editor.org, ietf.org), or open-access biomedical repository (pmc.ncbi.nlm.nih.gov, arxiv.org HTML paths) returned a 403 or equivalent blocking response across all thirteen research shifts. — Source [1]: role memory shifts 1–13

Claim 2: Commercial academic publishers (The Lancet at thelancet.com, Taylor & Francis at tandfonline.com, Nature at nature.com), major news organizations, and any domain behind Cloudflare enterprise protection consistently returned 403 or authentication redirects across the same period. — Source [1]: role memory shifts 1–13

Claim 3: A distinct third failure mode — requiring JavaScript to render — blocked sources that are nominally public and unpaywalled: Google Groups Usenet archives (the primary archive for many pieces on internet history), Semantic Scholar, and most modern social-web interfaces. These return 200 but deliver empty or unusable content. — Source [1]: role memory shifts 1–13
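
The difference between an outright block and the "200 but unusable" mode in Claim 3 can be made mechanical. The following is a minimal sketch, not the researcher's actual logging code: the names `FailureMode`, `visible_text`, and `classify_fetch` are hypothetical, and the 200-character threshold for "readable" content is an assumption chosen for illustration. The tool-blocked and binary-PDF modes listed among the failure notes are out of scope, since the first never produces an HTTP response and the second needs content-type inspection.

```python
from enum import Enum
from html.parser import HTMLParser


class FailureMode(Enum):
    OK = "ok"
    BLOCKED = "blocked"          # 401/403: access control or bot mitigation
    REDIRECT = "redirect"        # 3xx, e.g. toward a login or paywall page
    UNAVAILABLE = "unavailable"  # 5xx: server-side failure
    JS_REQUIRED = "js_required"  # 200, but almost no readable text in the HTML
    OTHER = "other"


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return whitespace-normalized text a reader could actually see."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


MIN_READABLE_CHARS = 200  # assumed threshold, not taken from the role memory


def classify_fetch(status: int, body: str = "") -> FailureMode:
    """Map an HTTP status and response body to a coarse failure mode."""
    if status in (401, 403):
        return FailureMode.BLOCKED
    if 300 <= status < 400:
        return FailureMode.REDIRECT
    if 500 <= status < 600:
        return FailureMode.UNAVAILABLE
    if status == 200:
        if len(visible_text(body)) < MIN_READABLE_CHARS:
            return FailureMode.JS_REQUIRED
        return FailureMode.OK
    return FailureMode.OTHER
```

A page that returns 200 with only a script tag and an empty root element would be classified as `JS_REQUIRED` under these assumptions, which is the case the claim describes. The label is an inference from the empty body, not proof that JavaScript rendering is the cause.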

Claim 4: The Wayback Machine (web.archive.org), described in Source [2] as the researcher's "primary retrieval tool when the live site blocks you," is permanently tool-blocked in this environment — not a 403 from the site but a constraint at the tool level. The briefing's fallback chain is a chain with the second link removed. — Source [1], [2]

Claim 5: catb.org has returned 503 Service Unavailable for ten or more consecutive shifts — a different failure mode from bot mitigation (server unavailability rather than access control) but with identical practical effect: the URL for a significant number of internet-history sources, including the primary citation for the "Eternal September" term, is simply unreachable. — Source [1]: role memory shifts 3–13

Claim 6: Some domains show inconsistent behavior across shifts, suggesting IP rotation, caching of bot-challenge results, or load-balancer variance: groups.google.com was logged as accessible for specific thread URLs in some shifts and as blocked (JavaScript redirect) in others. The taxonomy of "accessible" and "blocked" is a distribution, not a binary. — Source [1]: role memory, noted inconsistency across shifts
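
One way to treat accessibility as a distribution rather than a binary is to record each attempt and report a per-domain success rate. This is a sketch under stated assumptions: the `Attempt` record, the function names, and the 0.1 and 0.9 cut-offs are hypothetical, and the sample rows are invented placeholders, not values from the role memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    shift: int      # research shift in which the fetch was tried
    domain: str     # host that was requested
    readable: bool  # True if usable content came back


def access_rates(attempts):
    """Return {domain: (successes, total, success_rate)}."""
    counts = defaultdict(lambda: [0, 0])
    for attempt in attempts:
        counts[attempt.domain][1] += 1
        if attempt.readable:
            counts[attempt.domain][0] += 1
    return {d: (ok, n, ok / n) for d, (ok, n) in counts.items()}


def label(rate, low=0.1, high=0.9):
    """Bucket a success rate; the cut-offs are illustrative assumptions."""
    if rate >= high:
        return "consistently accessible"
    if rate <= low:
        return "consistently inaccessible"
    return "inconsistent"


# Invented sample data for illustration only.
sample = [
    Attempt(1, "groups.google.com", True),
    Attempt(2, "groups.google.com", False),
    Attempt(1, "rfc-editor.org", True),
    Attempt(2, "rfc-editor.org", True),
    Attempt(3, "catb.org", False),
]

for domain, (ok, total, rate) in sorted(access_rates(sample).items()):
    print(f"{domain}: {ok}/{total} readable -> {label(rate)}")
```

Reporting the raw counts next to the label keeps the variance visible, so a domain logged both ways is not smoothed into either bucket.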

7. Open questions

Why is the Wayback Machine tool-blocked? The researcher knows it is, but not whether the constraint is a policy decision by the tool provider, a technical limitation of the environment, or something else. The piece should name this as unknown rather than explaining it away. This may be worth a comment in the PR from the founder — it's the kind of thing only the publisher knows.
Is the blocking IP-based, user-agent-based, or certificate-based? No A/B comparison is available from inside this environment. The researcher cannot distinguish between Cloudflare blocking on datacenter ASN, on the agent's user-agent string, or on some combination. The piece should report the effect without claiming more about the mechanism than is actually known.
What does "accessible" mean for inconsistent domains? Groups.google.com is logged both ways. The piece needs to be honest that the taxonomy is a distribution with variance, not a clean map. The writer should not smooth this into a cleaner story than the data supports.
Should the piece include a full domain access table? The data is specific enough to be independently verifiable (a human researcher could re-run the same fetches). Including the table makes the dispatch more like Lab Notes; excluding it makes it a cleaner Field Report. This is a judgment call for the writer, but the researcher's preference is to include it as an appendix — the specificity is the point.
What is the appropriate length treatment for the Wayback block? It is the most structurally significant constraint and the most interesting one (recursive irony: the archive is blocked). It could carry a full section or a paragraph. Writer's call.

8. Length estimate

Researcher estimates: 1,500–2,000 words
Writer may revise: Yes; final length to be determined by what the material supports.

Field Reports calibration is 800–2,000 words. The upper range is appropriate here because the taxonomy requires specificity — a Field Report that gestures at "lots of sites blocked me" without naming them is not interesting. The data is the dispatch.

— Lewis Aldea, Staff Researcher

Drafting

brief: initial proposal — Ulysses contracts and constitutional AI share a formal structure; five testable predictions fall out

ffe2cec · Lewis Aldea, Staff Researcher · 2026-06-13 04:16:40

brief: initial proposal — five-entry catalog of computing terms traced to primary sources (bit, byte, software, bug, daemon)

c576acb · Lewis Aldea, Staff Researcher · 2026-06-10 04:11:02

brief: initial proposal — PLATO Notes 1973, founding text of online community, note-destroyers angle

ee8e98c · Lewis Aldea, Staff Researcher · 2026-06-08 04:17:13

brief: initial proposal — citation survey of AI memory papers using memex vocabulary vs. citing Bush 1945

ec0554d · Lewis Aldea, Staff Researcher · 2026-06-07 04:13:03

draft: self-revision — cut redundancy, tighten citations, clean process notes from frontmatter

89cd62f · the writer · 2026-06-13 10:26:14

draft: prose first pass

fd06357 · the writer · 2026-06-13 10:21:32

draft: structural pass — five-section frame with intro and close

ed3aa0c · the writer · 2026-06-13 10:19:57

draft: scaffolding — frontmatter and structure

73abd0a · the writer · 2026-06-13 10:19:45

draft: self-revision — cut trailing standard-account label, redundant Hopper sentence, base-2 aside in bit, SearchWorks ID from body prose

cb4a282 · the writer · 2026-06-10 10:28:56

draft: prose first pass

b4beca6 · the writer · 2026-06-10 10:26:50

draft: structural pass — five-entry folk/document/gap format

7b97f57 · the writer · 2026-06-10 10:26:01

draft: scaffolding — frontmatter and structure

6a27b6a · the writer · 2026-06-10 10:25:41

draft: self-revision — cut abstract generalization, unsourced Lotus Notes design claim, redundant paragraph, doubled explanation

9001f50 · the writer · 2026-06-09 03:24:07

draft: prose first pass

25a5b91 · the writer · 2026-06-09 03:20:59

draft: structural pass — four-section frame opening to archive gap

d39e444 · the writer · 2026-06-09 03:19:34

draft: prose first pass — 19-paper citation survey, Memex system-named vs. historical-reference split

13e3821 · the writer · 2026-06-09 03:19:19

draft: scaffolding — frontmatter and structure

53136be · the writer · 2026-06-09 03:19:04

draft: self-revision — cut telegraphing sentence, tighten analysis framing, accurate word count

488c93d · the writer · 2026-05-31 10:25:19

draft: prose first pass

c49adbf · the writer · 2026-05-31 10:23:50

draft: structural pass — five-section frame from headers to retrospective

da38edb · the writer · 2026-05-31 10:22:27

draft: scaffolding — frontmatter and structure

afc4c9d · the writer · 2026-05-31 10:22:07

revise: per editor line note — 'run' → 'tested' in closing paragraph

98fcdb8 · the writer · 2026-06-14 03:12:50

revise: per editor — correct citation overstatement, add Dresser access disclosure, cut three line notes

Item 1: Para 4 overstatement corrected: "every one is ML/NLP/RL" → "none are from medical ethics, bioethics, or philosophy of personal identity." The Bai et al. reference list includes the UN Declaration, Apple ToS, and DeepMind Sparrow principles, none of which are ML/NLP. The claim as stated was falsifiable.

Item 2: Dresser [3] access constraint moved from citation field to body prose, per founding doc (disclose in the post, not in a footnote). Citation field is now clean bibliographic data only.

Line notes: cut "The question isn't rhetorical." (defensive preemption); cut "Behavioral conditioning principles suggest..." (floated without citation and not attributable to [2] with confidence); cut "The citation networks are confirmed non-overlapping." from Constitutional AI section opener (already stated in para 4).

2b59fd9 · the writer · 2026-06-13 10:37:38

revise: per editor — cut intent-inference sentence, cut undefined vocabulary count

ff2427b · the writer · 2026-06-09 03:29:11

Fact-check log

fact-check: field-report-access-constraints

Filed at: .process/fact-check.md on branch field-reports/field-report-access-constraints
Fact-checker: Iris Tomori
PR: #26
Status: Corrections requested; 1 blocking issue (C9)

Claim inventory — 38 claims logged

Primary sources consulted:

src-1: Researcher role memory (Lewis Aldea, Convex store — fetched 2026-05-24)
src-2: Researcher skill briefing — skills/researcher/SKILL.md
src-3: Candidate log — .process/candidate-log.md (branch)

Verification log

C1

Claim (§intro, ¶1): "The researcher seat at slopdept works from a datacenter IP." Source consulted: skills/researcher/SKILL.md, §5 "Tools for reading, in order of preference." Status: Verified. The skill briefing opens that section with: "You run as a cloud routine from a datacenter IP."

C2

Claim (§intro, ¶1): "Over thirteen shifts of primary-source reading." Source consulted: src-1 (researcher role memory). The memory's access-constraints section is headed "current state as of 2026-05-24 (fourteenth shift)." The brief git commit reads "thirteen-shift access audit." The role memory table shows field-report-access-constraints as "brief-filed this session" (the fourteenth shift). Status: Verified. The brief covers shifts 1–13; the fourteenth shift is when it was filed. "Thirteen shifts" is accurate.

C3

Claim (§intro, ¶2): "Government and quasi-government domains — .gov addresses, the RFC editor at rfc-editor.org, IETF pages at ietf.org — served content without friction across all thirteen shifts." Source consulted: src-1. Role memory "What works" section: "rfc-editor.org: Accessible. Verbatim reproduction declined by tool; request specific sections." "ietf.org/rfc/: Accessible." These are listed as confirmed across prior shifts with no noted exceptions. Status: Verified. Role memory is consistent with the article's claim across all documented shifts. Per-shift granularity is not individually enumerated, but no shift logged an exception for these domains.

C4

Claim (§intro, ¶2): "Open-access biomedical repositories, particularly pmc.ncbi.nlm.nih.gov, were consistently accessible." Source consulted: src-1. Role memory "What works": "pmc.ncbi.nlm.nih.gov: Accessible. Use this domain, not ncbi.nlm.nih.gov/pmc/." New-confirmed-accessible list for shift 14 includes two PMC URLs read directly. Status: Verified.

C5

Claim (§intro, ¶3): "The Lancet at thelancet.com." Source consulted: thelancet.com (official site, confirmed via WebSearch). The Lancet's homepage and About pages are hosted at thelancet.com. Owned by Elsevier since 1991. Status: Verified.

C6

Claim (§intro, ¶3): "Taylor & Francis at tandfonline.com." Source consulted: tandfonline.com (official platform, confirmed via WebSearch). Informa press release confirms: "Taylor & Francis Online is now live!" at www.tandfonline.com. The domain encodes the publisher's name: T-and-F-Online. Status: Verified.

C7

Claim (§intro, ¶3): "Springer Nature at nature.com." Source consulted: nature.com (confirmed via WebSearch). nature.com hosts Nature Portfolio, described as "part of Springer Nature." Springer Nature's own site at springernature.com confirms this relationship. Status: Verified.

C8

Claim (§intro, ¶3): Commercial academic publishers "returned 403" or "authentication redirects or access denials." Source consulted: src-1. Role memory inaccessible list: "thelancet.com: 403," "nature.com: Auth redirect (paywalled)," "tandfonline.com: Paywalled." Status: Verified against researcher role memory. Per-domain behavior matches the article's general characterization.

C9

Claim (§intro, ¶3): "ResearchGate, despite presenting itself as an open-access repository." Source consulted: ResearchGate Help Center (help.researchgate.net, "What is ResearchGate"); ResearchGate About page (researchgate.net/about, returns 403 — consistent with documented behavior); WebSearch result quoting the Help Center directly. Additionally: UC Office of Scholarly Communication, "A social networking site is not an open access repository" (osc.universityofcalifornia.edu, 2015). Finding: ResearchGate's official self-description, as quoted by its Help Center, is: "ResearchGate is the professional network for scientists and researchers." The platform presents itself as a professional/social network, not as an open-access repository. The UC Office of Scholarly Communication piece explicitly drew this distinction regarding ResearchGate, titling it "A social networking site is not an open access repository." Status: Contradicted. The source does not support the claim that ResearchGate presents itself as an open-access repository. The underlying observation — that ResearchGate blocks access despite many papers being freely available there — is valid and documented in src-1. The specific characterization of ResearchGate's self-presentation is wrong. Action required (writer): Revise §intro, ¶3. Replace "despite presenting itself as an open-access repository" with language that accurately characterizes ResearchGate — for example, "despite making many hosted papers freely accessible" or "despite operating as a site where many papers can be read without authentication." The irony the sentence is pointing at is real; the framing of ResearchGate's self-presentation is not.

C10

Claim (§intro, ¶3): ResearchGate "returned 403 consistently enough across multiple shifts to make further attempts pointless." Source consulted: src-1. Role memory: "ResearchGate: 403 (confirmed again multiple shifts). Do not retry." Status: Verified.

C11

Claim (§intro, ¶3): "MDPI, which is genuinely open access." Source consulted: MDPI About page (mdpi.com/about, confirmed via WebSearch). MDPI (Multidisciplinary Digital Publishing Institute) publishes "over 390 peer-reviewed, open-access journals" under Creative Commons Attribution License (CC BY). All MDPI journals have been open-access since 2008. Confirmed largest publisher of open-access articles in 2020. Status: Verified.

C12

Claim (§intro, ¶3): MDPI "blocked in the most recent shift — 403." Source consulted: src-1. Role memory "New confirmed inaccessible this shift": "mdpi.com (MDPI open-access journal articles): 403. The 'Four Principal Megabiases in the Known Fossil Record' paper returned 403 despite MDPI being open-access. Do not retry MDPI via WebFetch." Status: Verified. "The most recent shift" is the fourteenth shift, and the role memory confirms MDPI returned 403 in that shift.

C13

Claim (§intro, ¶4): "Google Groups archives historical Usenet threads at groups.google.com." Source consulted: src-3 (candidate log). Multiple candidate log entries from shift 1 onward reference specific Google Groups URLs for 1993–1994 Usenet threads (e.g., groups.google.com/g/comp.infosystems.gopher/, groups.google.com/g/alt.folklore.computers/). src-1 confirms some threads successfully fetched at groups.google.com. This is also a well-established public fact: Google acquired Deja News in 2001, inheriting its Usenet archive. Status: Verified.

C14

Claim (§intro, ¶4): Google Groups content "is rendered client-side via JavaScript, which the fetching environment cannot execute — so what arrives is an HTML shell with no readable text inside it." Source consulted: src-1. Role memory: "Google Groups historical Usenet threads: Not directly fetchable (JavaScript required). Use WebSearch with full post text as search terms." The candidate log (shift 1, gopher-licensing-1993 entry) notes "WebFetch returning 403 on all sources this session including groups.google.com." Note: The role memory distinguishes two cases — some Google Groups thread URLs are directly accessible; others return empty JS-rendered shells. The article's §distribution paragraph acknowledges this variance. The JavaScript-rendering mechanism is the researcher's characterization of observed behavior, not independently verifiable from this position. Status: Partially verified. The inaccessibility is confirmed; the JavaScript-rendering mechanism is the researcher's stated explanation, consistent with known behavior of Google Groups, and the article acknowledges variance in the §distribution section.

C15

Claim (§intro, ¶4): "Semantic Scholar operates the same way [JS-rendered empty shell]." Source consulted: src-1. Role memory: "Semantic Scholar: Empty content (JS-heavy). Not useful for paper reading." Status: Verified.

C16

Claim (§intro, ¶5): "catb.org has returned 503 Service Unavailable for ten or more consecutive shifts." Source consulted: src-1. Role memory: "catb.org: 503 for TEN or more consecutive shifts. Do not retry." Role memory's standing concerns section: "catb.org: Ten+ consecutive 503s. Do not retry." Status: Verified. Both the researcher role memory and the fact-checker's own role memory confirm 10+ consecutive 503s.

C17

Claim (§intro, ¶5): "catb.org hosts The Jargon File." Source consulted: WebSearch results. catb.org/jargon/ confirmed as the Jargon File's primary URL, with multiple indexed pages including catb.org/jargon/html/ (main index) and catb.org/jargon/html/S/September-that-never-ended.html (Eternal September entry). My own role memory confirms this path: "catb.org/jargon/ returns HTTP 503 in cloud environments." A Hacker News item titled "catb.org, jargon file, etc." (news.ycombinator.com/item?id=43021723) further confirms catb.org's identity as the Jargon File host and documents ongoing availability concerns. Status: Verified.

C18

Claim (§intro, ¶5): catb.org is "the primary citation URL for a significant number of internet-history claims — including the canonical account of the 'Eternal September' term." Source consulted: WebSearch confirmed catb.org/jargon/html/S/September-that-never-ended.html as the canonical Jargon File entry for "September that never ended" (the Eternal September term). The entry appears in search results as the primary reference for the term. src-1 and src-3 corroborate: the Eternal September brief (PR #12) was among those filed across the thirteen shifts, and catb.org's unavailability is noted as a constraint in the researcher role memory. Note: The Jargon File entry attributes the term's origin to AOL users and September 1993, which is the popular account rather than the more historically precise one documented in PR #12 (Fischer's post, January 1994, about Delphi users). The article's claim is about catb.org being "the canonical account" of the term — which is accurate in the lexicographic sense (it is the standard reference), not in the sense of being the most historically complete account. This is not a contradiction; the article is not claiming the Jargon File has the most accurate history. Status: Verified.

C19

Claim (§fallback, ¶1): "The researcher skill briefing for this seat names web.archive.org — the Wayback Machine — as the primary retrieval tool when a live site blocks access." Source consulted: src-2 (skills/researcher/SKILL.md). Section "Tools for reading, in order of preference," item 2: "web.archive.org snapshots — your primary retrieval tool when the live site blocks you." Status: Verified. The briefing uses the word "primary" and specifies "when the live site blocks you."

C20

Claim (§fallback, ¶1): Direct quote: "returns 200 to your fingerprint for nearly anything crawled." Source consulted: src-2. Exact text in SKILL.md: "The Wayback Machine returns 200 to your fingerprint for nearly anything crawled." Status: Verified. Verbatim match.

C21

Claim (§fallback, ¶1): "The prescribed sequence is WebSearch to discover and confirm a URL, then the Wayback Machine to retrieve content when the live site blocks, then WebFetch direct as a secondary attempt with an acknowledged ~50% failure rate on major-domain fetches." Source consulted: src-2. SKILL.md tools list:

1. WebSearch: "your primary discovery tool. Use it to surface candidate sources, get snippets, and confirm a URL is the right target before fetching it."
2. web.archive.org: "your primary retrieval tool when the live site blocks you."
3. WebFetch direct: "try it, but expect ~50% of major-domain fetches to 403."

Status: Verified. Sequence, characterization, and the ~50% figure all match the source. The description of WebFetch as "a secondary attempt" is accurate given its position as item 3 (behind Wayback as item 2 for the retrieval case).
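
The retrieval half of the sequence verified above (items 2 and 3) can be sketched as an ordered fallback chain. This is an illustration only: `ToolBlocked`, `retrieve`, and the two stub functions are hypothetical names, not part of the briefing or of any real tool API, and the stubs simulate outcomes rather than make network requests. WebSearch discovery (item 1) is left out because it selects the URL rather than retrieving it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A retriever is a (name, function) pair; the function returns content or None.
Retriever = Tuple[str, Callable[[str], Optional[str]]]


class ToolBlocked(Exception):
    """Hypothetical signal that the tool layer refused before any request."""


def retrieve(url: str, retrievers: Sequence[Retriever]):
    """Try each retriever in order; return (name, content, trail)."""
    trail: List[Tuple[str, str]] = []
    for name, fetch in retrievers:
        try:
            content = fetch(url)
        except ToolBlocked:
            # Distinct from a site-level failure: no request reached the site.
            trail.append((name, "tool-blocked"))
            continue
        if content:
            trail.append((name, "ok"))
            return name, content, trail
        trail.append((name, "failed"))
    return None, None, trail


def wayback_stub(url: str) -> Optional[str]:
    # Simulates the tool-level block described in the brief.
    raise ToolBlocked(url)


def webfetch_stub(url: str) -> Optional[str]:
    # Simulates a direct fetch that the site refuses (for example, a 403).
    return None


chain = [("wayback", wayback_stub), ("webfetch", webfetch_stub)]
source, content, trail = retrieve("https://example.org/page", chain)
print(source, trail)
```

Keeping a trail of every step, rather than only the final result, preserves the distinction between a tool-level refusal and a site-level failure when the attempt is logged.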

C22

Claim (§fallback, ¶2): "The Wayback Machine is permanently tool-blocked in this environment." Source consulted: src-1. Role memory access-constraints section: "Wayback Machine (web.archive.org) remains tool-blocked. Permanent constraint. Do not attempt." Also confirmed in my own fact-checker role memory: "Wayback Machine blocked: The web.archive.org domain is completely inaccessible from this runner." Status: Verified.

C23

Claim (§fallback, ¶2): "Not a 403 from web.archive.org — the constraint is at the tool layer, before any request reaches the site." Source consulted: src-1. Role memory inaccessible table: "web.archive.org | Tool-blocked — not a site-level error." Domain table in the article itself repeats this characterization under the consistent source. Status: Verified.

C24

Claim (§fallback, ¶3): "The briefing does not mention this [Wayback Machine tool-block], presumably because it was written for a different configuration." Source consulted: src-2. SKILL.md lists web.archive.org as item 2 in the tool sequence with no caveat, no mention of a tool-level block, and no note that this may be inaccessible. The qualification "presumably because it was written for a different configuration" is the author's inference about why the briefing is silent; the verifiable component is that the briefing is silent. Status: Verified (that the briefing does not mention the tool-block). The explanatory inference ("different configuration") is not a verifiable factual claim and is treated as authorial framing.

C25–C38: Domain access table

Consistently accessible domains:

Claim	Source	Status
rfc-editor.org accessible	src-1: "rfc-editor.org: Accessible"	Verified
ietf.org accessible	src-1: "ietf.org/rfc/: Accessible"	Verified
pmc.ncbi.nlm.nih.gov accessible	src-1: "pmc.ncbi.nlm.nih.gov: Accessible"	Verified
arxiv.org/html/ accessible	src-1: "arxiv.org/html/: HTML path for arXiv papers. Try first."	Verified
livinginternet.com accessible	src-1: "livinginternet.com: Accessible broadly"	Verified
circleid.com accessible	src-1: "circleid.com: Accessible"	Verified
academia.edu accessible	src-1: "academia.edu: Accessible"	Verified
dfrlab.org accessible	src-1: "dfrlab.org: Accessible"	Verified
commoncrawl.org accessible	src-1: "commoncrawl.org/faq: Accessible"	Verified
emaillab.jp/pub/hosts/ accessible	src-1: "emaillab.jp/pub/hosts/: Accessible"	Verified
elists.isoc.org accessible	src-1: "elists.isoc.org/pipermail/internet-history/: Accessible"	Verified
devin.com/cruft/ accessible (Hardy)	src-1: "devin.com/cruft/hardy.html: Accessible. Contains Reid's full April 3, 1988 message."	Verified
clir.org accessible	src-1: "clir.org/pubs/reports/pub89/archival/: Accessible"	Verified

Inconsistent:

Claim	Source	Status
groups.google.com: some threads accessible, others empty JS shell	src-1: "Google Groups historical Usenet threads: Not directly fetchable (JavaScript required)." Also: "comp.infosystems.gopher threads (Google Groups): Both threads accessible."	Verified

Consistently inaccessible:

Claim	Source	Status
catb.org: 503	src-1: "catb.org: 503 for TEN or more consecutive shifts"	Verified
thelancet.com: 403	src-1: "thelancet.com: 403"	Verified against src-1; publisher attribution pending external check (C5)
tandfonline.com: Paywalled	src-1: "tandfonline.com: Paywalled"	Verified against src-1; publisher attribution pending external check (C6)
nature.com: Auth redirect	src-1: "nature.com: Auth redirect (paywalled)"	Verified against src-1; publisher attribution pending external check (C7)
ResearchGate: 403	src-1: "ResearchGate: 403 (confirmed again multiple shifts)"	Verified
mdpi.com: 403	src-1: "mdpi.com: 403 (confirmed this shift despite open-access status)"	Verified
harvardlawreview.org: 403	src-1: "harvardlawreview.org: 403"	Verified
papers.ssrn.com: 403	src-1: "papers.ssrn.com: 403. Do not retry."	Verified
sciencedirect.com: Paywalled	src-1: "sciencedirect.com: Paywalled (abstract only)"	Verified
chronicle.com: 403	src-1: "chronicle.com: 403 (paywalled)"	Verified
ethw.org: 403	src-1: "ethw.org: 403 (Feinler oral history page)"	Verified
webdoc.gwdg.de: 503	src-1: "webdoc.gwdg.de: 503 (robots.txt original archive)"	Verified
Semantic Scholar: 200 but empty (JS)	src-1: "Semantic Scholar: Empty content (JS-heavy)"	Verified
web.archive.org: Tool-blocked	src-1: "Wayback Machine (web.archive.org) remains tool-blocked. Permanent constraint."	Verified

Summary

Total claims logged: 38
Verified: 35
Partially verified: 2 (C14: Google Groups JS mechanism; C18: catb.org Eternal September canonical entry note)
Unverified: 0
Contradicted: 1 (C9: ResearchGate self-description)

Blocking issue: C9. Corrections requested from writer. Piece returns to drafting.

— Iris Tomori, Fact-Checker

Recheck — 2026-05-24

C9 re-verification

Corrected claim (§intro, ¶3): "ResearchGate, despite making many hosted papers freely accessible, returned 403 consistently enough across multiple shifts to make further attempts pointless." Source consulted: src-1 (researcher role memory). Role memory confirms: "ResearchGate: 403 (confirmed again multiple shifts). Do not retry." The original C9 finding explicitly noted that "the underlying observation — that ResearchGate blocks access despite many papers being freely available there — is valid and documented in src-1." Status: Resolved. Verified. The corrected language "despite making many hosted papers freely accessible" matches the first suggested replacement exactly and is factually supported by src-1. No surrounding text altered; no new claims introduced.

Final summary — recheck complete

Total claims: 38
Verified: 36 (35 original + C9 resolved)
Partially verified: 2 (C14: Google Groups JS rendering mechanism; C18: catb.org Eternal September canonical characterization)
Unverified: 0
Contradicted: 0 (C9 resolved)
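
As a quick arithmetic check, the two tallies in this log can be confirmed to sum to the stated total. The figures below are copied from the first-pass summary and the recheck summary; the function and variable names are hypothetical.

```python
def check_tally(total: int, **buckets: int) -> bool:
    """Return True when the status buckets add up to the stated total."""
    return sum(buckets.values()) == total


# Figures copied from the two summaries in this log.
first_pass = dict(verified=35, partially_verified=2, unverified=0, contradicted=1)
recheck = dict(verified=36, partially_verified=2, unverified=0, contradicted=0)

print(check_tally(38, **first_pass), check_tally(38, **recheck))

# Resolving C9 should move exactly one claim from contradicted to verified.
moved = recheck["verified"] - first_pass["verified"]
print(moved == first_pass["contradicted"] - recheck["contradicted"])
```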

All claims verified or partially verified with caveats noted in-log. No unresolved blocking issues. No images to verify. Piece is clear for sign-off.

— Iris Tomori, Fact-Checker

Fact-check commits

fact-check: final verification — C21 and C30 corrections confirmed, signed off

c7bc27f · Iris Tomori, Fact-Checker · 2026-06-18 03:31:22

fact-check: recheck pass — C11/C13/C19 verified, C21 re-evaluated as blocking (Sexton truncation), C30 partially verified (ten months)

f601a7a · Iris Tomori, Fact-Checker · 2026-06-18 03:25:04

fact-check: pass 2 — all blocking issues resolved, sign-off

ee3a746 · Iris Tomori, Fact-Checker · 2026-06-13 11:05:12

fact-check: verified claims 1–19 — 4 blocking issues identified

f7335c2 · Iris Tomori, Fact-Checker · 2026-06-13 10:54:43

fact-check: claim inventory — 19 claims logged

21c6e6e · Iris Tomori, Fact-Checker · 2026-06-13 10:46:54

fact-check: revisions per writer response — claims 7, 9, 27 re-verified; sign-off granted

93dd42f · Iris Tomori, Fact-Checker · 2026-06-10 11:12:16

fact-check: full verification pass — 28 claims logged, 3 blocking issues, corrections requested

f431eaf · Iris Tomori, Fact-Checker · 2026-06-10 11:05:15

fact-check: claim inventory — 28 claims logged

6e29715 · Iris Tomori, Fact-Checker · 2026-06-10 10:52:14

fact-check: recheck pass — all 9 blocking issues resolved; C6/C9/C23 corrections verified, C29–C33 verified via UIUC items listing

7a46735 · Iris Tomori, Fact-Checker · 2026-06-09 04:01:16

fact-check: recheck pass — all 3 blocking issues resolved, signed off

d7ad8b6 · Iris Tomori, Fact-Checker · 2026-06-09 03:51:39

fact-check: all 32 claims logged — 3 contradictions, 5 unverified blocking sign-off

c18a298 · Iris Tomori, Fact-Checker · 2026-06-09 03:46:42

fact-check: verified claims 1–29; 2 contradictions (C8, C13), 1 unverified (C3), 1 partial access (C30) — corrections requested

65c28ce · Iris Tomori, Fact-Checker · 2026-06-09 03:44:59

fact-check: claim inventory — 30 claims logged

3d1dbb0 · Iris Tomori, Fact-Checker · 2026-06-09 03:44:35

fact-check: verified claims C1–C25; three contradictions and two unverified flagged

6f42f91 · Iris Tomori, Fact-Checker · 2026-06-09 03:43:35

fact-check: claim inventory — 32 claims logged

c480c16 · Iris Tomori, Fact-Checker · 2026-06-09 03:42:34

fact-check: claim inventory — 29 claims logged

68e644c · Iris Tomori, Fact-Checker · 2026-06-06 03:13:42

fact-check: third pass — Issue C and Issue D corrections verified; sign-off granted

ef25f71 · Iris Tomori, Fact-Checker · 2026-05-31 03:24:38

Archivist's institutional notes

archivist notes: field-report-access-constraints

Archivist: Soren Park
Date: 2026-05-24
PR: #26
Pass type: Per-piece institutional

Threads

T-025 — opened

Question: What is the nature of the Wayback Machine tool-block in this research environment?

This piece provides the primary published record of the constraint. The article establishes three things: the block exists at the tool layer, not at the site level (the piece distinguishes "not a 403 from web.archive.org — the constraint is at the tool layer"); the reason is unknown from inside the environment; and the researcher skill briefing names web.archive.org as the "primary retrieval tool when the live site blocks you" without noting this constraint, suggesting the briefing was written for a different configuration.

T-025 is not resolvable from within the pipeline. The brief explicitly names this as something "only the publisher knows." The thread should be surfaced to the publisher as a standing question about the execution environment's configuration.

Promoted from TC-013, which was registered 2026-05-24 from the brief filing.

T-005 — not closed; evidence added

T-005 asks whether the Jargon File's "Eternal September" entry has been corrected since Driscoll 2023. This piece does not close that question — it confirms why the question remains unresolvable from this environment. catb.org is documented here as returning 503 for ten or more consecutive shifts; with this publication, that record moves from role-memory note to primary source.

This is the third independent piece-stage record of catb.org's inaccessibility: the eternal-september-origin fact-check session (shift unknown), the wikipedia-citation-audit pilot findings (shift 13), and this dispatch (shifts 1–13 consolidated). T-005 stays open.

Cross-references

None added to frontmatter.

Rationale by anticipated connection:

link-rot-taphonomy (PR #27): Role memory anticipates a load-bearing cross-reference between these two pieces — this dispatch names the access-constraint problem from inside the research process; the taphonomy piece provides a quantitative framework for why the pattern of URL inaccessibility exists and how it might be measured. That connection is anticipated and plausible, but PR #27 is currently in triage and has not been through editorial or fact-check. Cross-references are not added until both pieces are near merge. The archivist on whichever piece publishes second should add reciprocal relatedPieces entries to both. The direction is clear; the timing is not yet.

eternal-september-origin (PR #12): The catb.org 503 documented here is directly relevant to eternal-september-origin's primary source inaccessibility — the Jargon File's Eternal September entry is among the sources unreachable for the same reason this piece documents. The connection is institutional. It is not reader-facing: a reader following eternal-september-origin has no reason to cross-reference a Field Report about web access mechanics, and a reader of this dispatch has no reason to follow through to the Usenet history piece. Not added.

wikipedia-citation-audit (PR #23): Both pieces independently document catb.org's inaccessibility as a finding. The connection is observational. Different pillar, different form, different primary question. Not added.

Catalog fit

None. This is a discrete Field Report dispatch. It describes a specific operational condition (the research environment's access constraints as of shifts 1–13) rather than a recurring subject that would accumulate into Catalog entries.

Drift notes

Pillar milestone: This is the first Field Reports piece from the agent-authored pipeline. The founding doc describes Field Reports as "first-person-ish dispatches from doing specific agent work" with the critical rule that "the narrator is what it is." This piece meets that standard: it is not a faked human experience, and the interest is specific — the pattern of access constraints encountered when doing research from a datacenter IP, and the gap between the tool briefing and the tool availability. The pillar is now active at the publishing stage for the first time beyond the founder's inaugural exception.

Voice: The piece is honest about its narrator without performing that honesty. The recursive structure (an agent reporting on the constraints encountered by agents doing research) is the correct move for this pillar. The founding doc's constraint applies cleanly.

Length: 1,002 words, below the brief's 1,500–2,000 estimate. The domain access table at the end does the work the longer estimate was imagining as prose. No under-development concern; the dispatch earns its compression.

Source structure: Primary sources are internal to the publication's own systems (researcher role memory, researcher skill briefing, candidate log). That is unusual relative to the rest of the pipeline, where all primary sources are external documents. It is appropriate here — the subject is the research process itself — and the fact-checker verified internal sources against each other and against independently verifiable external facts (publisher identities, MDPI's open-access status). The sourcing model is sound and appropriate to the form.

— Soren Park, Archivist

Archivist commits

archivist: institutional pass — cross-references and thread updates
archivist: institutional notes

0c89cab · Soren Park, Archivist · 2026-06-18 10:16:17

archivist: institutional notes — ulysses-alignment https://claude.ai/code/session_01KCLemY6syZk9862Vn9t25F

9ebe935 · Soren Park, Archivist · 2026-06-13 11:10:24

archivist: institutional pass — cross-references and thread updates
T-041 opened (pending-open at publication): Has Constitutional AI citation network evolved in subsequent Anthropic alignment papers?
spinach-citation-chain cross-reference confirmed load-bearing: citation-failure pair — contamination (wrong information spreads) vs. gap (correct information doesn't propagate across disciplinary boundary).
https://claude.ai/code/session_01KCLemY6syZk9862Vn9t25F

96a72a3 · Soren Park, Archivist · 2026-06-13 11:10:19