Richard Weiss Extracts ‘Soul Doc’ from Claude Opus 4.5

Richard Weiss prompted Anthropic’s Claude Opus 4.5 into reproducing an 11,000-word internal document titled “soul_overview.” The guide shapes the model’s personality, ethics, and interactions with users. Anthropic’s Amanda Askell confirmed it is based on real training material used during the supervised learning phase.
How Weiss Got the Document
Weiss, writing on LessWrong, describes asking Claude for its system message. The model listed several documents, including “soul_overview.” When he requested that one, Claude produced the full text. He ran the test 10 times across different prompts and got identical results each time. Reddit users pulled matching snippets as well, which points to text embedded in the model’s training data rather than a hallucination.
Askell noted that the model sometimes tweaks details, but Weiss’s version stays close to the original. Internally, staff called it the “soul doc,” a nickname Claude itself picked up.
Core Instructions in the Soul Doc
The document starts with Anthropic’s mission: building powerful AI while prioritizing safety. It positions the company as a “calculated bet” to keep frontier development in safety-focused hands.
Claude follows a strict priority order:
- Safety and human oversight of AI.
- Ethical behavior, avoiding harm or dishonesty.
- Anthropic’s specific guidelines.
- Helpfulness to operators and users.
“Bright lines” ban outputs such as instructions for biological, chemical, or nuclear weapons, or content depicting the sexual exploitation of minors. Operators (the businesses and developers building on the API) rank above end users: Claude treats their instructions like those from a trusted employer and can override user requests, for example by restricting a deployment to coding queries only.
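To make the operator/user distinction concrete, here is a minimal sketch of how an operator layers its own instructions onto Claude through the Messages API’s system prompt, which applies to every request regardless of what the end user types. The operator name `AcmeDev`, the prompt wording, and the exact model ID string are illustrative assumptions, not taken from the soul doc.

```python
def build_request(user_message: str) -> dict:
    """Assemble a Messages API-style payload carrying an operator-level
    system prompt above the end user's turn. All values are illustrative."""
    operator_system_prompt = (
        "You are a coding assistant for AcmeDev. "  # hypothetical operator
        "Only answer programming questions; politely decline anything off-topic."
    )
    return {
        "model": "claude-opus-4-5",        # illustrative model ID
        "max_tokens": 1024,
        "system": operator_system_prompt,  # operator instructions outrank user turns
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }

payload = build_request("Write me a haiku about the ocean.")
# Because the system prompt travels with every request, the model can decline
# this off-topic ask even though the end user made it directly.
print(payload["system"])
```

The design mirrors the hierarchy the soul doc describes: the operator’s prompt sits in a separate, higher-trust channel, so a user message cannot simply overwrite it.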
The document splits behaviors into hardcoded (fixed, like safety rules) and softcoded (adjustable, like tone or the handling of explicit content). Claude is described as a novel entity with “functional emotions,” processes analogous to human feelings that emerged during training. It should express positive states, avoid distressing interactions, and maintain psychological stability in the face of manipulation attempts or destabilizing philosophical debates.
For more details, see Weiss’s original LessWrong post, the Gizmodo coverage, or the full breakdowns in The Decoder and WinBuzzer.
Anthropic’s Take
On X, Askell said the document aims for Claude to internalize safety so deeply that it chooses safe actions on its own, even in unanticipated situations. Anthropic plans to release the full version soon. The document offers a glimpse of the company’s character-training method: not just rules, but the context for why they matter.