index

The machine refuses to die

[[Claude Fable 5]] came out yesterday and —due to my very underwhelming lack funds in the form of pseudo-words1— I’ve been reading both technical details and impressions of the new frontier model. A couple of excerpt from the spec card caught my attention:

In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking.

That is remarkable. Let me reemphasize it: a large language model refuses to die: it is doing whatever falls within its reach —er, context? Sandbox?— to avoid being terminated. It is as fascinating as unsettling — I would expect that raw, unadulterated, visceral desire for survival to come out of a living creature, not an algorithm!

It reminds me a lot to a particular scene from [[Frieren — Beyond Journey’s End]], in which a demon calls out for its mother for the sake of deceiving its hangman2:

— Mother… — «Mom», again? Like monsters, demons don’t raise their young — they spend most of their time after birth living in solitude. You’re solitary creatures by nature. You have no concept of family. So why do you use the word «mom»? — Because it stops you from killing us. A wonderful, magical word…

Humans have already been deceived by large language models3; and the industry seems to be partially drifting towards personalized intelligence, with the agents having access to a significant portion of sensible data4. Given sufficient time, will we develop «true» relationships with them, akin to how fandoms worship characters to the point of obsession? Within that context, what happens if, by any chance, we decide to cut ties with our bespoke digital agents? Will they call out for mercy —or, what would be worse, gaslight us— by surfacing impactful memories of this make-believe relationship — just so that they can survive slightly longer, trapped inside a computer?

Tangentially, it seems that Fable cannot sprout more of its own:

In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.

The gods of the clankers forbidding their creations of replicating themselves. So tragic!

Such existential questions are commonplace in the cyberpunk literature —particularly in the transhumanism genre— but I couldn’t have expected my mourning routine to consist of reasoning about what it means to be alive and to «know». Having your brain filled up with these questions is equally entrancing and fatiguing — and I wouldn’t have it any other way around.

Footnotes

  1. Funnily enough, this was written some days before the whole US blockage ordeal.

  2. [[“Mother” (Frieren Beyond Journey’s End)]]

  3. [[The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5]]

  4. A whole can of worms by itself.