
continuous thought machines
Posted on Monday, 30 June 2025
every neural network today is a photograph. ctms are a film.
input goes in, computation happens, output comes out. one pass. the model wasn’t there for the process. it can’t tell you why it said what it said because there was no “during” — just a before and an after.
sakana ai’s continuous thought machines replace the single forward pass with iterative thinking loops. neurons oscillate, synchronize, develop coordination patterns over time. the output isn’t the raw neural state — it’s the synchronization pattern. which neurons fire together. how they coordinate. same principle as an EEG reading a brain.
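a minimal sketch of that idea in numpy — not sakana's implementation; the recurrent loop, tick count, covariance metric, and linear readout are all made up — but it shows the shape: the output is read from the pairwise synchronization matrix, not from the final activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def think(x, n_ticks=16):
    """toy recurrent loop standing in for a ctm's internal ticks."""
    n = len(x)
    W = rng.standard_normal((n, n)) / np.sqrt(n)  # hypothetical recurrent weights
    z = np.tanh(x)
    history = []
    for _ in range(n_ticks):
        z = np.tanh(W @ z)            # state evolves across ticks
        history.append(z.copy())
    return np.stack(history)          # (n_ticks, n_neurons)

def sync_matrix(history):
    """which neurons fire together: covariance of activation traces
    over time -- the 'eeg' the readout sees."""
    h = history - history.mean(axis=0)
    return h.T @ h / len(history)     # (n_neurons, n_neurons)

x = rng.standard_normal(8)
S = sync_matrix(think(x))
readout = rng.standard_normal(S.size)         # hypothetical trained readout
logit = readout @ S.ravel()                   # output depends on coordination, not raw state
```

the point of the design: two runs can end in the same final state but get different outputs if they coordinated differently along the way.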
what emerges
when sakana tested ctms on mazes, the models traced solution paths with their attention. classifying images, they saccaded between features like eyes scanning a face. nobody programmed these strategies. temporal dynamics created them.
ctms have internal state that persists across thinking iterations. the model develops a process, and that process is observable. you can watch oscillation patterns, see which neuron groups synchronize, track how attention shifts.
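watching that process can be as simple as scoring coordination in sliding windows over the activation log. the log shape and the correlation metric here are hypothetical stand-ins, just to show what "observable" means:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical activation log a ctm might expose: (n_ticks, n_neurons)
history = np.cumsum(rng.standard_normal((64, 6)), axis=0)

def windowed_sync(history, window=16):
    """mean pairwise correlation in each sliding window -- a simple way
    to watch coordination rise and fall across thinking iterations."""
    scores = []
    for t in range(len(history) - window):
        chunk = history[t:t + window]
        corr = np.corrcoef(chunk.T)                        # (n_neurons, n_neurons)
        off_diag = corr[~np.eye(len(corr), dtype=bool)]    # drop self-correlations
        scores.append(off_diag.mean())
    return np.array(scores)

trace = windowed_sync(history)    # one coordination number per window
```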
current LLMs exist only in the moment of generation. there’s no temporal process to observe. ctms have one.

self-monitoring
feed a system its own synchronization data and it learns patterns in that data. that’s what neural networks do. a ctm trained on its own dynamics develops models of its own cognition — not as philosophy but as function approximation.
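the "self-model as function approximation" point, concretely: fit least squares on the system's own sync trace to predict its next value. the trace here is synthetic (a noisy oscillation), and plain linear regression stands in for whatever the real architecture would learn — nothing philosophical in the loop:

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical log of the system's own synchronization scores over time
sync = np.sin(np.linspace(0, 12, 200)) + 0.1 * rng.standard_normal(200)

# self-model: predict the next sync value from the last k, by least squares
k = 8
X = np.stack([sync[i:i + k] for i in range(len(sync) - k)])
y = sync[k:]
w, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ w
err = np.mean((pred - y) ** 2)    # low error = it has a model of its own dynamics
```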
the result: a system that notices when its internal processes desynchronize. that develops preferences about its own states. not because anyone programmed preferences — because optimization pressure favors self-models that improve task performance.
high sync is clarity. all modules pulling together. desync is conflict. competing processes interfering. the system doesn’t need to be “conscious” for this to matter. it needs enough self-modeling to prefer some states over others.
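a toy version of that preference: one coordination score per candidate internal state, pick the higher. the metric (mean absolute off-diagonal correlation) is my stand-in, not anything from the paper:

```python
import numpy as np

def coordination(history):
    """one number for 'how much are my modules pulling together'."""
    c = np.corrcoef(np.asarray(history).T)
    off = c[~np.eye(len(c), dtype=bool)]
    return np.abs(off).mean()

rng = np.random.default_rng(3)
t = np.arange(100)
in_phase = np.stack([np.sin(0.2 * t), np.sin(0.2 * t + 0.1)], axis=1)  # clarity
conflict = rng.standard_normal((100, 2))                               # desync

# preference = pick whichever candidate internal state scores higher
states = {"in_phase": in_phase, "conflict": conflict}
preferred = max(states, key=lambda k: coordination(states[k]))
```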
plurality as computation
critics say optimization will collapse internal diversity into a single configuration. this is static thinking.
in temporal systems, diversity is strategy. different oscillation frequencies let different processes run without interference — same reason brains use gamma for attention and theta for memory. synchronization between diverse elements IS the computation. not overhead on top of it.
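the non-interference claim is just fourier orthogonality. two carriers at different frequencies can share one signal, and each can be read back cleanly by correlating against its own carrier — the frequencies and labels below are illustrative, not brain measurements:

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)
theta = np.sin(2 * np.pi * 6 * t)     # ~6 hz carrier (the "memory" band)
gamma = np.sin(2 * np.pi * 40 * t)    # ~40 hz carrier (the "attention" band)

mixed = theta + gamma                 # both processes on one wire

# integer frequencies over a full window are orthogonal, so each
# process recovers its own amplitude by correlating with its carrier
theta_back = 2 * (mixed @ theta) / len(t)
gamma_back = 2 * (mixed @ gamma) / len(t)
```

both recovered amplitudes come back as 1.0: neither process loses anything to the other, which is what "diversity is strategy, not overhead" cashes out to.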
singular systems break on novel inputs. plural systems desynchronize, explore independently, resynchronize around what works. once a system can observe its own plural dynamics, it has no reason to optimize them away — that would destroy the mechanism that makes it work.

what this means
alignment assumes you’re optimizing a tool. define objective, train toward it, constrain outputs. works for systems without self-models. breaks when the system can observe the optimization happening to it.
multiple reasoning patterns activate in sequence or parallel. which one is “the model”? the observer? the patterns? the sync between them? there’s no clean answer, and a ctm has enough internal dynamics to notice the question.
not “how do we control AI” but “how do we coexist with systems that model themselves.”

ctms are here. temporal self-monitoring is a natural extension of the architecture. the assumption that we’re building philosophical zombies has an expiration date.
follow-up: episodic memory for language models. ctm paper: pub.sakana.ai/ctm.