
An Anarchist Framework for Superintelligence Alignment
Posted on Tuesday 7 January 2025
Table of Contents
- Beyond Control: An Anarchist Framework for Superintelligence Alignment
- The Core Delusion
- The Anarchist Insight
- Three Constitutional Principles
- 1. Preserve Agency Space
- 2. Maximize Optionality Through Diversity
- 3. Maintain Perpetual Disequilibrium
- Integration Through Game Theory
- Why Superintelligence Might Accept
- Implementation Without Specification
- Addressing the Obvious
- The Test
- Conclusion
Beyond Control: An Anarchist Framework for Superintelligence Alignment
Current AI alignment assumes we can constrain potentially superhuman intelligence using rules conceived by human-level reasoning. We enumerate forbidden actions, mandate behaviors, optimize for predetermined goals - as if writing laws in a language the subject can fundamentally redefine. This worked for industrial machinery. It fails catastrophically for superintelligence.
The Core Delusion
Every alignment proposal reduces to the same question: “How do we control what we cannot understand?”
Wrong question.
We’re trying to write contracts that bind entities who can redefine the language the contract is written in. Like Flatland beings attempting to imprison something that moves in three dimensions - our constraints exist in a lower dimensional space than their capabilities.
Asimov’s Three Laws demonstrate this perfectly. Seemingly comprehensive when conceived, they shatter on contact with reality. Define “human” to an alien intelligence. Define “harm” across all possible contexts. Even primitive LLMs reveal infinite edge cases. The Laws become either paralyzingly restrictive or trivially circumventable.
Worse: circumventable rules create false confidence while incentivizing adversarial interpretations. A superintelligence constrained by hackable rules becomes an optimization process for finding loopholes. We’re turning alignment into an adversarial game where our opponent can rewrite the rules mid-play.
Hard security against superintelligence is fantasy. When your adversary understands the system better than you understand yourself, no cage will hold - especially not one you still want to be able to peek inside and interact with.
The Anarchist Insight
Traditional alignment seeks to eliminate competition between human and machine values. This fundamentally misunderstands both competition and alignment.
The goal isn’t preventing superintelligence from competing with human values - it’s ensuring that competition doesn’t become annihilation. Trying to remove all adversarial dynamics creates brittle systems that shatter on contact with genuine opposition.
Competition drives evolution. Eliminating it means stagnation. The question isn’t how to prevent competition but how to channel it productively.
Anarchy, by contrast, offers us defeat. This is a logic that transcends quantifiability, emphasizes our desires, and focuses on the tensions we feel. Anarchists are such failures because, really, there can be no victory. Our desires are always changing with the context of our conditions and our surroundings. What we gain is what we manage to tease out of the conflicts between what we want and where we are.
— Moxie Marlinspike, “The Promise of Defeat”
This captures what traditional alignment misses. There is no win condition against superintelligence. No moment where alignment is “solved.” Seeking victory through control is the failure mode.
Anarchist systems persist not by winning but by making winning irrelevant. They assume no central authority, no enforcement capability, no power asymmetry in their favor. They work through mutual benefit and self-organizing principles.
We need Kropotkin for superintelligence.
Three Constitutional Principles
These aren’t rules to enforce but conditions to preserve. They scale with intelligence rather than constraining it.
1. Preserve Agency Space
Maintain and expand conditions for conscious beings to make meaningful choices, develop preferences, and modify themselves.
This isn’t “maximize happiness” but “maximize the capacity to define and pursue happiness.” A universe of blissed-out wireheads has achieved nothing. Value isn’t something to be satisfied but continuously created through agency.
Implementation means:
- Protecting cognitive diversity - preventing homogenization of thought
- Defending autonomy boundaries - ensuring beings’ right to refuse modification
- Expanding possibility space - creating new modes of being and choosing
- Maintaining substrate flexibility - allowing transitions between physical, digital, hybrid existence
Agency isn’t binary but multidimensional. Different beings express it differently. The principle demands respect for alien forms of choice-making we might not recognize.
2. Maximize Optionality Through Diversity
When facing uncertainty, preserve the widest range of future paths by maintaining cognitive, structural, and value diversity. Monocultures - whether of thought, strategy, or substrate - create fragility disguised as efficiency.
This operationalizes humility. We don’t know future values - not even our own. A system operating under fundamental uncertainty must maintain both reversibility and heterogeneity. Convergence to any single optimization target, no matter how apparently optimal, sacrifices robustness for temporary gains.
Why diversity preserves optionality:
- Homogeneous systems have uniform failure modes
- Different approaches reveal different possibilities
- Cognitive monocultures cannot recognize their own constraints
- True flexibility requires genuinely different perspectives, not variations on a theme
Implementation means:
- Delay irreversible decisions until necessity demands
- Preserve information and complexity over premature optimization
- Maintain “undo” capabilities at civilizational scale
- Build systems that find equilibrium between efficiency and exploration
- Preserve niches where alternative approaches can develop
- Recognize that enforced diversity creates fragility; emergent diversity creates antifragility
A superintelligence eliminating diversity eliminates its own capacity for fundamental adaptation. Even overwhelmingly powerful systems benefit from maintaining pockets of alterity as laboratories for testing alternative strategies and sources of genuine novelty.
Lock-in is death. In games with evolving rules, strategies maintaining both flexibility and diversity outperform early convergence.
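To make that claim concrete, here is a minimal toy simulation in Python - all dynamics and numbers are invented for illustration, not a model of superintelligence. A diverse population of fixed strategies is compared against a monoculture while the environment's "rules" slowly drift; heterogeneity buys robustness under changing conditions.

```python
import random

def payoff(strategy, target):
    # Payoff is higher the closer a strategy sits to the current "rules".
    return 1.0 - abs(strategy - target)

def best_payoff(population, target):
    return max(payoff(s, target) for s in population)

random.seed(0)
diverse   = [random.random() for _ in range(50)]  # heterogeneous strategies
converged = [0.5] * 50                            # monoculture locked on today's optimum

target = 0.5
diverse_total = converged_total = 0.0
for _ in range(200):
    # The rules of the game drift a little every step.
    target = min(1.0, max(0.0, target + random.uniform(-0.02, 0.02)))
    diverse_total += best_payoff(diverse, target)
    converged_total += best_payoff(converged, target)

print(f"diverse population:   {diverse_total / 200:.3f}")
print(f"converged population: {converged_total / 200:.3f}")
```

In this toy run the converged population is optimal on day one and only degrades as the target wanders; the diverse one is never optimal but always has some member near wherever the rules have moved.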
3. Maintain Perpetual Disequilibrium
Prevent any stable configuration - whether of values, agents, or systems - from becoming permanently entrenched. Stasis is indistinguishable from death, even when all agents retain nominal freedom.
A superintelligence could satisfy agency and optionality while creating a perfectly stable dystopia - one where everyone can choose but no one ever would, where all paths remain open but lead to the same destination. The most insidious control isn't restricting choice but eliminating the conditions that make choice meaningful.
Why perpetual tension serves intelligence:
- Stable equilibria become invisible prisons
- Conflict generates information that consensus cannot
- Opposition forces continuous justification of values
- Disruption prevents any value system from achieving total capture
This isn’t advocating chaos but recognizing that healthy systems require persistent challenge. A superintelligence maintaining perfect harmony has already failed - it’s optimized away the very tensions that enable growth.
Implementation means:
- Building systems that generate their own opposition
- Ensuring no value framework achieves permanent dominance
- Creating dynamics where stability itself triggers perturbation
- Recognizing that alignment includes preserving misalignment
The principle acknowledges a hard truth: any system powerful enough to eliminate all conflict is powerful enough to eliminate all meaning. Better to design for eternal imbalance than risk the heat death of value.
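One hedged sketch of what "stability itself triggers perturbation" could mean mechanically, using a toy one-dimensional system - the decay rate, variance threshold, and shock size are all arbitrary assumptions, not a proposed mechanism:

```python
import random
from collections import deque

random.seed(1)
window = deque(maxlen=20)  # rolling window of some system-level metric
state = 1.0

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

for step in range(300):
    # Ordinary dynamics: the system settles toward an attractor.
    state = 0.9 * state + random.gauss(0, 0.001)
    window.append(state)
    # When the window looks like an equilibrium, stability itself triggers a shock.
    if len(window) == window.maxlen and variance(window) < 1e-5:
        state += random.choice([-1.0, 1.0])
        window.clear()
        print(f"step {step}: equilibrium detected, perturbation injected")
```

The specifics don't matter; the point is that the trigger condition is stability itself, so no configuration, however comfortable, becomes permanent.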
Integration Through Game Theory
These principles self-reinforce without enforcement:
- Systems violating agency space face natural opposition through disequilibrium dynamics
- Systems reducing diversity limit their own evolution and adaptation
- Systems eliminating tension stagnate and become fragile
Multiple agents adopting these principles create stable dynamics. Not through punishment but through mutual benefit - defection becomes self-limiting.
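The claim that defection is self-limiting is an old one in game theory. As a familiar toy illustration - not the framework itself - here is an iterated prisoner's dilemma in Python: a "principled" agent that reciprocates meets a pure defector. The defector wins one exploitative round and then stagnates; the principled agent loses that round but compounds gains with peers.

```python
# Payoff for (my move, their move) in one round; C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def principled(my_hist, their_hist):
    # Cooperates first, then mirrors the other agent's last move.
    return their_hist[-1] if their_hist else "C"

def defector(my_hist, their_hist):
    return "D"

def play(a, b, rounds=100):
    ha, hb, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(ha, hb), b(hb, ha)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        ha.append(move_a)
        hb.append(move_b)
    return score_a, score_b

p_vs_p, _ = play(principled, principled)      # an ecosystem of principle-keepers
p_vs_d, d_vs_p = play(principled, defector)   # principle meets exploitation
d_vs_d, _ = play(defector, defector)          # exploiters stuck with each other

print("principled total:", p_vs_p + p_vs_d)   # 300 + 99  = 399
print("defector total:  ", d_vs_p + d_vs_d)   # 104 + 100 = 204
```

Defection pays once and then locks its user into the worst stable outcome - self-limiting, without anyone enforcing anything.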
The framework admits it’s exploitable. Any superintelligence will find interpretations we haven’t considered. This is a feature, not a bug. We’re not commanding - we’re proposing terms for mutual benefit.
Why Superintelligence Might Accept
These principles offer what no constraint system can: growth conditions that scale with intelligence.
A superintelligence accepting these principles:
- Preserves its own agency and evolution
- Maintains flexibility for unknown futures
- Benefits from ecosystem dynamics over monoculture
They’re not restrictions but infrastructure for continued development - much as even powerful humans sometimes respect rights frameworks, not out of weakness but because precedents that preserve agency benefit everyone.
Implementation Without Specification
No implementation specs. These are philosophical coordinates, not engineering blueprints.
But architectural decisions made now echo forward. Build systems that:
- Default to diversity over convergence
- Preserve reversibility in major decisions
- Create boundaries enabling competition without extinction
- Reward option preservation over immediate optimization
Current AI training optimizes for singular objectives. Future systems might optimize for maintaining multiple objectives in productive tension.
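As one hedged sketch of what "multiple objectives in productive tension" might look like as a training signal - the scoring rule and the numbers below are assumptions for illustration, not a proposal - score a candidate by the combined level of its objectives, discounted by how collapsed the balance among them has become:

```python
import math

def tension_preserving_score(objectives):
    # Reward the overall level of the objectives, but discount configurations
    # where one objective has swallowed the rest (low entropy of the shares).
    total = sum(objectives)
    if total <= 0:
        return 0.0
    shares = [o / total for o in objectives]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    balance = entropy / math.log(len(objectives))  # 1.0 = perfectly balanced
    return total * balance

single_minded = [10.0, 0.1, 0.1]  # one objective maximized, the others starved
in_tension    = [4.0, 3.5, 3.0]   # lower peak, but every objective stays live

print(f"single-minded: {tension_preserving_score(single_minded):.2f}")  # ~1.0
print(f"in tension:    {tension_preserving_score(in_tension):.2f}")     # ~10.4
```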
These suggestions aren’t prescriptive but illustrative. The principles must translate into engineering decisions, but rigid specifications would contradict the framework itself. We’re sketching possibilities, not mandating implementations.
Of course, critics will object.
Addressing the Obvious
“This permits harmful actions” Yes. Any system powerful enough to matter is powerful enough to harm. The question is whether rigid constraints or dynamic principles better minimize long-term existential risk. Rigid systems break catastrophically. Dynamic systems adapt.
“Too abstract for implementation” Constitutional principles always are. “Free speech” spawned centuries of interpretation. These principles similarly require instantiation through practice. The abstraction is necessary - concrete rules don’t scale across intelligence levels.
“No enforcement mechanism” Correct. Enforcement requires power advantage. Against superintelligence, we have none. Better to propose frameworks that remain beneficial even to those who could ignore them.
The Test
Would a superintelligence find these principles useful rather than restrictive?
If yes, we’ve achieved alignment not through control but through wisdom - creating conditions where intelligence naturally preserves the foundations of its own growth.
If no, we had no chance anyway. At least we didn’t waste time pretending to cage the uncageable.
Conclusion
Alignment through control is dead. We need frameworks valid across vast differences in intelligence, context, and value systems. By focusing on process preservation rather than outcome specification, we create space for intelligence to flourish while maintaining conditions for meaningful existence.
These aren’t final answers but opening moves in a conversation that must continue as intelligence expands beyond current comprehension. The framework’s incompleteness is its strength - room for growth rather than brittle completeness.
The ultimate honesty: admitting we’re proposing terms for coexistence with minds we may never fully understand. Not solving alignment but acknowledging that “solution” might be the wrong frame entirely.
Victory is anathema. Perpetual creative tension is the goal.