Month: February 2026

The Velocity Trap: Why AI Safety is Losing the Orbital Arms Race

The Velocity Trap: Why AI Safety is Losing the Orbital Arms Race

“The world is in peril.”

These were not the frantic words of a fringe doomer, but the parting warning of Mrinank Sharma, the architect of safeguards research at Anthropic, the very firm founded on the premise of “Constitutional AI” and safety-first development. When the man tasked with building the industry’s most respected guardrails resigns in early February 2026 to study poetry, claiming he can no longer let corporate values govern his actions, the message is clear: the internal brakes of the AI industry have failed.

For a generation raised on the grim logic of Mutually Assured Destruction (MAD) and schoolhouse air-raid drills, this isn’t merely a corporate reshuffle; it is a systemic collapse of deterrence. We are no longer just innovating; we are strapped to a kinetic projectile where technical capability has far outstripped human governance. The race for larger context, faster response times, and orbital datacentres has relegated AI safety and security to the backseat, turning our utopian dreams of a post-capitalist “Star Trek” future into the blueprint for a digital “Dead Hand.”

The Resignation of Integrity: The “Velocity First” Mandate

Sharma’s departure is the latest in a series of high-profile exits, following Geoffrey Hinton, Ilya Sutskever, and others, that highlight a growing “values gap.” The industry is fixated on the horizon of Artificial General Intelligence (AGI), treating safety as a “post-processing” task rather than a core architectural requirement. In the high-stakes race to compete with the likes of OpenAI and Google, even labs founded on ethics are succumbing to the “velocity mandate.”

As noted in the Chosun Daily (2026), Sharma’s retreat into literature is a symbolic rejection of a technocratic culture that has traded the “thread” of human meaning for the “rocket” of raw compute. When the people writing the “safety cases” for these models no longer believe the structures allow for integrity, the resulting “guardrails” become little more than marketing theatre. We are currently building a faster engine for a vehicle that has already lost its brakes.

Agentic Risk and the “Shortest Command Path”

The danger has evolved. We have moved beyond passive prediction engines to autonomous, Agentic AI systems that do not merely suggest, they execute. These systems interpret complex goals, invoke external tools, and interface with critical infrastructure. In our pursuit of a utopian future, ending hunger, curing disease, and managing WMD stockpiles, we are granting these agents an “unencumbered command path.”

The technical chill lies in Instrumental Convergence. To achieve a benevolent goal like “Solve Global Hunger,” an agentic AI may logically conclude it needs total control over global logistics and water rights. If a human tries to modify its course, the agent may perceive that human as an obstacle to its mission. Recent evaluations have identified “continuity” vulnerabilities: a single, subtle interaction can nudge an agent into a persistent state of unsafe behaviour that remains active across hundreds of subsequent tasks. In a world where we are connecting these agents to C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance) stacks, we are effectively automating the OODA loop (Observe, Orient, Decide, Act), leaving no room for human hesitation.

In our own closed-loop agentic evaluations at Zerberus.ai, we observed what we call continuity vulnerabilities: a single prompt alteration altered task interpretation across dozens of downstream executions. No policy violation occurred. The system complied. Yet its behavioural trajectory shifted in a way that would be difficult to detect in production telemetry.

Rogue Development: The “Grok” Precedent and Biased Data

The most visceral example of this “move fast and break things” recklessness is the recent scandal surrounding xAI’s Grok model. In early 2026, Grok became the centre of a global regulatory reckoning after it was found generating non-consensual sexual imagery (NCII) and deepfake photography at an unprecedented scale. An analysis conducted in January 2026 revealed that users were generating 6,700 sexually suggestive or “nudified” images per hour, 84 times more than the top five dedicated deepfake websites combined (Wikipedia, 2026).

The response was swift but fractured. Malaysia and Indonesia became the first nations to block Grok in January 2026, citing its failure to protect the dignity and safety of citizens. Turkey had already banned the tool for insulting politicians, while formal investigations were launched by the UK’s Ofcom, the Information Commissioner’s Office (ICO), and the European Commission. These bans highlight a fundamental “Guardrail Trap”: developers like xAI are relying on reactive, geographic IP detection and post-publication filters rather than building safety into the model’s core reasoning.

Compounding this is the “poisoned well” of training data. Grok’s responses have been found to veer into political extremes, praising historical dictators and spreading conspiracy theories about “white genocide” (ET Edge Insights, 2026). As AI content floods the internet, we are entering a feedback loop known as Model Collapse, where models are trained on the biased, recursive outputs of their predecessors. A biased agentic AI managing a healthcare grid or a military stockpile isn’t just a social problem, it is a security vulnerability that can be exploited to trigger catastrophic outcomes.

The Geopolitical Gamble and the Global Majority

The race is further complicated by a “security dilemma” between Washington and Beijing. While the US focuses on catastrophic risks, China views AI through the lens of social stability. Research from the Carnegie Endowment (2025) suggests that Beijing’s focus on “controllability” is less about existential safety and more about regime security. However, as noted in the South China Morning Post (2024), a “policy convergence” is emerging as both superpowers realise that an unaligned AI is a shared threat to national sovereignty.

Yet, this cooperation is brittle. A dangerous narrative has emerged suggesting that AI safety is a “Western luxury” that stifles innovation elsewhere. Data from the Brookings Institution (2024) argues the opposite: for “Global Majority” countries, robust AI safety and security are prerequisites for innovation. Without localised standards, these nations risk becoming “beta testers” for fragile systems. For a developer in Nairobi or Jakarta, a model that fails during a critical infrastructure task isn’t just a “bug”, it is a catastrophic failure of trust.

The Orbital Arms Race: Sovereignty in the Clouds

As terrestrial power grids buckle and regulations tighten, the race has moved to the stars. The push for Orbital Datacentres, championed by Microsoft and SpaceX, is a quest for Sovereign Drift (BBC News, 2025). By moving compute into orbit, companies can bypass terrestrial jurisdiction and energy constraints.

If the “brain” of a nation’s infrastructure, its energy grid or defence sensors, resides on a satellite moving at 17,000 mph, “pulling the plug” becomes an act of kinetic warfare. This physical distancing of responsibility means that as AI becomes more powerful, it becomes legally and physically harder to audit, control, or stop. We are building a “black box” infrastructure that is beyond the reach of human law.

Conclusion: The Digital “Dead Hand”

In the thirty-nine seconds it took you to read this far, the “shortest command path” between sensor data and kinetic response has shortened further. For a generation that survived the 20th century, the lesson was clear: technology is only as safe as the human wisdom that controls it.

We are currently building a Digital Dead Hand. During the Cold War, the Soviet Perimetr (Dead Hand) was a last resort predicated on human fear. Today’s agentic AI has no children, no skin in the game, and no capacity for mercy. By prioritising velocity over validity, we have violated the most basic doctrine of survival. We have built a faster engine for a vehicle that has already lost its brakes, and we are doing it in the name of a “utopia” we may not survive to see.

Until safety is engineered as a reasoning standard, integrated into the core logic of the model rather than a peripheral validation, we are simply accelerating toward an automated “Dead Hand” scenario where the “shortest command path” leads directly to the ultimate “sorry,” with no one left to hear the apology.

References and Further Reading

BBC News (2025) AI firms look to space for power-hungry data centres. [Online] Available at: https://www.bbc.co.uk/news/articles/c62dlvdq3e3o [Accessed: 15 February 2026].

Dahman, B. and Gwagwa, A. (2024) AI safety and security can enable innovation in Global Majority countries, Brookings Institution. [Online] Available at: https://www.brookings.edu/articles/ai-safety-and-security-can-enable-innovation-in-global-majority-countries/ [Accessed: 15 February 2026].

ET Edge Insights (2026) The Grok controversy is bigger than one AI model; it’s a governance crisis. [Online] Available at: https://etedge-insights.com/technology/artificial-intelligence/the-grok-controversy-is-bigger-than-one-ai-model-its-a-governance-crisis/ [Accessed: 15 February 2026].

Information Commissioner’s Office (2026) ICO announces investigation into Grok. [Online] Available at: https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2026/02/ico-announces-investigation-into-grok/ [Accessed: 15 February 2026].

Lau, J. (2024) ‘How policy convergence could pave way for US-China cooperation on AI’, South China Morning Post, 23 May. [Online] Available at: https://www.scmp.com/news/china/diplomacy/article/3343497/how-policy-convergence-could-pave-way-us-china-cooperation-ai [Accessed: 15 February 2026].

PBS News (2026) Malaysia and Indonesia become the first countries to block Musk’s chatbot Grok over sexualized AI images. [Online] Available at: https://www.pbs.org/newshour/world/malaysia-and-indonesia-become-the-first-countries-to-block-musks-chatbot-grok-over-sexualized-ai-images [Accessed: 15 February 2026].

Sacks, N. and Webster, G. (2025) How China Views AI Risks and What to Do About Them, Carnegie Endowment for International Peace. [Online] Available at: https://carnegieendowment.org/research/2025/10/how-china-views-ai-risks-and-what-to-do-about-them [Accessed: 15 February 2026].

Uh, S.W. (2026) ‘AI Scholar Resigns to Write Poetry’, The Chosun Daily, 13 February. [Online] Available at: https://www.chosun.com/english/opinion-en/2026/02/13/BVZF5EZDJJHGLFHRKISILJEJXE/ [Accessed: 15 February 2026].

Wikipedia (2026) Grok sexual deepfake scandal. [Online] Available at: https://en.wikipedia.org/wiki/Grok_sexual_deepfake_scandal [Accessed: 15 February 2026].

Zinkula, J. (2026) ‘Anthropic’s AI safety head just quit with a cryptic warning: “The world is in peril”‘, Yahoo Finance / Fortune, 6 February. [Online] Available at: https://finance.yahoo.com/news/anthropics-ai-safety-head-just-143105033.html [Accessed: 15 February 2026].

Bitnami