
Earned Autonomy

A framework for deciding when machines should be allowed to act autonomously in defence. Authority must be demonstrated before it is exercised.

By NullRabbit Labs


Autonomous defence already exists. Security systems classify traffic, score risk, trigger alerts, and sometimes block activity without human intervention. For known threats, this works. For novel threats, it breaks down.

The unsolved problem is not capability. It is legitimacy.

When a machine takes an irreversible action (blocking traffic, isolating a system, disrupting a service), the question is not "can it do this fast enough?" The question is "should it be allowed to do this at all?"

Earned autonomy is a framework for answering that question.

The speed problem

Modern attacks operate at machine speed. Human approval chains do not.

Consider a zero-day exploit: initial access happens in milliseconds, lateral movement in seconds, impact in minutes. Now consider the defensive timeline for a novel threat, one that doesn't match existing signatures or playbooks. Alerts fire in seconds to minutes, triage takes minutes to hours, approval takes longer still. The attack completes before the approval chain does.
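The mismatch can be made concrete with order-of-magnitude arithmetic. The durations below are illustrative figures taken from the ranges above, not measurements:

```python
# Illustrative only: rough timelines from the text above, in seconds.
attack = {
    "initial_access": 0.001,   # milliseconds
    "lateral_movement": 10,    # seconds
    "impact": 180,             # minutes
}
defence = {
    "alert": 60,               # seconds to minutes
    "triage": 1800,            # minutes to hours
    "approval": 3600,          # longer still
}

attack_complete = sum(attack.values())
approval_complete = sum(defence.values())
print(f"attack completes in   {attack_complete:>8.1f} s")
print(f"approval completes in {approval_complete:>8.1f} s")
print("attack wins" if attack_complete < approval_complete else "defence wins")
```

Even granting the defence generous numbers, the ordering does not change: the attack finishes more than an order of magnitude before approval does.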

This is not an indictment of security operations. It is a structural constraint. Human cognition has physical limits. Reading an alert takes seconds. Understanding context takes minutes. Approving a novel response takes longer. These are not inefficiencies to be optimised away. They are inherent to how humans make decisions.

How defence works today

Current security automation operates under one of three regimes:

Advisory automation. The system detects and alerts. Humans decide. Response latency is bounded by analyst availability.

Pre-authorised playbooks. Humans approve response categories in advance. Machines execute within those boundaries. This works for known threat classes under static policy.

Vendor-asserted trust. Operators accept that a vendor's model is accurate and permit it to act. The evidence comes from the vendor's testing, not the operator's environment.

All three work for threats that match existing categories. None addresses the core problem: how should authority be granted for autonomous action against threats that don't yet have signatures, playbooks, or vendor classifications?

This creates an authority gap. Machines could act, but have no defensible basis for doing so.

What earned autonomy means

Earned autonomy is the idea that autonomous authority must be demonstrated before it is exercised.

Authority is not granted by default. It is not assumed by capability. It is not declared by vendors. It must be earned through evidence generated in the operator's actual environment, on real traffic, under real conditions.

The framework has several components:

Bounded scope. Authority is granted per abuse class, not for "the network." SYN floods, credential stuffing, DNS amplification: each is separately scoped, separately evaluated, separately authorised.

Rehearsal on reality. Before enforcement is permitted, the system operates in shadow mode on live traffic. It makes judgments but does not act. It records what it would have done.

Counterfactual record. Every decision is logged with full context: what triggered it, what action would have been taken, what the outcome would have been. A complete, auditable record of machine judgment.

Human review. Operators examine the counterfactual record. The question is simple: if this system had been acting, would its actions have been correct?

Explicit thresholds. Authority requires meeting a defined standard-false positive rate, accuracy metric, confidence interval. If the threshold is not met, enforcement does not happen.

Continuous validation. Authority is not permanent. Rehearsal continues after enforcement begins. If performance degrades, authority is suspended automatically.

Reversibility and audit. Every action is logged and explained. If the system was wrong, there is a path to correction.
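As a sketch, the components above can be combined into a per-abuse-class authority gate. Everything here (the class name, field names, and thresholds) is hypothetical: a minimal illustration of the framework under stated assumptions, not a real implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    SHADOW = "shadow"        # judgments recorded, never enforced
    ENFORCE = "enforce"      # threshold met; actions are executed
    SUSPENDED = "suspended"  # performance degraded; authority revoked

@dataclass
class AuthorityGate:
    """Hypothetical per-abuse-class gate; names and defaults are illustrative."""
    abuse_class: str                 # bounded scope, e.g. "syn_flood"
    max_false_positive_rate: float   # explicit threshold
    min_decisions: int = 1000        # minimum rehearsal before review
    mode: Mode = Mode.SHADOW
    decisions: int = 0
    false_positives: int = 0

    def record(self, would_block: bool, was_malicious: bool) -> None:
        """Counterfactual record: log a judgment without acting on it."""
        self.decisions += 1
        if would_block and not was_malicious:
            self.false_positives += 1
        self._validate()

    def fp_rate(self) -> float:
        return self.false_positives / self.decisions if self.decisions else 0.0

    def grant(self) -> bool:
        """Human review: enforcement begins only if the threshold is met."""
        if (self.decisions >= self.min_decisions
                and self.fp_rate() <= self.max_false_positive_rate):
            self.mode = Mode.ENFORCE
            return True
        return False

    def _validate(self) -> None:
        """Continuous validation: suspend automatically on degradation."""
        if self.mode is Mode.ENFORCE and self.fp_rate() > self.max_false_positive_rate:
            self.mode = Mode.SUSPENDED
```

An operator would keep one gate per abuse class, feed it shadow-mode judgments, and call `grant()` only at review time. Suspension, by contrast, is automatic: no human needs to notice the degradation for authority to be withdrawn.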

The counterfactual record

The counterfactual record is central to earned autonomy. It answers concrete questions: would legitimate traffic have been blocked? How often would action have occurred? Under what thresholds?

This replaces vendor claims with evidence generated in the operator's own environment. Instead of "trust our model," earned autonomy says "here is what our model would have done on your traffic; judge for yourself."
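A counterfactual entry might look like the following. The field names and values are invented for illustration; the point is that each decision carries its trigger, its would-be action, and a later verdict that review can aggregate into the questions above.

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of one counterfactual log entry; this is not a
# defined schema, just an illustration of what full context means.
entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "abuse_class": "credential_stuffing",
    "trigger": "login failure rate 40/s from a single /24",
    "would_have_taken": "rate-limit source range for 10 minutes",
    "confidence": 0.97,
    "later_verdict": "malicious",  # filled in after investigation
}

def summarise(entries):
    """Answer the operator's questions over a batch of entries:
    how often would action have occurred, and how often against
    traffic later judged legitimate?"""
    would_act = [e for e in entries if e["would_have_taken"] != "none"]
    wrong = [e for e in would_act if e["later_verdict"] == "legitimate"]
    return {"actions": len(would_act), "false_positives": len(wrong)}

print(json.dumps(summarise([entry]), indent=2))
```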

The uncomfortable inversion

We are taught to fear autonomous action. Every governance framework emphasises human oversight: keep the human in the loop, require approval, maintain control.

This caution is not wrong. Autonomous systems can fail catastrophically. Vendors overpromise. Complexity hides risk.

But there is a threshold beyond which caution itself becomes the risk.

If a system demonstrates, repeatedly, that it correctly identifies malicious traffic, and that traffic is allowed to pass because no human approved the block in time, then the human in the loop is not providing oversight. They are providing delay.

If attacks succeed because the approval chain took longer than the attack, then the governance framework is not managing risk. It is guaranteeing harm.

The counterfactual record makes this trade-off explicit. If the system would have been wrong, that is evidence against granting authority. But if the system would have been right, consistently and measurably, then withholding authority has a cost. That cost is the attacks that succeeded while waiting for approval.
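That cost can be made explicit with a back-of-the-envelope comparison. Every number below is an assumption chosen for illustration; in practice the rates would come from the counterfactual record and the operator's incident history.

```python
# Assumed figures, for illustration only -- not measurements.
attacks_per_month = 20
fraction_stopped_only_if_autonomous = 0.6  # approval arrives too late otherwise
cost_per_successful_attack = 50_000        # currency units, assumed

blocks_per_month = 500
false_positive_rate = 0.002                # taken from the counterfactual record
cost_per_false_positive = 1_000            # disrupted legitimate traffic, assumed

cost_of_withholding = (attacks_per_month
                       * fraction_stopped_only_if_autonomous
                       * cost_per_successful_attack)
cost_of_granting = blocks_per_month * false_positive_rate * cost_per_false_positive

print(f"expected monthly cost of withholding authority: {cost_of_withholding:,.0f}")
print(f"expected monthly cost of granting authority:    {cost_of_granting:,.0f}")
```

The specific numbers are not the point; the point is that both sides of the ledger become quantities an operator can inspect and dispute, rather than one visible risk weighed against an invisible one.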

What earned autonomy is not

It is not a claim of perfect detection. It is not a promise of zero false positives. It is not a replacement for human judgment.

Earned autonomy does not remove humans from the loop. It relocates them: away from the millisecond decision that humans cannot make, and into the strategic review that humans are still better at: setting boundaries, defining thresholds, auditing outcomes, and adjusting the framework when it fails.

Why this matters for decentralised infrastructure

In decentralised systems, enforcement failures have network-level consequences. A validator that goes offline affects consensus. A compromised node can propagate bad state. The blast radius extends beyond the individual operator.

This raises the stakes for both action and inaction. Reckless automation can cascade failures across a network. But paralysis in the face of attack can be equally damaging.

Earned autonomy provides a path between these extremes. It does not eliminate risk; it makes risk explicit and subjects autonomous action to continuous validation.

Related concepts

Earned autonomy is a governance framework. It sits alongside the other concepts in the security architecture.

Earned autonomy governs when execution is allowed. It is not a feature. It is a constraint-one that any system claiming to act autonomously must satisfy.

For questions about earned autonomy methodology, contact the research team.
