Why Autonomous Enforcement Must Earn Authority
The gap is not capability. It is legitimacy.
When a machine acts autonomously at microsecond timescales to defend a network against a threat no human has yet classified - who authorised it? Under what constraints? To what standard?
The current answer is silence.
The components exist and are proven - kernel-level packet processing, behavioural machine learning, automated enforcement and more. Some are deployed at nation-state scale. Large technology companies have built autonomous defence for their own infrastructure. The research is published. The technology is available.
The governance is not.
No standard defines when autonomous defensive action is appropriate beyond predefined categories. No vocabulary describes the boundaries of machine authority. No doctrine governs how human oversight functions when the human cannot be in the loop. Regulation assumes a human made a decision. Compliance frameworks assume a human approved an action. Insurance models assume a human was either diligent or negligent.
None of these assumptions survive contact with a system that blocks a zero-day exploit at kernel speed before any human knows the vulnerability exists.
The structural constraint
The asymmetry between offensive and defensive automation is well understood by anyone operating in this space. Attacks move at machine speed. Defensive automation also moves at machine speed, but only within pre-authorised boundaries: known malware families, documented attack patterns, signature-matched abuse.
For everything outside those boundaries - zero-days, behavioural anomalies, abuse classes that fall outside existing playbooks - defensive capability is gated by human cognition. Alerts triaged, incidents classified, responses approved. Even in mature security operations centres, this takes minutes to hours.
This is not an indictment of security teams. Human cognitive throughput has physical limits. Reading an alert takes seconds; understanding context takes minutes; approving a novel response takes longer. These are not inefficiencies to optimise away; they are properties of human decision-making.
The attack completes before the approval chain does. Not because the SOC failed, but because the problem is structural.
Three regimes, all incomplete
Authority for defensive automation currently operates under three models:
Advisory automation. The system detects and alerts; humans decide. This works until the attack is faster than the analyst (or the humans are asleep).
Pre-authorised playbooks. Humans approve response categories in advance; the system executes within those boundaries. This works until the threat doesn't match a category.
Vendor-asserted trust. Operators permit the vendor's model to act based on the vendor's testing, not evidence from their own environment. This works until it doesn't - and when it fails, the operator bears the consequences.
All three handle known threats. None addresses the question that actually matters: how should authority be granted for autonomous action against threats the system has never encountered and the operator has never approved?
The choice is not between autonomous defence and no autonomous defence. Autonomous defence is already here, operating within those three regimes. The choice is between autonomous defence with legitimate governance and autonomous defence without it.
Earned autonomy
Authority is not granted by default. It is not assumed by capability. It is not declared by vendors or asserted by technology. It must be earned.
Before a system is permitted to act autonomously, it must demonstrate - on real traffic, under real conditions, against real threats - that its judgment can be trusted. Not in a lab. Not on synthetic data. Not against last year's attack patterns. On the actual network it will defend, during the actual period it will operate.
This demonstration is not a one-time certification; it is continuous. The system must keep earning the authority it has been granted, or that authority is revoked.
This is earned autonomy - a governance framework for delegating defensive authority to machines. As with identity and access management, authority must be demonstrated before it is granted, scoped when it is granted, and continuously validated afterwards - but applied to machine judgment rather than user access.
The framework has seven requirements:
Bounded scope. Authority is granted per abuse class, not globally. SYN floods, credential stuffing, DNS amplification - each separately scoped, separately evaluated, separately authorised. Authority over one class does not imply authority over another.
Rehearsal on reality. Before enforcement is permitted, the system operates in shadow mode on live traffic. It makes judgments but does not act. It records what it would have done. Shadow mode is not a feature. It is a prerequisite.
Counterfactual record. Every decision is logged with full context: what triggered it, what action would have been taken, what the outcome would have been. A complete, auditable record of machine judgment before any machine action.
Human review. Humans examine the counterfactual record - not every decision, but enough decisions with enough diversity to establish confidence. The question is simple: if this system had been acting, would its actions have been correct?
Explicit thresholds. Authority requires meeting a defined standard - false positive rate, accuracy metric, confidence interval. If the threshold is not met, enforcement does not happen. No override, no exception.
Continuous validation. Authority is not permanent. Rehearsal continues after enforcement begins. If performance degrades, authority is suspended automatically. The system must continuously re-earn what it has been granted.
Reversibility and audit. Every action is logged and explained. If the system was wrong, there is a path to correction. Autonomous authority requires autonomous accountability.
This is trust by evidence, not trust by assertion. The system does not ask to be trusted. It shows its work. Authority follows from demonstrated competence, not claimed capability.
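To make the shape of the framework concrete, here is a minimal sketch of how per-class authority might be represented - bounded scope, an explicit false-positive threshold, and automatic suspension when performance degrades. The names, fields, and thresholds are illustrative assumptions on my part, not NullRabbit's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class AuthorityState(Enum):
    SHADOW = "shadow"        # judging but not acting (rehearsal on reality)
    ENFORCING = "enforcing"  # authorised to act within this abuse class
    SUSPENDED = "suspended"  # authority revoked pending review


@dataclass
class AbuseClassAuthority:
    """Authority is bounded per abuse class, never global."""
    abuse_class: str                      # e.g. "syn_flood", "credential_stuffing"
    max_false_positive_rate: float        # explicit threshold, defined up front
    min_reviewed_decisions: int           # enough human-reviewed evidence to judge
    state: AuthorityState = AuthorityState.SHADOW
    reviewed_decisions: int = 0
    false_positives: int = 0

    def false_positive_rate(self) -> float:
        if self.reviewed_decisions == 0:
            return 1.0  # no evidence means no trust
        return self.false_positives / self.reviewed_decisions

    def evaluate(self) -> AuthorityState:
        """Re-evaluated continuously: authority is earned, kept, or revoked."""
        enough_evidence = self.reviewed_decisions >= self.min_reviewed_decisions
        within_threshold = self.false_positive_rate() <= self.max_false_positive_rate

        if not enough_evidence:
            # Not enough reviewed counterfactuals yet: stay in (or fall back to) shadow.
            self.state = AuthorityState.SHADOW
        elif within_threshold:
            self.state = AuthorityState.ENFORCING
        else:
            # Performance degraded below the agreed standard: suspend automatically.
            self.state = AuthorityState.SUSPENDED
        return self.state
```

An operator would hold one such record per abuse class; earning enforcement for SYN floods says nothing about credential stuffing.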
Reference implementation
Earned autonomy requires a specific architectural choice: the separation of judgment from execution.
At NullRabbit, this separation is the foundation of everything we build. One system - IBSR - observes traffic at kernel level, learns what normal looks like for a specific network, and produces judgment about what deviates. It identifies what doesn't belong, records what it would recommend blocking, and produces a readiness assessment for each abuse class.
A separate system - Guard - handles enforcement, operating at kernel level using XDP/eBPF. Guard does not decide. It executes. The judgment about what is malicious comes from IBSR. Guard receives instructions, not data.
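The separation can be illustrated as two very different message shapes. What follows is a rough, hypothetical sketch - the real IBSR/Guard interface is not described here - but it shows the property that matters: the enforcing side never receives raw evidence or discretion, only a narrow, expiring instruction that has already passed through earned authority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Judgment:
    """What the observing side (IBSR-like) produces: context-rich and advisory."""
    abuse_class: str
    match: str               # e.g. a source prefix or flow description
    confidence: float
    recommended_action: str  # e.g. "drop" or "rate_limit"
    evidence: dict           # baseline deviation and features that triggered it


@dataclass(frozen=True)
class EnforcementInstruction:
    """What the enforcing side (Guard-like) receives: narrow, explicit, expiring."""
    rule_id: str
    action: str
    match: str
    ttl_seconds: int


def authorise(judgment: Judgment,
              class_is_enforcing: bool,
              confidence_threshold: float = 0.99) -> Optional[EnforcementInstruction]:
    """The only path from judgment to action runs through earned, per-class authority."""
    if not class_is_enforcing:
        return None  # no authority for this abuse class: record the judgment, do not act
    if judgment.confidence < confidence_threshold:
        return None
    return EnforcementInstruction(
        rule_id=f"{judgment.abuse_class}:{judgment.match}",
        action=judgment.recommended_action,
        match=judgment.match,
        ttl_seconds=300,  # every rule expires and must be re-justified
    )
```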
This separation is what makes earned autonomy operational rather than theoretical. IBSR can run indefinitely without Guard - learning, observing, producing judgments, building evidence - all without consequence. During shadow deployment against production traffic, the counterfactual record accumulates. Humans review it. Confidence builds or it doesn't. Authority is earned or it isn't. Only when competence has been demonstrated does Guard receive permission to act.
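A counterfactual record only earns authority once humans have scored it. A minimal sketch of that review step, with hypothetical field names, might look like the following: shadow decisions are compared against reviewer verdicts, and the resulting numbers are what the explicit thresholds above are measured against.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CounterfactualEntry:
    """One shadow-mode decision: what the system would have done, and what a human later decided."""
    timestamp: datetime
    abuse_class: str
    trigger: str                              # what the model observed
    would_have_blocked: bool                  # the action it would have taken
    reviewer_verdict: Optional[bool] = None   # True = malicious, False = benign, None = unreviewed


def review_summary(entries: List[CounterfactualEntry]) -> dict:
    """Reduce reviewed shadow decisions to the numbers an authority threshold is set against."""
    reviewed = [e for e in entries if e.reviewer_verdict is not None]
    acted = [e for e in reviewed if e.would_have_blocked]
    false_positives = sum(1 for e in acted if e.reviewer_verdict is False)
    missed = sum(1 for e in reviewed if not e.would_have_blocked and e.reviewer_verdict)
    return {
        "reviewed": len(reviewed),
        "would_have_blocked": len(acted),
        "false_positives": false_positives,
        "false_positive_rate": false_positives / len(acted) if acted else 0.0,
        "missed_threats": missed,
    }
```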
But earning authority once is insufficient. Threat landscapes shift. Models drift. Adversaries adapt. We subject enforcement rules to continuous adversarial validation - evolutionary testing that probes detection boundaries, measures resilience against evasion, and determines whether authority should be maintained or revoked. Every rule has a lifecycle. Every lifecycle is governed. Every judgment is scored.
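The paper describes evolutionary testing; as a much simpler illustration of the idea - my simplification, not the actual validation harness - here is a random-perturbation probe: known-bad samples are jittered and replayed against the detector, and a high evasion rate is treated as grounds for suspending the rule's authority.

```python
import random
from typing import Callable, List


def evasion_rate(detect: Callable[[List[float]], bool],
                 known_bad: List[List[float]],
                 rounds: int = 200,
                 jitter: float = 0.1) -> float:
    """Probe the detection boundary by slightly perturbing traffic the rule catches today.

    If small perturbations slip past the rule, the rule is brittle and its
    authority deserves review.
    """
    evaded = 0
    for _ in range(rounds):
        base = random.choice(known_bad)
        mutant = [v * (1.0 + random.uniform(-jitter, jitter)) for v in base]
        if not detect(mutant):
            evaded += 1
    return evaded / rounds


def retain_authority(rate: float, tolerance: float = 0.05) -> bool:
    """Illustrative policy: suspend enforcement when evasion exceeds the tolerance."""
    return rate <= tolerance
```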
The goal is not a product that defends networks. It is a platform that makes autonomous enforcement governable - across deployments, across abuse classes, across the evolving threat surface.
The failure modes are real
If we are honest about what we are proposing - delegating network-lethal authority to machines - we must be equally honest about how it can fail. We have mapped this territory because we operate in it.
Adversarial drift. Attackers who understand the detection model can craft traffic that stays within learned boundaries - slow, patient intrusion that never triggers anomaly detection. Continuous validation surfaces evidence of successful evasion, but the adaptation race does not end. This is precisely where human judgment retains its advantage: reviewing baseline evolution and asking "why has normal changed?" in ways that continuously adapting models cannot.
Baseline poisoning. An attacker present during the learning phase becomes part of the baseline. Their traffic is learned as normal. The system never flags it. This is not evasion - it is invisibility by definition. Mitigations exist but are imperfect: integrity monitoring of the learning process, immutable baseline snapshots, adversarial self-testing, and air-gapped reference models trained on known-clean traffic (one of these is sketched after the list of failure modes).
Scope creep. Authority granted for bounded abuse classes faces pressure to expand. Each extension seems reasonable. Each moves further from the evidence that justified the original authority. Extension requires new evidence, not extrapolation.
Cascading failure. If the system mislearns normal, enforcement acts on corrupted judgment at kernel speed across the network. Autonomous defence at machine speed means autonomous failure at machine speed. The separation of judgment from execution is a structural mitigation - IBSR can be wrong without Guard acting - but once authority is granted, the speed that makes autonomous defence effective is the same speed that makes autonomous failure dangerous.
These risks are inherent to any system that learns from observed behaviour and acts on that learning - including current ML-based detection, behavioural analytics platforms, and adaptive security tools. The difference is that earned autonomy makes them explicit, names them, and subjects them to continuous validation rather than hoping they don't materialise.
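Returning to the baseline-poisoning mitigations above, one of them - immutable baseline snapshots checked against a known-clean reference - can be sketched in a few lines. Representing a baseline as a flat dictionary of numeric metrics is my simplification, not how any production model stores state.

```python
import hashlib
import json


def snapshot_digest(baseline: dict) -> str:
    """Hash a baseline snapshot so it can be pinned as an immutable reference."""
    canonical = json.dumps(baseline, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def drift_is_suspicious(reference: dict, current: dict,
                        per_metric_tolerance: float = 0.2,
                        max_drifted_fraction: float = 0.5) -> bool:
    """Compare a live baseline against a known-clean reference model.

    Sharp, broad drift during the learning phase is one signal that an attacker
    may be shaping what the system learns as 'normal'.
    """
    shared = [k for k in reference if k in current and reference[k] != 0]
    if not shared:
        return True  # nothing comparable: treat as suspicious rather than silently pass
    drifted = sum(
        1 for k in shared
        if abs(current[k] - reference[k]) / abs(reference[k]) > per_metric_tolerance
    )
    return drifted / len(shared) > max_drifted_fraction
```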
The uncomfortable inversion
Here is the part that challenges every governance framework, every compliance regime, every security doctrine that emphasises keeping the human in the loop.
If the system demonstrates, repeatedly, that it correctly identifies malicious traffic - and that traffic is allowed to pass because no human approved the block in time - then the human in the loop is not providing oversight. They are providing delay.
If attacks succeed because the approval chain took longer than the attack, then the governance framework is not managing risk. It is guaranteeing harm.
If the counterfactual record shows that autonomous action would have prevented damage that human-gated response failed to prevent, then the decision to keep humans in the loop is not conservative. It is negligent.
The burden of proof inverts. The question is no longer whether you can prove the machine should be trusted. The question becomes whether you can justify continuing to prevent it from acting.
This does not mean autonomous action is always correct, or that humans should be removed from all decisions. It means there exists a class of threats, in certain environments, where the evidence will show that autonomous action outperforms human-gated response. For that class, in those environments, earned autonomy is not optional. It is the only responsible position.
The industry needs governance frameworks that make this delegation legitimate - not as one company's approach, but as an operational standard. Earned autonomy is our contribution to that standard. The work of making it an industry norm is larger than any single organisation, and it has to start somewhere.
Simon Morley is the founder of NullRabbit, building earned autonomy for autonomous network defence. The full research paper, "On Earned Autonomy: Delegating Network-Lethal Authority to Machines," is available at DOI: 10.5281/zenodo.18406828. It's also available on SSRN https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6067848.
Thanks for reading.
