NullRabbit
Research · May 27, 2026

How we decide a finding is real before we tell you about it

Simon Morley · 3 min read

A few weeks ago we had a finding we liked. Our captures showed an attack wedging a validator: slot rate down to 0.08 slots per second, CPU pinned. A clean denial-of-service against consensus, the kind of result that makes a good advisory.

We pulled it. Nobody had challenged it; we re-verified against a fresh cluster and it evaporated.

The collapse to 0.08 slots per second wasn't the attack. It was the baseline. Our lab cluster runs at roughly 12.8 seconds per slot by genesis design, about 0.078 slots per second when untouched, and we'd been comparing it against an assumed mainnet pace of 2.5 slots per second that belonged to a different system entirely. Against the cluster's own configured baseline, the slot rate never moved. We re-ran all seven variants at rates up to 40,000 packets per second. No wedge. CPU sat at 108 to 126%, not the 150 to 218% we'd written down. We stripped fourteen outcome claims and kept only the wire-shape label, because the packets were real even though the impact wasn't.
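The arithmetic behind the retraction fits in a few lines. What follows is a minimal sketch, not NullRabbit's tooling: `baseline_slot_rate` and `is_degraded` are hypothetical helpers, and the 0.5 cut-off for calling a slot rate "degraded" is an assumed threshold picked for illustration. It shows how the same observed 0.08 slots per second reads as a near-total collapse against a borrowed 2.5 slots per second, yet as no change at all against the cluster's configured 12.8-second slot.

```python
# Hypothetical sketch: judge slot-rate impact against the cluster's own
# configured baseline, not a pace borrowed from another network.


def baseline_slot_rate(slot_seconds: float) -> float:
    """Slots per second implied by a configured slot duration."""
    return 1.0 / slot_seconds


def is_degraded(observed: float, baseline: float, min_ratio: float = 0.5) -> bool:
    """True if the observed rate fell below min_ratio of the baseline.

    min_ratio = 0.5 is an illustrative cut-off, not a published threshold.
    """
    return observed / baseline < min_ratio


observed = 0.08                          # slots/s measured during the attack runs
lab_baseline = baseline_slot_rate(12.8)  # ~0.078 slots/s, from the genesis slot time
borrowed = 2.5                           # slots/s assumed from a different system

for label, base in [("configured baseline", lab_baseline), ("borrowed pace", borrowed)]:
    ratio = observed / base
    print(f"{label}: ratio {ratio:.3f}, degraded={is_degraded(observed, base)}")
```

Against the configured baseline the ratio is about 1.02; against the borrowed pace it is about 0.03. The measurement is identical in both cases. Only the baseline changes, which is why it has to be read from the system's own configuration.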

We're telling you this because it's the most important thing about how we work. A researcher who publishes findings that don't reproduce is worse than one who publishes nothing: every engineer who acts on a phantom pays a real cost and learns to ignore the next advisory. The willingness to retract is the credibility. If we never showed you a retraction, you should wonder which findings we should have pulled and didn't.

This isn't conscientiousness; it's process, and the process is the part worth hiring. Before we train a detector or run a sweep, we pre-register the corpus, the features, and the outcome thresholds in code, so the headline number lands against a target locked before we saw any results. We audit against sanity floors and a falsification holdout whose entire job is to prove the detector is cheating, that is, learning a capture artefact instead of an attack. If the audit fires, we stop and re-register the next version rather than quietly tuning until it passes. We publish the trail of those version transitions.
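The pre-register, audit, re-register loop described above can be sketched as a locked record plus a gate. This is a minimal Python illustration under stated assumptions, not NullRabbit's pipeline. The `Preregistration` fields, the `audit`, `headline`, and `reregister` helpers, the feature names, and every threshold value are hypothetical. "Locking" is modelled as a SHA-256 digest of the record taken before evaluation. The falsification holdout is modelled as benign traffic that shares the attack captures' artefacts, so an alert rate on it above a pre-set ceiling counts as evidence the detector is cheating.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Preregistration:
    """Hypothetical record, written and hashed before any results are seen."""
    version: int
    corpus_id: str               # identifier of the frozen capture corpus
    features: tuple[str, ...]    # features the detector is allowed to use
    target_recall: float         # headline target on attack captures
    max_holdout_fpr: float       # max alert rate on the falsification holdout
    min_samples: int             # sanity floor on evaluation-set size

    def digest(self) -> str:
        """Stable SHA-256 over the registration, so later edits are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()


def audit(reg: Preregistration, locked_digest: str,
          n_samples: int, holdout_fpr: float) -> list[str]:
    """Return reasons to stop; an empty list means the result may be reported."""
    reasons = []
    if reg.digest() != locked_digest:
        reasons.append("registration edited after it was locked")
    if n_samples < reg.min_samples:
        reasons.append("evaluation set below sanity floor")
    if holdout_fpr > reg.max_holdout_fpr:
        # The holdout is benign traffic sharing the attack captures' artefacts;
        # alerting on it suggests the detector learned the capture, not the attack.
        reasons.append("detector fires on falsification holdout")
    return reasons


def headline(reg: Preregistration, recall: float) -> str:
    """Compare the headline number to the target locked in the registration."""
    verdict = "met" if recall >= reg.target_recall else "missed"
    return f"v{reg.version}: recall {recall:.2f} {verdict} target {reg.target_recall:.2f}"


def reregister(reg: Preregistration, **changes) -> Preregistration:
    """When the audit fires, bump the version rather than tuning in place."""
    return replace(reg, version=reg.version + 1, **changes)


if __name__ == "__main__":
    # Hypothetical corpus, features, and thresholds, for illustration only.
    v1 = Preregistration(1, "corpus-a", ("pkt_len", "inter_arrival"), 0.90, 0.01, 1000)
    locked = v1.digest()
    reasons = audit(v1, locked, n_samples=5000, holdout_fpr=0.20)
    if reasons:
        print("stop:", reasons)
        v2 = reregister(v1, features=("inter_arrival",))
        print("re-registered as version", v2.version, v2.digest()[:12])
    else:
        print(headline(v1, recall=0.93))
```

`audit` never adjusts a threshold. A failure produces a new version with a new digest, which leaves a visible trail of transitions instead of a quietly tuned v1.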

The reason this matters to anyone deciding whether to work with us: when we hand you a finding, the question of whether it's real has already been adversarially tested by us, against us. The baseline came from the system's own config. The detector survived a holdout built to break it. The claim that's left is the claim we couldn't make go away.

That's the standard. The retractions aren't the embarrassing part of the record. They're the part that lets you trust the rest of it.

Simon Morley researches validator infrastructure security and is the founder of NullRabbit.

security-research · methodology · disclosure · falsifiability · detection
