How we decide a finding is real before we tell you about it
A few weeks ago we had a finding we liked. Our captures showed an attack wedging a validator: slot rate down to 0.08 slots per second, CPU pinned. A clean denial-of-service against consensus, the kind of result that makes a good advisory.
We pulled it. Nobody had challenged it; we re-verified it ourselves against a fresh cluster, and it evaporated.
The collapse to 0.08 slots per second wasn't the attack. It was the baseline. Our lab cluster runs at roughly 12.8 seconds per slot by genesis design, about 0.078 slots per second untouched, and we'd been comparing it against an assumed mainnet pace of 2.5 slots per second that belonged to a different system entirely. Against the cluster's own configured baseline, the slot rate never moved. We re-ran all seven variants at up to 40,000 packets per second. No wedge. CPU sat at 108 to 126%, not the 150 to 218% we'd written down. We stripped fourteen outcome claims and kept only the wire-shape label, because the packets were real even though the impact wasn't.
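The arithmetic behind the retraction is small enough to sketch. The following is a minimal Python illustration, not our tooling: it derives the expected slot rate from the configured seconds-per-slot and compares the same measurement against both baselines. `SlotBaseline`, `relative_change`, `is_impact` and the 5% tolerance are hypothetical names and values chosen for illustration; the 12.8 s/slot, 0.08 and 2.5 figures come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotBaseline:
    """Expected slot cadence, read from the cluster's own genesis config."""
    seconds_per_slot: float

    @property
    def slots_per_second(self) -> float:
        return 1.0 / self.seconds_per_slot


def relative_change(measured: float, baseline: float) -> float:
    """Fractional change of a measured rate against a baseline rate."""
    return (measured - baseline) / baseline


def is_impact(measured: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Hypothetical rule: only call it an impact if the rate moves more than `tolerance`."""
    return abs(relative_change(measured, baseline)) > tolerance


LAB = SlotBaseline(seconds_per_slot=12.8)  # lab cluster's configured cadence
ASSUMED_MAINNET_RATE = 2.5                 # slots/s: the wrong comparison point
MEASURED_UNDER_ATTACK = 0.08               # slots/s, from the captures

wrong_delta = relative_change(MEASURED_UNDER_ATTACK, ASSUMED_MAINNET_RATE)
right_delta = relative_change(MEASURED_UNDER_ATTACK, LAB.slots_per_second)

print(f"vs assumed mainnet: {wrong_delta:+.1%}, "
      f"impact={is_impact(MEASURED_UNDER_ATTACK, ASSUMED_MAINNET_RATE)}")
print(f"vs own genesis:     {right_delta:+.1%}, "
      f"impact={is_impact(MEASURED_UNDER_ATTACK, LAB.slots_per_second)}")
```

The same 0.08 reads as a roughly 97% collapse against 2.5, and as a change of about 2% against the cluster's own 1/12.8 ≈ 0.078. The design point is that the baseline is computed from the system's configuration rather than typed in from memory.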
We're telling you this because it's the most important thing about how we work. A researcher who publishes findings that don't reproduce is worse than one who publishes nothing: every engineer who acts on a phantom pays a real cost and learns to ignore the next advisory. The willingness to retract is the credibility. If we never show you one, you should wonder which others we should have pulled and didn't.
This isn't conscientiousness; it's process, and the process is the part worth hiring. Before we train a detector or run a sweep, we pre-register the corpus, the features, and the outcome thresholds in code, so the headline number lands against a target locked before we saw it. We audit against sanity floors and a falsification holdout whose entire job is to prove the detector is cheating by learning a capture artefact instead of an attack. If the audit fires, we stop and re-register the next version rather than quietly tuning until it passes. The trail of those transitions is published.
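To make "pre-registered in code" concrete, here is a minimal Python sketch of the idea, not our actual harness. Everything in it is a hypothetical stand-in: the `Registration`, `Results` and `audit` names, the field names, the capture IDs, the feature names and every threshold and score. The registration is hashed before training, so a loosened threshold is detectable. The falsification holdout is modelled, as an assumption, as benign traffic from two different capture setups: an honest detector should score it near AUC 0.5, so clear separation means it is keying on the capture rather than the attack.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Registration:
    """Everything fixed before training (hypothetical field names)."""
    version: str
    corpus: tuple[str, ...]         # capture IDs the detector may see
    features: tuple[str, ...]       # feature names it may use
    min_recall: float               # sanity floor on detection
    max_false_positive_rate: float  # sanity ceiling on false alarms
    max_holdout_separation: float   # allowed |AUC - 0.5| on the falsification holdout

    def fingerprint(self) -> str:
        """Stable SHA-256 over the registration, recorded before any run."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class Results:
    recall: float
    false_positive_rate: float
    holdout_auc: float  # AUC on benign-only traffic split by capture setup


def audit(reg: Registration, locked: str, res: Results) -> list[str]:
    """Return every failed check; an empty list means the run stands."""
    failures = []
    if reg.fingerprint() != locked:
        failures.append("registration changed after lock")
    if res.recall < reg.min_recall:
        failures.append("recall below pre-registered floor")
    if res.false_positive_rate > reg.max_false_positive_rate:
        failures.append("false-positive rate above pre-registered ceiling")
    if abs(res.holdout_auc - 0.5) > reg.max_holdout_separation:
        failures.append("falsification holdout separable: likely learning a capture artefact")
    return failures


v1 = Registration(
    version="v1",
    corpus=("capture-a", "capture-b"),
    features=("payload_len_hist", "inter_arrival_ms"),
    min_recall=0.90,
    max_false_positive_rate=0.01,
    max_holdout_separation=0.10,
)
LOCK_V1 = v1.fingerprint()  # recorded before training starts

# A suspiciously perfect run that also separates two benign captures.
cheating = audit(v1, LOCK_V1, Results(recall=1.0, false_positive_rate=0.0, holdout_auc=0.98))

# Quietly loosening the floor after the fact is caught by the fingerprint.
tuned = audit(replace(v1, min_recall=0.5), LOCK_V1, Results(0.6, 0.005, 0.52))

# The honest path: change the design, register v2 under a new lock, rerun.
v2 = replace(v1, version="v2", features=("payload_len_hist",))
LOCK_V2 = v2.fingerprint()
clean = audit(v2, LOCK_V2, Results(0.93, 0.006, 0.53))

# Hypothetical published trail: version, lock, and what the audit said.
trail = [(v1.version, LOCK_V1, cheating), (v2.version, LOCK_V2, clean)]
```

The numbers above are illustrative inputs, not measurements. The design choice worth noting is that the audit only reports failures and never adjusts anything. Moving past a failure means creating a new registration with a new fingerprint, which is what leaves a trail.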
The reason this matters to anyone deciding whether to work with us: when we hand you a finding, the question of whether it's real has already been adversarially tested by us, against us. The baseline came from the system's own config. The detector survived a holdout built to break it. The claim that's left is the claim we couldn't make go away.
That's the standard. The retractions aren't the embarrassing part of the record. They're the part that lets you trust the rest of it.
Simon Morley researches validator infrastructure security and is the founder of NullRabbit.
