Symptom · search entry

Solana validator slot distance increasing and vote credits dropping: what it means

A growing slot distance means your validator is processing blocks more slowly than the cluster is producing them, and dropping vote credits are the financial trace of the same lag. Under timely vote credits, a vote earns the most when it lands within a couple of slots of the block it votes on, so a validator that replays late votes late and bleeds credits even while it remains technically caught up. The cause is almost always local resource saturation, not the cluster.

Common causes

Replay is CPU-bound or stalling on accounts database reads, so each block takes longer to process than the cluster's roughly 400 millisecond slot time.
NVMe IOPS are saturated by accounts and ledger writes, often because the OS, snapshots and the ledger share a disk.
The node is losing shreds at ingest and spending its time on repair traffic instead of forward progress.
The host is oversubscribed: CPU steal on a shared instance, throttled cores, or memory pressure pushing the accounts index out of cache.

System-level mechanism

Voting is a pipeline: the node must receive a block's shreds, reassemble it, replay it, and land a vote transaction, and every stage competes for the same host resources. Slot distance is the integral of that pipeline running slower than block production; vote credit decay is the same signal weighted by how late each vote landed. The two diverge in one useful way: credits can drop while slot distance holds steady, which isolates the problem to vote landing (the network path to leaders) rather than replay. Sustained external load makes both worse at once, because ingest pressure steals exactly the CPU and bandwidth the pipeline needs.

What this indicates

A rising slot distance with falling credits indicates the host is saturated somewhere: measure disk latency, CPU steal and NIC drops before suspecting the network. Falling credits at a stable slot distance indicate a vote transmission problem. Either way it is a host and ingress question first, and the answer is measurable rather than guessable.

Related issues

Validator stuck catching up after a restart; packet loss and AF_XDP drop counters rising under load; cluster-wide confirmation slowdowns that look similar from inside one node.

Deep references

How Solana shrugged off a 6 Tbps DDoS covers what sustained ingress load does to validator hosts even when the cluster survives it.
XDP inline defense for validators covers the ingest layer where load shedding has to happen if replay is to keep its resources.
slashr.dev tracks delinquency and missed votes across networks in real time, which is this symptom seen from the outside.

Related symptoms

Evidence

slashr.dev · live validator incident feed