Verdict System

Every verdict in rustbox is a pure function over kernel evidence. We never guess.

The problem with most judges

Most online judges classify verdicts like this:

if exit_code != 0:     return "Runtime Error"
if wall_time > limit:  return "Time Limit"
if memory > limit:     return "Memory Limit"
return "Accepted"

This is wrong in subtle ways. What if the process was OOM-killed with exit code 137? Is that RE or MLE? What if the process hit a CPU time limit but the wall clock hasn’t expired? What if the judge killed the process for timeout but the process also crashed independently?

Evidence bundles

rustbox collects evidence from multiple kernel sources before making a verdict:

Wait outcome - exit code, terminating signal, stop/continue status
Judge actions - what the judge did (timer kills, escalations)
Cgroup evidence - memory peak, OOM events, CPU usage
Timing evidence - wall time, CPU time, divergence classification
Process lifecycle - descendant containment, zombie count
Collection errors - what we failed to collect

The verdict classifier takes this bundle and applies a decision tree:

Cleanup failure? → Internal Error (evidence integrity compromised)
Evidence collection errors? → Internal Error
Judge killed the process? → Check why (timeout → TLE, OOM → MLE)
OOM events in cgroup? → Memory Limit Exceeded
Exit code 0? → Accepted
Non-zero exit? → Runtime Error
Signal without attribution? → Signaled

Divergence detection

CPU time and wall time don’t always agree. The ratio tells you something:

CPU/Wall ratio	Classification	What it means
> 0.8	CPU-bound	Process was computing the whole time
< 0.3	Sleep/block-bound	Process was waiting (sleep, I/O)
Intermediate	Host interference suspected	System load may have affected results

This matters for competitive programming: a TLE where the process was CPU-bound is a genuine algorithm problem. A TLE where the process was sleeping might be a broken test case.

Verdict provenance

Every verdict includes an audit trail:

{
  "verdict_actor": "kernel",
  "verdict_cause": "mle_kernel_oom",
  "verdict_evidence_sources": ["wait_outcome", "cgroup_evidence", "oom_events"],
  "memory_peak": 134217728
}

If a contestant disputes a verdict, you can show exactly which kernel evidence led to the classification.