Cheap verifiable inference for diffusion and codec models.
A provider paid to run a 3.5-billion-parameter diffusion model has an obvious move: run a smaller one, or fewer denoising steps, and keep the difference. The consumer receives a plausible waveform or image and cannot tell the difference. This is the same incentive that motivates verifiable inference for language models, but the language-model defenses do not transfer, for a structural reason: they commit a per-token output distribution and spot-check it, and a diffusion model produces no such distribution. Instead, it integrates an ODE or SDE from noise over N steps, and the only thing the consumer sees is the last latent. We give a verification primitive for this setting and measure it. The provider commits a trajectory: a Merkle root over (step index, latent digest) pairs at sampled steps, plus the final latent. A verifier re-runs one reference denoising step from a committed latent and checks that its prediction matches the next committed latent within a tolerance, at a cost rho of about 1/N of the full generation. On an honest re-run the step reproduces exactly (relative-L2 = 0); a substituted computation diverges far outside any reasonable tolerance. We implemented this on three independent engines and modalities: a 3.5B diffusion-transformer audio (flow) model, a 1.5B Euler latent-diffusion image model, and a latent video-diffusion model. The primitive holds on all three. We are precise about the one thing it does not give for free: the accept tolerance is a measured quantity set from the honest cross-hardware reproduction tail, not a proven constant, and we say exactly where that leaves the guarantee.
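The commit-and-spot-check loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation that was measured: `toy_step` is a hypothetical stand-in for one reference denoising step (a linear Euler update), latents are plain lists of floats, every step is committed rather than a sample of steps, the leaf encoding and domain-separation bytes are choices made here, and Merkle inclusion proofs for the opened leaves are omitted for brevity. The tolerance passed to `spot_check` is a placeholder, not a measured cross-hardware value.

```python
import hashlib
import math
import struct


def latent_digest(latent):
    """SHA-256 over the latent serialized as little-endian float64 (assumed encoding)."""
    return hashlib.sha256(struct.pack(f"<{len(latent)}d", *latent)).digest()


def leaf(step, latent):
    """Leaf hash over the pair (step index, latent digest)."""
    payload = b"\x00" + struct.pack("<I", step) + latent_digest(latent)
    return hashlib.sha256(payload).digest()


def merkle_root(leaves):
    """Binary Merkle root; the last node is duplicated on odd-sized levels."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def commit(trajectory):
    """Provider side: one root over all (step, latent) pairs, final latent included."""
    return merkle_root([leaf(k, x) for k, x in enumerate(trajectory)])


def rel_l2(predicted, committed):
    """Relative L2 distance ||predicted - committed|| / ||committed||."""
    num = math.sqrt(sum((p - c) ** 2 for p, c in zip(predicted, committed)))
    den = math.sqrt(sum(c * c for c in committed))
    return num / den if den else num


def toy_step(latent, n_steps, scale=1.0):
    """Hypothetical denoising step: Euler update x <- x + dt * v(x), v(x) = -scale * x."""
    dt = 1.0 / n_steps
    return [x + dt * (-scale * x) for x in latent]


def generate(x0, n_steps, scale=1.0):
    """Run n_steps updates; scale != 1.0 models a substituted computation."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(toy_step(trajectory[-1], n_steps, scale))
    return trajectory


def spot_check(trajectory, k, n_steps, tol):
    """Verifier side: re-run one reference step from latent k, compare with latent k+1."""
    predicted = toy_step(trajectory[k], n_steps)  # reference model uses scale=1.0
    return rel_l2(predicted, trajectory[k + 1]) <= tol
```

With `honest = generate([1.0, -2.0, 0.5], 10)`, the call `spot_check(honest, 3, 10, 1e-6)` accepts because the re-run step is bit-identical on the same machine. A trajectory produced with `scale=0.5` fails the same check, since the committed latent and the reference prediction differ by about 5% in relative L2. Each check costs one step out of N, which is where the roughly 1/N verification cost comes from.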