On-device inference.

Fast, small, local models. Quantization, scheduling and backends that put frontier capability on the hardware people already own.

Frontier capability should not depend on datacenter access. OGONG Labs grew out of building a high-performance inference engine, and much of our work is about making capable models run efficiently on commodity hardware.

Small, fast, local

We work on quantization that holds quality at low bit-widths, schedulers and KV-cache strategies that keep latency low, and backends that treat commodity GPUs and Apple Silicon as first-class targets rather than afterthoughts. The same engine that serves a node on the network runs a zero-signup model server on a laptop.
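Per-group scaling is one common way to hold quality at low bit-widths. The sketch below is a minimal illustration, not OGONG's quantizer. It assumes symmetric 4-bit codes with one float scale per group of 32 weights, and the function names are hypothetical. The point it shows: a single outlier only inflates the scale of its own group, so every other group keeps fine resolution.

```python
import random
from typing import List, Tuple


def quantize_groupwise(weights: List[float], bits: int = 4, group_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization: every `group_size` consecutive weights share one scale."""
    qmax = 2 ** (bits - 1) - 1  # largest signed code, e.g. 7 for 4 bits
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the group's largest magnitude onto qmax; guard all-zero groups.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-qmax, min(qmax, q)))  # clamp against float round-off
    return codes, scales


def dequantize_groupwise(codes: List[int], scales: List[float], group_size: int = 32) -> List[float]:
    """Recover approximate weights: each code times the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(256)]
    weights[0] = 1.0  # a single large outlier
    for gs in (256, 32):  # one global scale vs. eight per-group scales
        codes, scales = quantize_groupwise(weights, bits=4, group_size=gs)
        restored = dequantize_groupwise(codes, scales, group_size=gs)
        print(f"group_size={gs}: mean abs error {mean_abs_error(weights, restored):.5f}")
```

With one global scale, the outlier forces a step size so coarse that most small weights round to zero; with per-group scales, only the outlier's own group pays that cost. Real quantizers also pack codes into bytes and may use asymmetric or error-aware rounding, which this sketch leaves out.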

Why it matters for verification

Cheap verification is what lets weak hardware participate. On Apple Silicon the verify-to-generate ratio is about 5%, so even a modest machine can audit its peers. On-device performance and verifiable inference reinforce each other: the cheaper inference gets, the wider the honest network can spread.
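The 5% figure translates directly into audit capacity. The back-of-the-envelope sketch below assumes that verifying a token costs a fixed fraction (`verify_ratio`) of what generating it would cost on the auditing machine; the function names and the throughput numbers in the example are hypothetical, not measurements.

```python
def verifiable_tokens_per_sec(own_gen_tps: float, audit_fraction: float, verify_ratio: float = 0.05) -> float:
    """Peer tokens per second this machine can re-check.

    Assumes (hypothetically) that verifying one token costs `verify_ratio`
    times what it would cost this machine to generate that token.
    """
    if not 0.0 <= audit_fraction <= 1.0:
        raise ValueError("audit_fraction must be in [0, 1]")
    if own_gen_tps <= 0 or verify_ratio <= 0:
        raise ValueError("throughput and verify_ratio must be positive")
    # Time budget spent auditing, divided by the per-token verification cost.
    return audit_fraction * own_gen_tps / verify_ratio


def peers_auditable(own_gen_tps: float, peer_output_tps: float, audit_fraction: float,
                    verify_ratio: float = 0.05) -> float:
    """How many peers emitting `peer_output_tps` this machine can keep up with."""
    return verifiable_tokens_per_sec(own_gen_tps, audit_fraction, verify_ratio) / peer_output_tps
```

For example, a laptop generating 20 tokens/s that spends a quarter of its time auditing can check about 100 tokens/s of peer output, enough to cover two peers producing 50 tokens/s each. If verification meant full regeneration (`verify_ratio=1.0`), the same budget would cover only 5 tokens/s, a tenth of one such peer.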