Models

Every modality, checked.

Text and transcription are checked with a single teacher-forced pass. For the diffusion modalities (image, music, speech, video), a provider commits to its whole generation trajectory as a Merkle root over its sampled denoising steps plus the final latent. A verifier then re-runs one step to catch cheating, at roughly 1/N the cost of regenerating an N-step trajectory.
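The commit-and-spot-check scheme for diffusion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol's actual implementation. `toy_step` is a hypothetical, deterministic stand-in for one denoising step. The leaves are assumed to be the initial latent followed by every step's output, so the last leaf is the final latent. The Merkle hashing (SHA-256 with 0x00/0x01 leaf and node prefixes) is one common choice, not a documented format.

```python
import hashlib
import random
from typing import List, Sequence, Tuple

Latent = Tuple[int, ...]

def leaf_hash(data: bytes) -> bytes:
    # 0x00 / 0x01 prefixes keep leaf hashes and internal-node hashes distinct.
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_tree(leaves: Sequence[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, leaves first; levels[-1][0] is the root."""
    levels = [[leaf_hash(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        # An unpaired last node is hashed with itself.
        levels.append([node_hash(cur[i], cur[i + 1] if i + 1 < len(cur) else cur[i])
                       for i in range(0, len(cur), 2)])
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from leaf `index` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        path.append(level[sib] if sib < len(level) else level[index])
        index //= 2
    return path

def verify_proof(root: bytes, data: bytes, index: int, path: List[bytes]) -> bool:
    node = leaf_hash(data)
    for sibling in path:
        # Left/right order comes from the index the verifier asked for,
        # so an opening cannot be moved to a different position.
        node = node_hash(sibling, node) if index % 2 else node_hash(node, sibling)
        index //= 2
    return node == root

def toy_step(latent: Latent, t: int) -> Latent:
    """Hypothetical deterministic stand-in for one denoising step."""
    return tuple((3 * x + t) % 1009 for x in latent)

def encode(latent: Latent) -> bytes:
    return ",".join(map(str, latent)).encode()

def generate_and_commit(x0: Latent, num_steps: int):
    """Provider: run every step and commit to [x0, x1, ..., x_N]."""
    traj = [x0]
    for t in range(num_steps):
        traj.append(toy_step(traj[-1], t))
    return traj, merkle_tree([encode(x) for x in traj])

def open_step(traj, levels, t: int):
    """Provider: reveal latents t and t+1 together with their Merkle paths."""
    return traj[t], prove(levels, t), traj[t + 1], prove(levels, t + 1)

def verify_step(root, t, x_t, path_t, x_next, path_next) -> bool:
    """Verifier: check both openings against the root, then re-run step t only."""
    return (verify_proof(root, encode(x_t), t, path_t)
            and verify_proof(root, encode(x_next), t + 1, path_next)
            and toy_step(x_t, t) == x_next)

# Usage: the provider commits first, then the verifier draws a random step.
traj, levels = generate_and_commit((1, 2, 3, 4), num_steps=50)
root = levels[-1][0]
t = random.randrange(50)
print(verify_step(root, t, *open_step(traj, levels, t)))  # honest provider: True
```

Because the verifier picks `t` only after receiving the root, if k of the N committed transitions are inconsistent, a single challenge catches the provider with probability k/N, and repeated challenges raise that. Exact equality works here only because `toy_step` is bit-for-bit reproducible; re-running a real denoising step on different hardware may need deterministic kernels or a tolerance-based comparison.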

Text

LLM inference

Chat, reasoning and code from open Llama- and Qwen-class transformer models. Verified token by token.
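To see why one teacher-forced pass can verify text token by token, here is a minimal sketch that assumes greedy decoding. The verifier feeds the prompt plus the claimed completion through the model once and checks that each claimed token is the argmax of the logits at the position before it. `toy_model` is a hypothetical stand-in for a causal transformer that returns one row of next-token logits per input position. Sampled decoding would need a different acceptance rule, which this sketch does not cover.

```python
from typing import List, Optional, Sequence

def toy_model(tokens: Sequence[int], vocab: int = 8) -> List[List[float]]:
    """Hypothetical causal 'model': row i holds next-token logits that depend
    only on tokens[: i + 1]. Its greedy choice is the prefix sum mod vocab."""
    rows, total = [], 0
    for tok in tokens:
        total += tok
        rows.append([-float((v - total) % vocab) for v in range(vocab)])
    return rows

def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=row.__getitem__)

def generate(model, prompt: Sequence[int], n_new: int) -> List[int]:
    """Provider side: greedy decoding, one forward pass per new token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(argmax(model(tokens)[-1]))
    return tokens[len(prompt):]

def first_mismatch(model, prompt: Sequence[int], completion: Sequence[int]) -> Optional[int]:
    """Verifier side: one teacher-forced pass over prompt + completion.
    Returns the index of the first claimed token that is not the greedy
    choice, or None if every token checks out. Assumes a non-empty prompt."""
    seq = list(prompt) + list(completion)
    logits = model(seq[:-1])  # row i predicts seq[i + 1]
    for j, tok in enumerate(completion):
        if argmax(logits[len(prompt) + j - 1]) != tok:
            return j
    return None

prompt = [1, 2]
completion = generate(toy_model, prompt, 5)
print(completion, first_mismatch(toy_model, prompt, completion))  # [3, 6, 4, 0, 0] None
print(first_mismatch(toy_model, prompt, [3, 6, 5, 0, 0]))         # 2
```

Generation calls the model once per new token, while verification calls it once for the whole sequence. With a real causal transformer, that single forward pass produces logits for every position in parallel, which is what keeps the check cheap.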

Image

Diffusion

Text to image in latent space. Stable Diffusion- and FLUX-class models.

Music

Song generation

Full songs, vocals and instrumentation, from a prompt and lyrics. ACE-Step.

Speech

Text to speech

Natural, long-form, multi-speaker voice. VibeVoice.

Transcription

Speech to text

Transcribe and translate across roughly 99 languages. Whisper.

Video

Video generation

Short clips from text or images via diffusion transformers. Wan, LTX-Video, Mochi, HunyuanVideo.