Text and transcription are checked with a single teacher-forced pass. For the diffusion modalities (image, music, speech, video), a provider commits to its whole generation trajectory with a Merkle root over the sampled denoising steps plus the final latent. A verifier then re-runs one step to catch cheating, at roughly 1/N the cost of the full N-step generation.
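A minimal sketch of the commit-and-spot-check idea, under stated assumptions: `toy_step` is a hypothetical deterministic stand-in for one denoising step, every intermediate state is committed as a leaf (the last leaf being the final latent), and equality is checked byte for byte. A real diffusion model would likely need a numerical tolerance and its own leaf encoding; none of the names below come from the actual system.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build every level of the tree; levels[-1][0] is the root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by duplicating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks inner nodes
                 for i in range(0, len(level), 2)]

def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    node = h(b"\x00" + leaf)
    for sibling in path:
        pair = sibling + node if index % 2 else node + sibling
        node = h(b"\x01" + pair)
        index //= 2
    return node == root

def toy_step(latent: bytes, t: int) -> bytes:
    """Hypothetical deterministic 'denoising step' used only for illustration."""
    return h(latent + t.to_bytes(4, "big"))

def generate(seed: bytes, n_steps: int) -> list[bytes]:
    states = [seed]
    for t in range(n_steps):
        states.append(toy_step(states[-1], t))
    return states  # states[-1] plays the role of the final latent

def spot_check(root, t, x_t, proof_t, x_next, proof_next) -> bool:
    """Verifier: both states are in the commitment and one re-run step links them."""
    return (verify(root, x_t, t, proof_t)
            and verify(root, x_next, t + 1, proof_next)
            and toy_step(x_t, t) == x_next)

# Provider side: run N steps and publish only the root.
states = generate(b"seed", 8)
levels = merkle_levels(states)
root = levels[-1][0]

# Verifier side: pick one step, request the two states and their proofs.
t = 3
ok = spot_check(root, t, states[t], prove(levels, t),
                states[t + 1], prove(levels, t + 1))
```

In this sketch the verifier pays for one call to `toy_step` plus a few hashes, while the provider paid for all N steps, which is where the roughly 1/N figure comes from. Because the provider cannot know in advance which step will be challenged, a substituted state is caught whenever the challenged step touches it.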
Chat, reasoning and code from open transformer models. Llama and Qwen class models. Verified token by token.
Text to image in latent space. Stable Diffusion and FLUX class models.
Full songs, vocals and instrumentation, from a prompt and lyrics. ACE-Step.
Natural, long-form, multi-speaker voice. VibeVoice.
Transcribe and translate across roughly 99 languages. Whisper.
Short clips from text or images via diffusion transformers. Wan, LTX-Video, Mochi, HunyuanVideo.
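The token-by-token check used for text and transcription can be sketched as follows. This is an illustration under assumptions, not the system's implementation: `toy_next_token` is a hypothetical stand-in for a model's next-token choice, and decoding is assumed to be greedy (sampled decoding would need a different acceptance rule). A real transformer scores all positions in one forward pass; the loop here only emulates that.

```python
from typing import Callable, Optional, Sequence

def toy_next_token(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for a model's greedy next-token choice."""
    return (sum(prefix) * 31 + len(prefix)) % 50

def generate_greedy(prompt: Sequence[int], n_new: int,
                    next_token: Callable[[Sequence[int]], int] = toy_next_token) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_token(tokens))
    return tokens[len(prompt):]

def verify_teacher_forced(prompt: Sequence[int], claimed: Sequence[int],
                          next_token: Callable[[Sequence[int]], int] = toy_next_token
                          ) -> Optional[int]:
    """Return the index of the first mismatching token, or None if all match."""
    context = list(prompt)
    for i, tok in enumerate(claimed):
        if next_token(context) != tok:
            return i
        # Teacher forcing: extend the context with the claimed token,
        # not with the verifier's own prediction.
        context.append(tok)
    return None

prompt = [1, 2, 3]
honest = generate_greedy(prompt, 5)
tampered = list(honest)
tampered[2] = (tampered[2] + 1) % 50  # provider swaps one token

honest_result = verify_teacher_forced(prompt, honest)
tampered_result = verify_teacher_forced(prompt, tampered)
```

Here `honest_result` is `None` and `tampered_result` is the position of the swapped token, since the verifier's prediction at that position no longer matches the claimed output.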