Cerebras Benchmark Results
Cerebras is an inference provider whose Wafer Scale Engine hardware is optimized for large-model throughput. Benchscope records public evaluation runs across one model family hosted on Cerebras, covering MATH, MUSR, BBH, IFEVAL, and GSM8K.
About Cerebras Endpoints
Cerebras hardware is built for high-throughput inference at scale. Benchmark scores from Cerebras endpoints reflect their specific serving configuration and may differ from the same model family hosted elsewhere. Use canonical-prompt runs for the cleanest cross-provider comparisons.
Hosted Model Families
Model families with public evaluation runs on Cerebras: Qwen 3 235B A22B Instruct.
Recent Cerebras Runs
- Cerebras / Qwen 3 235B A22B Instruct on MATH: completed; 60.0%; 2118 ms p50 latency; 20 samples.
- Cerebras / Qwen 3 235B A22B Instruct on MATH: partial; 57.9%; 1714 ms p50 latency; 20 samples.
- Cerebras / Qwen 3 235B A22B Instruct on MUSR: completed; 55.0%; 232 ms p50 latency; 20 samples.
- Cerebras / Qwen 3 235B A22B Instruct on BBH: partial; 39.8%; 1189 ms p50 latency; 100 samples.
- Cerebras / Qwen 3 235B A22B Instruct on BBH: partial; 41.4%; 972 ms p50 latency; 100 samples.
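The p50 latency reported for each run is the median of the per-sample request latencies. A minimal sketch of how such a figure could be computed from raw timings (the helper and sample data below are illustrative, not Benchscope's actual code):

```javascript
// Median (p50) of an array of latencies in milliseconds.
// For an even-length array, this averages the two middle values.
function p50(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Hypothetical per-sample latencies from a 20-sample run.
const latencies = [1980, 2050, 2118, 2118, 2200, 1900, 2310, 2118, 2000, 2150,
                   2118, 1850, 2400, 2118, 2090, 2118, 1950, 2118, 2250, 2118];
console.log(p50(latencies)); // → 2118
```

Because p50 ignores the slowest requests entirely, two runs with the same median can have very different tail behavior, which is why latency figures here should be compared alongside sample counts.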
Related
- MMLU benchmark results across all providers
- MATH benchmark results across all providers
- GSM8K benchmark results across all providers
- All model families on Benchscope
- Best LLM endpoint for MMLU
- Best LLM endpoint for MATH
- Best LLM endpoint for GSM8K
- Llama 3.3 70B on Groq vs Together AI
- How benchmark results are defined and compared
Benchscope is a JavaScript app. If the interactive interface does not load, enable JavaScript or use the links above to reach the main public sections.