Groq Benchmark Results
Groq is an inference provider known for high-throughput, low-latency serving on its custom LPU hardware. Benchscope records public evaluation runs for 1 model family hosted on Groq, covering MUSR, MATH, GSM8K, and IFEVAL.
About Groq Endpoints
Groq's LPU architecture prioritizes inference speed. Benchmark scores from Groq endpoints reflect its specific serving configuration and hardware, not model capability in isolation. Scores for a given model on Groq can differ from the same model hosted elsewhere because of quantization, infrastructure, or serving optimizations. Use canonical-prompt runs for the cleanest cross-provider comparisons.
Hosted Model Families
Model families with public evaluation runs on Groq: Llama 3.3 70B.
Recent Groq Runs
- Groq / Llama 3.3 70B on MUSR: completed; 35.0%; 346 ms p50 latency; 20 samples.
- Groq / Llama 3.3 70B on MATH: completed; 30.4%; 1967 ms p50 latency; 23 samples.
- Groq / Llama 3.3 70B on MUSR: completed; 50.0%; 311 ms p50 latency; 10 samples.
- Groq / Llama 3.3 70B on MUSR: completed; 100.0%; 617 ms p50 latency; 1 sample.
- Groq / Llama 3.3 70B on MUSR: partial; 0.0%; 438 ms p50 latency; 20 samples.
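Because the runs above use very different sample counts (a 1-sample run at 100.0% carries far less evidence than a 20-sample run at 35.0%), comparing them by raw percentage is misleading. A minimal sketch of one reasonable aggregation, using the run data listed above: pool completed runs per benchmark by sample-weighted accuracy, skipping partial runs. The `pooled_accuracy` helper and the decision to exclude partial runs are illustrative assumptions, not Benchscope's actual methodology.

```python
# Runs from the list above: (benchmark, status, score_pct, samples).
RUNS = [
    ("MUSR", "completed", 35.0, 20),
    ("MATH", "completed", 30.4, 23),
    ("MUSR", "completed", 50.0, 10),
    ("MUSR", "completed", 100.0, 1),
    ("MUSR", "partial", 0.0, 20),
]

def pooled_accuracy(runs, benchmark):
    """Sample-weighted accuracy across completed runs of one benchmark.

    Illustrative aggregation only: weights each run by its sample count
    and ignores partial runs, whose scores may be incomplete.
    """
    correct = 0.0
    total = 0
    for bench, status, score_pct, samples in runs:
        if bench == benchmark and status == "completed":
            correct += score_pct / 100.0 * samples
            total += samples
    if total == 0:
        raise ValueError(f"no completed runs for {benchmark}")
    return 100.0 * correct / total

# The three completed MUSR runs pool to (7 + 5 + 1) / 31 correct ≈ 41.9%,
# well below the 100.0% headline of the single-sample run.
print(f"MUSR pooled: {pooled_accuracy(RUNS, 'MUSR'):.1f}%")
print(f"MATH pooled: {pooled_accuracy(RUNS, 'MATH'):.1f}%")
```

With only 31 pooled MUSR samples, even the weighted figure has wide error bars; treat small-sample runs as noisy signals rather than rankings.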
Related
- MMLU benchmark results across all providers
- MATH benchmark results across all providers
- GSM8K benchmark results across all providers
- All model families on Benchscope
- Best LLM endpoint for MMLU
- Best LLM endpoint for MATH
- Best LLM endpoint for GSM8K
- Llama 3.3 70B on Groq vs Together AI
- How benchmark results are defined and compared
Benchscope is a JavaScript app. If the interactive interface does not load, enable JavaScript or use the links above to reach the main public sections.