Groq Benchmark Results
Groq is an inference provider known for high-throughput, low-latency serving on its custom LPU hardware. Benchscope records public evaluation runs for 1 model family hosted on Groq, covering MUSR, MATH, GSM8K, and IFEVAL.
About Groq Endpoints
Groq's LPU architecture prioritizes inference speed. Benchmark scores from Groq endpoints reflect its specific serving configuration and hardware, not model capability in isolation. Scores for a given model on Groq can differ from the same model hosted elsewhere because of quantization, infrastructure, or serving optimizations. Use canonical-prompt runs for the cleanest cross-provider comparisons.
Hosted Model Families
Model families with public evaluation runs on Groq: Llama 3.3 70B.
Recent Groq Runs
- Groq / Llama 3.3 70B on MUSR: completed; 35.0%; 346 ms p50 latency; 20 samples.
- Groq / Llama 3.3 70B on MATH: completed; 30.4%; 1967 ms p50 latency; 23 samples.
- Groq / Llama 3.3 70B on MUSR: completed; 50.0%; 311 ms p50 latency; 10 samples.
- Groq / Llama 3.3 70B on MUSR: completed; 100.0%; 617 ms p50 latency; 1 sample.
- Groq / Llama 3.3 70B on MUSR: partial; 0.0%; 438 ms p50 latency; 20 samples.
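Because the runs above use very different sample counts (a 1-sample run at 100.0% carries far less evidence than a 20-sample run at 35.0%), comparing them by raw percentage is misleading. A minimal sketch of one reasonable aggregation, using the run data listed above: pool completed runs per benchmark by sample-weighted accuracy, skipping partial runs. The `pooled_accuracy` helper and the decision to exclude partial runs are illustrative assumptions, not Benchscope's actual methodology.

```python
# Runs from the list above: (benchmark, status, score_pct, samples).
RUNS = [
    ("MUSR", "completed", 35.0, 20),
    ("MATH", "completed", 30.4, 23),
    ("MUSR", "completed", 50.0, 10),
    ("MUSR", "completed", 100.0, 1),
    ("MUSR", "partial", 0.0, 20),
]

def pooled_accuracy(runs, benchmark):
    """Sample-weighted accuracy across completed runs of one benchmark.

    Illustrative aggregation only: weights each run by its sample count
    and ignores partial runs, whose scores may be incomplete.
    """
    correct = 0.0
    total = 0
    for bench, status, score_pct, samples in runs:
        if bench == benchmark and status == "completed":
            correct += score_pct / 100.0 * samples
            total += samples
    if total == 0:
        raise ValueError(f"no completed runs for {benchmark}")
    return 100.0 * correct / total

# The three completed MUSR runs pool to (7 + 5 + 1) / 31 correct ≈ 41.9%,
# well below the 100.0% headline of the single-sample run.
print(f"MUSR pooled: {pooled_accuracy(RUNS, 'MUSR'):.1f}%")
print(f"MATH pooled: {pooled_accuracy(RUNS, 'MATH'):.1f}%")
```

With only 31 pooled MUSR samples, even the weighted figure has wide error bars; treat small-sample runs as noisy signals rather than rankings.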
Related
- MMLU benchmark results across all providers
- MATH benchmark results across all providers
- GSM8K benchmark results across all providers
- All model families on Benchscope
- Best LLM endpoint for MMLU
- Best LLM endpoint for MATH
- Best LLM endpoint for GSM8K
- Llama 3.3 70B on Groq vs Together AI
- How benchmark results are defined and compared
Benchscope is a JavaScript app. If the interactive interface does not load, enable JavaScript or use the links above to reach the main public sections.