Qwen3 32B benchmark results

Compare Qwen3 32B benchmark results across hosted providers and endpoints. This page summarizes public runs on MMLU, MATH, GSM8K, IFEval, and MuSR, including score, latency, sample coverage, prompts, outputs, and methodology.

Provider Endpoints

Qwen3 32B has 4 public runs across 1 provider. Provider-hosted versions of the same model can differ in quantization, infrastructure, and serving configuration, any of which can affect benchmark results independently of model capability.

How to Compare Endpoints

Use canonical-prompt runs on the same benchmark to compare endpoints fairly. Score differences between providers running the same model are more likely to reflect hosting differences (quantization, infrastructure, serving configuration) than differences in model capability. Check the methodology for how runs are defined and what makes them comparable.
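
As a rough illustration of the comparison rule above, the following Python sketch keeps only canonical-prompt runs on a single benchmark and reports the best score per endpoint. The `Run` record, its field names, and the `best_scores` helper are hypothetical and not part of any Benchscope API. The scores are the ones listed on this page; the `canonical_prompt=True` flags are an assumption, since the page does not state which prompt variant each run used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record for one public benchmark run."""
    provider: str
    endpoint: str
    benchmark: str
    score: float            # accuracy in percent
    canonical_prompt: bool  # assumed flag: run used the canonical prompt


def best_scores(runs, benchmark):
    """Return the best canonical-prompt score per (provider, endpoint)."""
    best = {}
    for run in runs:
        # Skip runs that are not comparable: other benchmark or custom prompt.
        if run.benchmark != benchmark or not run.canonical_prompt:
            continue
        key = (run.provider, run.endpoint)
        if key not in best or run.score > best[key]:
            best[key] = run.score
    return best


# Scores taken from the tables on this page; canonical_prompt is assumed.
runs = [
    Run("Groq", "Groq / Qwen3 32B", "MMLU", 70.0, True),
    Run("Groq", "Groq / Qwen3 32B", "GSM8K", 89.0, True),
    Run("Groq", "Groq / Qwen3 32B", "MuSR", 30.0, True),
]

for benchmark in ("MMLU", "GSM8K", "MuSR"):
    print(benchmark, best_scores(runs, benchmark))
```

With only one provider listed, each benchmark yields a single entry. The filter starts to matter once a second endpoint or a non-canonical run appears for the same benchmark.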

MMLU results for Qwen3 32B

Provider	Endpoint	Best Score	Runs
Groq	Groq / Qwen3 32B	70.0%	1 run

Explore all MMLU benchmark results →

MATH results for Qwen3 32B

No public MATH runs for Qwen3 32B are available yet. Explore all MATH benchmark results or run this benchmark on your endpoint.

GSM8K results for Qwen3 32B

Provider	Endpoint	Best Score	Runs
Groq	Groq / Qwen3 32B	89.0%	1 run

Explore all GSM8K benchmark results →

IFEval results for Qwen3 32B

No public IFEval runs for Qwen3 32B are available yet. Explore all IFEval benchmark results or run this benchmark on your endpoint.

MuSR results for Qwen3 32B

Provider	Endpoint	Best Score	Runs
Groq	Groq / Qwen3 32B	30.0%	1 run

Explore all MuSR benchmark results →
