Qwen 3 235B A22B Instruct benchmark results

Compare Qwen 3 235B A22B Instruct benchmark results across hosted providers and endpoints. This page summarizes public runs on MMLU, MATH, GSM8K, IFEval, and MuSR, including score, latency, sample coverage, prompts, outputs, and methodology.

Provider Endpoints

Qwen 3 235B A22B Instruct is tracked on Benchscope. Public evaluation runs will appear here as providers submit results. Provider-hosted versions of the same model can differ in quantization, infrastructure, and serving configuration.

How to Compare Endpoints

To compare endpoints fairly, use canonical-prompt runs on the same benchmark. When providers serve the same model, score differences on such runs reflect hosting differences (such as quantization, infrastructure, or serving configuration) rather than model differences. See the methodology for how runs are defined and what makes them comparable.
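The comparison rule above can be sketched in a few lines of Python. This is a minimal illustration, not Benchscope's implementation or API: the `Run` record, its field names, the `comparable_scores` helper, and the choice to average multiple runs per provider are all assumptions made for the example, and the provider names and scores are made-up placeholders rather than real results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Run:
    """Hypothetical record for one public evaluation run (field names are assumptions)."""
    provider: str
    benchmark: str
    canonical_prompt: bool
    score: float


def comparable_scores(runs, benchmark):
    """Return {provider: mean score} using only canonical-prompt runs on one benchmark."""
    by_provider = {}
    for run in runs:
        # Skip runs on other benchmarks and runs that used a non-canonical prompt.
        if run.benchmark != benchmark or not run.canonical_prompt:
            continue
        by_provider.setdefault(run.provider, []).append(run.score)
    return {provider: mean(scores) for provider, scores in by_provider.items()}


# Made-up placeholder values for illustration only; these are not real results.
example_runs = [
    Run("provider-a", "MMLU", True, 0.80),
    Run("provider-a", "MMLU", False, 0.90),   # excluded: non-canonical prompt
    Run("provider-b", "MMLU", True, 0.78),
    Run("provider-b", "GSM8K", True, 0.95),   # excluded: different benchmark
]

mmlu_scores = comparable_scores(example_runs, "MMLU")
```

Only the two canonical-prompt MMLU runs survive the filter, so any remaining gap between the two hypothetical providers would be attributed to hosting differences rather than to prompt or benchmark differences.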

MMLU results for Qwen 3 235B A22B Instruct

No public MMLU runs for Qwen 3 235B A22B Instruct are available yet. Explore all MMLU benchmark results or run this benchmark on your endpoint.

MATH results for Qwen 3 235B A22B Instruct

No public MATH runs for Qwen 3 235B A22B Instruct are available yet. Explore all MATH benchmark results or run this benchmark on your endpoint.

GSM8K results for Qwen 3 235B A22B Instruct

No public GSM8K runs for Qwen 3 235B A22B Instruct are available yet. Explore all GSM8K benchmark results or run this benchmark on your endpoint.

IFEval results for Qwen 3 235B A22B Instruct

No public IFEval runs for Qwen 3 235B A22B Instruct are available yet. Explore all IFEval benchmark results or run this benchmark on your endpoint.

MuSR results for Qwen 3 235B A22B Instruct

No public MuSR runs for Qwen 3 235B A22B Instruct are available yet. Explore all MuSR benchmark results or run this benchmark on your endpoint.
