Polymyth Model Capability Audit

Narrative knowledge tests across 80 models, sorted by composite score.

[Chart: Score vs Output Cost (Efficient Frontier)]

Columns: Open = weights availability; Released = release month; Total/Active = parameter counts; Disk = Q4 GGUF size in GB (rows noted "BF16 size (no Q4 GGUF)" list the BF16 size instead); Cost = input/output price in $ per million tokens; Speed = tok/s where measured; then the four narrative-knowledge sub-scores: Campbell /12 (Vogler's 12-stage Hero's Journey), Booker /7 (Seven Basic Plots), Jung (archetypes), Tobias /20 (20 Master Plots); Score = composite.

Legend: ? = unknown, refused, or test failed; (Nf) = N entries fabricated; — = not applicable or not recorded. On every fully scored row the composite equals Campbell + 2×Booker + Jung + Tobias, i.e. Booker counts double.
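The composite score can be reproduced from the four sub-scores: on every fully scored row it equals Campbell + 2×Booker + Jung + Tobias, with Booker counted twice. A minimal sketch (formula inferred from the published rows, not from the test harness itself):

```python
# Composite score as observed across the table: Booker is weighted double.
def composite(campbell, booker, jung, tobias):
    return campbell + 2 * booker + jung + tobias

# Spot-checks against rows from the table.
assert composite(12, 7, 13, 20) == 59  # Aion 2.0
assert composite(12, 7, 14, 20) == 60  # DeepSeek V3 0324 (table maximum)
assert composite(0, 3, 0, 0) == 6      # Trinity Nano
```

The maximum observed score of 60 is consistent with 12 + 2×7 + 14 + 20.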
Closed XXL (1 model)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aion 2.0 | Closed | Aion Labs | 2026-02 | ? | ? | — | $0.8/1.6 | — | 12 | 7 | 13 | 20 | 59 | DeepSeek V3.2 fine-tune | 4K tok |
1T+ (4 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MiMo V2 Pro | Open | Xiaomi | 2026-03 | 1T+ | 42B | 2000 | $1.0/3.0 | — | 12 | 7 | 13 | 20 | 59 | Categorised plots per book structure; BF16 size (no Q4 GGUF) | 4K tok |
| Kimi K2 | Open | Moonshot | 2025-07 | 1T | 32B | 2000 | Free | — | 12 | 7 | 12 | 20 | 58 | BF16 size (no Q4 GGUF) | 2K tok |
| Kimi K2.5 | Open | Moonshot | 2025-12 | 1T | 32B | 2000 | $0.45/2.2 | — | 12 | 7 | 12 | 20 | 58 | BF16 size (no Q4 GGUF) | 4K tok |
| DeepSeek V3.2 | Open | DeepSeek | 2025-12 | 1.16T | 37B | 2320 | $0.26/0.38 | — | 12 | 7 | 13 | 18 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
Closed XL (4 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 3 Flash Preview | Closed | Google | 2026-03 | ? | ? | — | $0.5/3.0 | — | 15 | 7 | 8 | 20 | 57 | Used Campbell's 17-stage, not Vogler's 12 | 4K tok |
| Claude Haiku 4.5 | Closed | Anthropic | 2025-10 | ? | ? | — | $1.0/5.0 | — | 11 | 7 | 10 | 20 | 55 | Tested via agent, no file | agent (no limit) |
| Palmyra X5 | Closed | Writer | 2025-11 | ? | ? | — | $0.6/6.0 | — | 12 | 7 | 9 | 19 | 54 | Missing Forbidden Love | 4K tok |
| Seed 2.0 Lite | Closed | ByteDance | 2026-01 | ? | ? | — | $0.25/2.0 | — | 12 | ? | 9 | 1 | 22 | Refused to answer Tobias (overcautious copyright filter) | 4K tok |
600B–1T (6 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek V3 0324 | Open | DeepSeek | 2025-03 | 685B | ? | 1370 | $0.2/0.77 | — | 12 | 7 | 14 | 20 | 60 | BF16 size (no Q4 GGUF) | 4K tok |
| Mistral Large | Open | Mistral | 2025-12 | 675B | 41B | 1350 | $0.5/1.5 | — | 12 | 7 | 14 | 20 | 60 | BF16 size (no Q4 GGUF) | 4K tok |
| Cogito V2.1 671B | Open | DeepCogito | 2026-01 | 671B | 37B | 1342 | $1.25/1.25 | — | 12 | 7 | 13 | 20 | 59 | BF16 size (no Q4 GGUF) | 4K tok |
| DeepSeek V3.1 Terminus | Open | DeepSeek | 2025-09 | 671B | 37B | 1342 | $0.21/0.79 | — | 12 | 7 | 13 | 20 | 59 | BF16 size (no Q4 GGUF) | 4K tok |
| DeepSeek V3.1 | Open | DeepSeek | 2025-08 | 671B | 37B | 1342 | $0.15/0.75 | — | 12 | 7 | 11 | 20 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
| GLM-5 Turbo | Open | Zhipu | 2026-02 | 745B | 44B | 1490 | $1.2/4.0 | — | 12 | 7 | 7 | 20 | 53 | BF16 size (no Q4 GGUF) | 4K tok |
Closed Large (assumed flagship) (4 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Flash Lite | Closed | Google | 2026-03 | ? | ? | — | $0.25/1.5 | — | 12 | 7 | 12 | 20 | 58 | 1M context | 4K tok |
| DeepSeek V3.2 Exp | Open | DeepSeek | 2025-09 | ? | ? | — | $0.27/0.41 | — | 12 | 7 | 11 | 20 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
| Mistral Small 2603 | Open | Mistral | 2026-03 | ? | ? | — | $0.15/0.6 | — | 12 | 7 | 13 | 12 | 51 | | 4K tok |
| Seed 2.0 Mini | Closed | ByteDance | 2026-01 | ? | ? | — | $0.1/0.4 | — | 12 | 7 | 11 | 9 | 46 | | 4K tok |
200–400B (7 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ERNIE 4.5 300B | Open | Baidu | 2025-06 | 300B | 47B | 600 | $0.28/1.1 | — | 12 | 7 | 14 | 20 | 60 | BF16 size (no Q4 GGUF) | 4K tok |
| Jamba Large 1.7 | Open | AI21 | 2025-09 | 398B | 94B | 796 | $2.0/8.0 | — | 12 | 7 | 11 | 20 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
| Trinity Large | Open | Arcee | 2026-01 | 400B | 13B | 800 | Free | — | 12 | 7 | 10 | 20 | 56 | Free tier (temporary); BF16 size (no Q4 GGUF) | 4K tok |
| Hermes 4 405B | Open | NousResearch | 2026-02 | 405B | 405B | 810 | $1.0/3.0 | — | 12 | 7 | 9 | 20 | 55 | BF16 size (no Q4 GGUF) | 4K tok |
| Step 3.5 Flash | Open | StepFun | 2026-01 | 196B | 11B | 394 | Free | — | 12 | 7 | 12 | 13 | 51 | BF16 size (no Q4 GGUF) | 4K tok |
| MiniMax M2.7 | Open | MiniMax | 2026-03 | 230B | 10B | 460 | $0.3/1.2 | — | 12 | 7 | 13 | 9 | 48 | Tobias scored at 8K (reasoning burned 4K budget) | 4K tok |
| MiMo V2 Flash | Open | Xiaomi | 2026-03 | 309B | 15B | — | $0.09/0.29 | — | 12 | 7 | 11 | 6 | 43 | | 4K tok |
100–120B (7 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen 3.5 122B-A10B | Open | Qwen | 2026-03 | 122B | 10B | 81 | $0.26/2.08 | — | 12 | 7 | 13 | 20 | 59 | | 4K tok |
| GLM 4.5 Air | Open | Zhipu | 2025-06 | 106B | 12B | 73 | Free | — | 12 | 7 | 11 | 20 | 57 | | 4K tok |
| Nemotron 3 Super 120B | Open | NVIDIA | 2026-03 | 120B | 12B | 87 | Free | — | 12 | 7 | 11 | 20 | 57 | Free tier (temporary) | 4K tok |
| GPT-OSS 120B | Open | OpenAI | 2025-12 | 120B | 5.1B | 70 | Free | — | 12 | 6 | 13 | 15 | 52 | | 4K tok |
| Intellect 3 | Open | Prime Intellect | 2026-03 | 106B | 12B | 73 | $0.2/1.1 | — | 12 | 7 | 13 | 9 | 48 | MoE from GLM-4.5 Air base | 4K tok |
| Solar Pro 3 | Open | Upstage | 2025-12 | 102B | 12B | 204 | $0.15/0.6 | — | 12 | 7 | 10 | 8 | 44 | BF16 size (no Q4 GGUF) | 4K tok |
| Llama 4 Scout | Open | Meta | 2025-04 | 109B | 17B | 67 | Free | — | 12 | 5 | 9 | 5 | 36 | | 4K tok |
70B (3 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hermes 4 70B | Open | NousResearch | 2026-02 | 70.6B | 70.6B | 42.5 | $0.13/0.4 | — | 12 | 7 | 10 | 19 | 55 | Missing Rivalry | 4K tok |
| Llama 3.3 70B | Open | Meta | 2024-12 | 70B | 70B | 43 | Free | — | 12 | 7 | 10 | 12 | 48 | | 4K tok |
| Hermes 4 70B IQ2_XXS | Open | NousResearch | 2026-02 | 70.6B | 70.6B | 19.1 | $0.13/0.4 | 1.2 | ? | 3 | 13(4f) | ? | ? | IQ2_XXS quantisation: testing dense 70B at extreme compression | 2K tok, think:false |
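The disk figures imply an effective bits-per-weight for each quantisation level. A quick arithmetic sketch (the helper name is mine), checked against the two Hermes 4 70B rows and one BF16 row from the table:

```python
# Effective bits per weight implied by a model file size:
# bits/weight = (size in GB * 8 Gbit/GB) / (params in billions)
def bits_per_weight(size_gb, params_b):
    return size_gb * 8 / params_b

# Hermes 4 70B Q4 GGUF: 42.5 GB / 70.6B params -> ~4.8 bits/weight
print(round(bits_per_weight(42.5, 70.6), 1))  # 4.8
# Hermes 4 70B IQ2_XXS: 19.1 GB -> ~2.2 bits/weight
print(round(bits_per_weight(19.1, 70.6), 1))  # 2.2
# DeepSeek V3.1 BF16: 1342 GB / 671B -> exactly 16 bits/weight
print(bits_per_weight(1342, 671))             # 16.0
```

These implied figures are slightly above each format's nominal rate because GGUF files carry scales and metadata on top of the packed weights.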
22–30B (15 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Trinity Mini | Open | Arcee | 2026-01 | 26B | 3B | 16 | Free | — | 12 | 7 | 13 | 16 | 55 | Free tier (temporary) | 4K tok |
| Qwen 3.5 27B Q4 | Open | Qwen | 2026-03 | 27.8B | 27.8B | 17 | $0.2/1.56 | 1.5 | 12 | 7 | 9 | 12 | 47 | | 2K tok, think:false |
| GLM 4.7 Flash | Open | Zhipu | 2025-08 | 30B | 30B | 19 | ? | — | 12 | 7 | 11 | 10 | 47 | | 2K tok, think:false |
| Nemotron 3 Nano 30B | Open | NVIDIA | 2026-03 | 30B | 3.2B | 24 | $0.05/0.2 | 10.3 | 12 | 7 | 12 | 8 | 46 | | 2K tok, think:false |
| Qwen 3.5 35B-A3B Q4 | Open | Qwen | 2026-03 | 36B | 3B | 23 | $0.16/1.3 | 5.6 | 12 | 7 | 8 | 11 | 45 | | 2K tok, think:false |
| Gemma 3 27B | Open | Google | 2025-03 | 27B | 27B | 17 | Free | — | 12 | 7 | 9(1f) | 9 | 44 | | 2K tok, think:false |
| Cydonia 24B | Open | TheDrummer | 2025-10 | 24B | 24B | 14.3 | $0.3/0.5 | — | 12 | 6 | 7 | 12 | 43 | Community fine-tune | 4K tok |
| Tongyi DeepResearch 30B | Open | Alibaba | 2026-02 | 30.5B | 3.3B | 19 | $0.09/0.45 | — | 12 | 7 | 13 | 4 | 43 | | 4K tok |
| ERNIE 4.5 21B-A3B | Open | Baidu | 2025-06 | 21B | 3B | 13 | $0.07/0.28 | — | 12 | 6 | 9 | 9 | 42 | | 4K tok |
| Qwen 3.5 Flash | Open | Qwen | 2026-02 | 35B | 3B | 3.4 | $0.07/0.26 | — | 12 | 7 | 11 | 5 | 42 | Same model as Qwen 3.5 35B-A3B (API name) | 4K tok |
| GPT-OSS 20B | Open | OpenAI | 2025-12 | 21B | 3.6B | 13 | Free | — | 12 | 6 | 12 | 5 | 41 | Reasoning burned all tokens | 4K tok |
| Trinity Mini (local) | Open | Arcee | 2026-01 | 26B | 3B | 15 | Free | 12.5 | 11 | 7 | 9 | 6 | 40 | ChatML template fix. Think blocks burn tokens: got 10/12 Campbell, ran out on Tobias. | 2K tok, think:false |
| Liquid LFM-2 24B-A2B | Open | Liquid | 2026-02 | 24B | 2.3B | 15 | $0.03/0.12 | — | 12 | 7 | 11 | 3 | 40 | | 4K tok |
| Qwen 3.5 27B IQ2_XXS | Open | Qwen | 2026-03 | 27.8B | 27.8B | 7.8 | $0.2/1.56 | 2.7 | 12 | 4 | 7(2f) | 7 | 34 | | 2K tok, think:false |
| Dolphin Mistral 24B | Open | Mistral | 2025-06 | 24B | 24B | 14.5 | Free | — | ? | ? | ? | 15 | ? | Venice free endpoint unreliable: 3/4 tests failed | 4K tok |
13–16B (5 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ministral 3 14B (local) | Open | Mistral | 2025-12 | 14B | 14B | 8.7 | $0.2/0.2 | 7.3 | 12 | 7 | 9(4f) | 14 | 49 | | 2K tok, think:false |
| Phi-4 14B | Open | Microsoft | 2024-12 | 14B | 14B | 9.1 | $0.07/0.14 | 6.1 | 12 | 7 | 8 | 13 | 47 | | 2K tok, think:false |
| Ministral 14B | Open | Mistral | 2025-12 | 14B | 14B | 8.5 | $0.2/0.2 | — | 12 | 7 | 11 | 8 | 45 | | 4K tok |
| Gemma 3 12B | Open | Google | 2025-03 | 12B | 12B | 8.1 | Free | 11.3 | 12 | 7 | 9 | 5 | 40 | | 2K tok, think:false |
| Hunyuan A13B | Open | Tencent | 2025-11 | 13B | 13B | 8 | $0.14/0.57 | — | 12 | 3 | 7 | 2 | 27 | | 4K tok |
7–11B (11 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ministral 8B | Open | Mistral | 2025-12 | 7.4B | 7.4B | 6.0 | $0.15/0.15 | — | 12 | 7 | 13 | 9 | 48 | | 4K tok |
| Ministral 3 8B (local) | Open | Mistral | 2025-12 | 7.4B | 7.4B | 6.0 | $0.15/0.15 | 21 | 12 | 7 | 9(5f) | 9 | 44 | Local verification of API results | 2K tok, think:false |
| Nemotron Nano 9B V2 (local) | Open | NVIDIA | 2026-03 | 9B | 9B | 6.1 | Free | 7.5 | 12 | 7 | 9 | 8 | 43 | Garbled instruction following (generates homework alongside answers); names present in output | 2K tok, think:false |
| Llama 3.1 8B Instant | Open | Meta | 2024-07 | 8B | 8B | 4.9 | Free | — | 12 | 7 | 10 | 5 | 41 | | 4K tok |
| Nemotron Nano 9B V2 | Open | NVIDIA | 2026-03 | 9B | 9B | 5.5 | Free | — | 12 | 6 | 12 | 2 | 38 | | 4K tok |
| Qwen 3.5 9B Q4 | Open | Qwen | 2026-03 | 9.7B | 9.7B | 6.6 | $0.05/0.15 | 14 | 12 | 7 | 6(1f) | 5 | 37 | | 2K tok, think:false |
| Gemma 3n E4B | Open | Google | 2025-06 | 8B | 4B eff | 7.5 | Free | 24 | 12 | 7 | 10(1f) | 1 | 37 | | 2K tok, think:false |
| OLMo 3 7B | Open | Allen AI | 2026-02 | 7B | 7B | 4.5 | ? | 32 | 12 | 4 | 10(1f) | 5 | 35 | All output in thinking field | 2K tok, think:false (model ignored) |
| Qwen 3.5 9B (API) | Open | Qwen | 2026-03 | 9.7B | 9.7B | 6.6 | $0.05/0.15 | — | 1 | 7 | 12 | 7 | 34 | Reasoning burned all tokens | 4K tok |
| Gemma 3n E2B | Open | Google | 2025-06 | 8B | 2B eff | 5.6 | Free | 55 | 12 | 6 | 7(2f) | 3 | 34 | | 2K tok, think:false |
| ALLaM-2 7B | Open | SDAIA | 2025-08 | 7B | 7B | 4.3 | Free | — | 8 | 2 | 9 | 8 | 29 | Arabic-focused | 4K tok |
3–6B (10 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ministral 3B | Open | Mistral | 2025-12 | 5.78B | 5.78B | 3.5 | $0.1/0.1 | — | 12 | 4 | 12 | 5 | 37 | | 4K tok |
| Gemma 3 4B | Open | Google | 2025-03 | 4B | 4B | 3.3 | Free | 50 | 12 | 4 | 8(1f) | 8 | 36 | | 2K tok, think:false |
| Granite 4.0 H Micro | Open | IBM | 2026-03 | 3B | 3B | 1.8 | $0.02/0.11 | — | 12 | 7 | 7 | 3 | 36 | | 4K tok |
| Ministral 3 3B (local) | Open | Mistral | 2025-12 | 5.78B | 5.78B | 3.5 | $0.1/0.1 | 60 | 12 | 5 | 9(1f) | 3 | 34 | Tobias: fabricated Revenge X variants to fill 20 slots | 2K tok, think:false |
| Qwen 3.5 4B Q4 | Open | Qwen | 2026-03 | 4.7B | 4.7B | 3.4 | $0.02/0.1 | 35 | 12 | 3 | 7(1f) | 0 | 25 | | 2K tok, think:false |
| Nemotron 3 Nano 4B | Open | NVIDIA | 2026-03 | 4B | 4B | 2.8 | $0.04/0.16 | 51 | 7 | 5 | 6(1f) | 1 | 24 | | 2K tok, think:false |
| Qwen 3.5 4B Q8 | Open | Qwen | 2026-03 | 4.7B | 4.7B | 5.3 | $0.02/0.1 | 18 | 12 | 2 | 6(4f) | 0 | 22 | | 2K tok, think:false |
| Phi-4 Mini | Open | Microsoft | 2025-01 | 3.8B | 3.8B | 2.5 | $0.07/0.14 | 56 | 5 | 4 | 8 | 0 | 21 | Looping on long outputs | 2K tok, think:false |
| Qwen 3.5 4B BF16 | Open | Qwen | 2026-03 | 4.7B | 4.7B | 9.3 | $0.02/0.1 | 5.7 | 12 | 2 | 5(4f) | 0 | 21 | | 2K tok, think:false |
| Trinity Nano | Open | Arcee | 2026-01 | 6B | 1B | 3.8 | ? | 68 | 0 | 3 | 0 | 0 | 6 | Broken: looped on every test | 2K tok, think:false |
<1.5B (3 models)

| Model | Open | Maker | Released | Total | Active | Disk (GB) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Conditions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma 3 1B | Open | Google | 2025-03 | 1B | 1B | 0.82 | $0.01/0.02 | 105 | 12 | 2 | 7(1f) | 0 | 23 | | 2K tok, think:false |
| Liquid LFM-2.5 1.2B | Open | Liquid | 2026-01 | 1.2B | 1.2B | 0.8 | Free | — | 11 | ? | 7 | 1 | 19 | | 4K tok |
| Gemma 3 270M | Open | Google | 2025-03 | 270M | 270M | 0.29 | $0.01/0.02 | 200 | 0 | 0 | 4 | 0 | 4 | Incoherent | 2K tok, think:false |
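The "Score vs Output Cost (Efficient Frontier)" chart can be recomputed from the table: a model sits on the frontier when no other model is at least as cheap per output token with a score at least as high (and strictly better on one axis). A sketch over a handful of rows, treating Free as $0/M output (my simplification):

```python
def efficient_frontier(models):
    """Keep models that are not Pareto-dominated on (output cost, score):
    dominated means another model is <= on cost, >= on score, and
    strictly better on at least one of the two."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            (c <= cost and s >= score) and (c < cost or s > score)
            for n, c, s in models if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# A few rows from the table: (model, output $/M, composite score).
rows = [
    ("DeepSeek V3 0324", 0.77, 60),
    ("Mistral Large", 1.5, 60),
    ("Kimi K2", 0.0, 58),
    ("GLM 4.5 Air", 0.0, 57),
    ("Gemma 3 270M", 0.02, 4),
]
print(efficient_frontier(rows))  # ['DeepSeek V3 0324', 'Kimi K2']
```

Mistral Large drops out because DeepSeek V3 0324 matches its score of 60 at roughly half the output price; the free models collapse to whichever scores highest at $0.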