Polymyth Model Capability Audit
Narrative knowledge tests across 80 models — sorted by composite score
[Chart: Score vs Output Cost (Efficient Frontier)]
| Model | Open | Maker | Released | Total Params | Active Params | Disk (Q4) | Cost (in/out $/M) | Speed (tok/s) | Campbell /12 | Booker /7 | Jung | Tobias /20 | Score | Notes | Test Conditions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Closed XXL (1 model) | |||||||||||||||
| Aion 2.0 | Closed | Aion Labs | 2026-02 | ? | ? | — | $0.8/1.6 | — | 12 | 7 | 13 | 20 | 59 | DeepSeek V3.2 fine-tune | 4K tok |
| 1T+ (4 models) | |||||||||||||||
| MiMo V2 Pro | Open | Xiaomi | 2026-03 | 1T+ | 42B | 2000 | $1.0/3.0 | — | 12 | 7 | 13 | 20 | 59 | Categorised plots per book structure; BF16 size (no Q4 GGUF) | 4K tok |
| Kimi K2 | Open | Moonshot | 2025-07 | 1T | 32B | 2000 | Free | — | 12 | 7 | 12 | 20 | 58 | BF16 size (no Q4 GGUF) | 2K tok |
| Kimi K2.5 | Open | Moonshot | 2025-12 | 1T | 32B | 2000 | $0.45/2.2 | — | 12 | 7 | 12 | 20 | 58 | BF16 size (no Q4 GGUF) | 4K tok |
| DeepSeek V3.2 | Open | DeepSeek | 2025-12 | 1.16T | 37B | 2320 | $0.26/0.38 | — | 12 | 7 | 13 | 18 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
| Closed XL (4 models) | |||||||||||||||
| Gemini 3 Flash Preview | Closed | Google | 2026-03 | ? | ? | — | $0.5/3.0 | — | 15 | 7 | 8 | 20 | 57 | Used Campbell's 17-stage, not Vogler's 12 | 4K tok |
| Claude Haiku 4.5 | Closed | Anthropic | 2025-10 | ? | ? | — | $1.0/5.0 | — | 11 | 7 | 10 | 20 | 55 | Tested via agent, no file | agent (no limit) |
| Palmyra X5 | Closed | Writer | 2025-11 | ? | ? | — | $0.6/6.0 | — | 12 | 7 | 9 | 19 | 54 | Missing Forbidden Love | 4K tok |
| Seed 2.0 Lite | Closed | ByteDance | 2026-01 | ? | ? | — | $0.25/2.0 | — | 12 | ? | 9 | 1 | 22 | Refused to answer Tobias (overcautious copyright filter) | 4K tok |
| 600B–1T (6 models) | |||||||||||||||
| Mistral Large | Open | Mistral | 2025-12 | 675B | 41B | 1350 | $0.5/1.5 | — | 12 | 7 | 14 | 20 | 60 | BF16 size (no Q4 GGUF) | 4K tok |
| Cogito V2.1 671B | Open | DeepCogito | 2026-01 | 671B | 37B | 1342 | $1.25/1.25 | — | 12 | 7 | 13 | 20 | 59 | BF16 size (no Q4 GGUF) | 4K tok |
| GLM-5 Turbo | Open | Zhipu | 2026-02 | 745B | 44B | 1490 | $1.2/4.0 | — | 12 | 7 | 7 | 20 | 53 | BF16 size (no Q4 GGUF) | 4K tok |
| Closed Large (assumed flagship) (4 models) | |||||||||||||||
| Gemini 3.1 Flash Lite | Closed | Google | 2026-03 | ? | ? | — | $0.25/1.5 | — | 12 | 7 | 12 | 20 | 58 | 1M context | 4K tok |
| Mistral Small 2603 | Open | Mistral | 2026-03 | ? | ? | — | $0.15/0.6 | — | 12 | 7 | 13 | 12 | 51 | | 4K tok |
| Seed 2.0 Mini | Closed | ByteDance | 2026-01 | ? | ? | — | $0.1/0.4 | — | 12 | 7 | 11 | 9 | 46 | | 4K tok |
| 200–400B (7 models) | |||||||||||||||
| ERNIE 4.5 300B | Open | Baidu | 2025-06 | 300B | 47B | 600 | $0.28/1.1 | — | 12 | 7 | 14 | 20 | 60 | BF16 size (no Q4 GGUF) | 4K tok |
| Jamba Large 1.7 | Open | AI21 | 2025-09 | 398B | 94B | 796 | $2.0/8.0 | — | 12 | 7 | 11 | 20 | 57 | BF16 size (no Q4 GGUF) | 4K tok |
| Trinity Large | Open | Arcee | 2026-01 | 400B | 13B | 800 | Free | — | 12 | 7 | 10 | 20 | 56 | Free tier (temporary); BF16 size (no Q4 GGUF) | 4K tok |
| Hermes 4 405B | Open | NousResearch | 2026-02 | 405B | 405B | 810 | $1.0/3.0 | — | 12 | 7 | 9 | 20 | 55 | BF16 size (no Q4 GGUF) | 4K tok |
| Step 3.5 Flash | Open | StepFun | 2026-01 | 196B | 11B | 394 | Free | — | 12 | 7 | 12 | 13 | 51 | BF16 size (no Q4 GGUF) | 4K tok |
| MiniMax M2.7 | Open | MiniMax | 2026-03 | 230B | 10B | 460 | $0.3/1.2 | — | 12 | 7 | 13 | 9 | 48 | Tobias scored at 8K (reasoning burned 4K budget) | 4K tok |
| MiMo V2 Flash | Open | Xiaomi | 2026-03 | 309B | 15B | — | $0.09/0.29 | — | 12 | 7 | 11 | 6 | 43 | | 4K tok |
| 100–120B (7 models) | |||||||||||||||
| Qwen 3.5 122B-A10B | Open | Qwen | 2026-03 | 122B | 10B | 81 | $0.26/2.08 | — | 12 | 7 | 13 | 20 | 59 | | 4K tok |
| GLM 4.5 Air | Open | Zhipu | 2025-06 | 106B | 12B | 73 | Free | — | 12 | 7 | 11 | 20 | 57 | | 4K tok |
| Nemotron 3 Super 120B | Open | NVIDIA | 2026-03 | 120B | 12B | 87 | Free | — | 12 | 7 | 11 | 20 | 57 | Free tier (temporary) | 4K tok |
| GPT-OSS 120B | Open | OpenAI | 2025-12 | 120B | 5.1B | 70 | Free | — | 12 | 6 | 13 | 15 | 52 | | 4K tok |
| Intellect 3 | Open | Prime Intellect | 2026-03 | 106B | 12B | 73 | $0.2/1.1 | — | 12 | 7 | 13 | 9 | 48 | MoE from GLM-4.5 Air base | 4K tok |
| Solar Pro 3 | Open | Upstage | 2025-12 | 102B | 12B | 204 | $0.15/0.6 | — | 12 | 7 | 10 | 8 | 44 | BF16 size (no Q4 GGUF) | 4K tok |
| Llama 4 Scout | Open | Meta | 2025-04 | 109B | 17B | 67 | Free | — | 12 | 5 | 9 | 5 | 36 | | 4K tok |
| 70B (3 models) | |||||||||||||||
| Hermes 4 70B | Open | NousResearch | 2026-02 | 70.6B | 70.6B | 42.5 | $0.13/0.4 | — | 12 | 7 | 10 | 19 | 55 | Missing Rivalry | 4K tok |
| Llama 3.3 70B | Open | Meta | 2024-12 | 70B | 70B | 43 | Free | — | 12 | 7 | 10 | 12 | 48 | | 4K tok |
| Hermes 4 70B IQ2_XXS | Open | NousResearch | 2026-02 | 70.6B | 70.6B | 19.1 | $0.13/0.4 | 1.2 | ? | 3 | 13(4f) | ? | ? | IQ2_XXS quantisation — testing dense 70B at extreme compression | 2K tok, think:false |
| 22–30B (15 models) | |||||||||||||||
| Trinity Mini | Open | Arcee | 2026-01 | 26B | 3B | 16 | Free | — | 12 | 7 | 13 | 16 | 55 | Free tier (temporary) | 4K tok |
| Qwen 3.5 27B Q4 | Open | Qwen | 2026-03 | 27.8B | 27.8B | 17 | $0.2/1.56 | 1.5 | 12 | 7 | 9 | 12 | 47 | | 2K tok, think:false |
| GLM 4.7 Flash | Open | Zhipu | 2025-08 | 30B | 30B | 19 | ? | — | 12 | 7 | 11 | 10 | 47 | | 2K tok, think:false |
| Nemotron 3 Nano 30B | Open | NVIDIA | 2026-03 | 30B | 3.2B | 24 | $0.05/0.2 | 10.3 | 12 | 7 | 12 | 8 | 46 | | 2K tok, think:false |
| Qwen 3.5 35B-A3B Q4 | Open | Qwen | 2026-03 | 36B | 3B | 23 | $0.16/1.3 | 5.6 | 12 | 7 | 8 | 11 | 45 | | 2K tok, think:false |
| Gemma 3 27B | Open | Google | 2025-03 | 27B | 27B | 17 | Free | — | 12 | 7 | 9(1f) | 9 | 44 | | 2K tok, think:false |
| Cydonia 24B | Open | TheDrummer | 2025-10 | 24B | 24B | 14.3 | $0.3/0.5 | — | 12 | 6 | 7 | 12 | 43 | Community fine-tune | 4K tok |
| Tongyi DeepResearch 30B | Open | Alibaba | 2026-02 | 30.5B | 3.3B | 19 | $0.09/0.45 | — | 12 | 7 | 13 | 4 | 43 | | 4K tok |
| ERNIE 4.5 21B-A3B | Open | Baidu | 2025-06 | 21B | 3B | 13 | $0.07/0.28 | — | 12 | 6 | 9 | 9 | 42 | | 4K tok |
| GPT-OSS 20B | Open | OpenAI | 2025-12 | 21B | 3.6B | 13 | Free | — | 12 | 6 | 12 | 5 | 41 | Reasoning burned all tokens | 4K tok |
| Trinity Mini (local) | Open | Arcee | 2026-01 | 26B | 3B | 15 | Free | 12.5 | 11 | 7 | 9 | 6 | 40 | ChatML template fix. Think blocks burn tokens — got 10/12 Campbell, ran out on Tobias. | 2K tok, think:false |
| Liquid LFM-2 24B-A2B | Open | Liquid | 2026-02 | 24B | 2.3B | 15 | $0.03/0.12 | — | 12 | 7 | 11 | 3 | 40 | | 4K tok |
| Qwen 3.5 27B IQ2_XXS | Open | Qwen | 2026-03 | 27.8B | 27.8B | 7.8 | $0.2/1.56 | 2.7 | 12 | 4 | 7(2f) | 7 | 34 | | 2K tok, think:false |
| 13–16B (5 models) | |||||||||||||||
| Ministral 3 14B (local) | Open | Mistral | 2025-12 | 14B | 14B | 8.7 | $0.2/0.2 | 7.3 | 12 | 7 | 9(4f) | 14 | 49 | | 2K tok, think:false |
| Phi-4 14B | Open | Microsoft | 2024-12 | 14B | 14B | 9.1 | $0.07/0.14 | 6.1 | 12 | 7 | 8 | 13 | 47 | | 2K tok, think:false |
| Ministral 14B | Open | Mistral | 2025-12 | 14B | 14B | 8.5 | $0.2/0.2 | — | 12 | 7 | 11 | 8 | 45 | | 4K tok |
| Gemma 3 12B | Open | Google | 2025-03 | 12B | 12B | 8.1 | Free | 11.3 | 12 | 7 | 9 | 5 | 40 | | 2K tok, think:false |
| Hunyuan A13B | Open | Tencent | 2025-11 | 13B | 13B | 8 | $0.14/0.57 | — | 12 | 3 | 7 | 2 | 27 | | 4K tok |
| 7–11B (11 models) | |||||||||||||||
| Ministral 8B | Open | Mistral | 2025-12 | 7.4B | 7.4B | 6.0 | $0.15/0.15 | — | 12 | 7 | 13 | 9 | 48 | | 4K tok |
| Ministral 3 8B (local) | Open | Mistral | 2025-12 | 7.4B | 7.4B | 6.0 | $0.15/0.15 | 21 | 12 | 7 | 9(5f) | 9 | 44 | Local verification of API results | 2K tok, think:false |
| Nemotron Nano 9B V2 (local) | Open | NVIDIA | 2026-03 | 9B | 9B | 6.1 | Free | 7.5 | 12 | 7 | 9 | 8 | 43 | Garbled instruction following (generates homework alongside answers). Names present in output. | 2K tok, think:false |
| Llama 3.1 8B Instant | Open | Meta | 2024-07 | 8B | 8B | 4.9 | Free | — | 12 | 7 | 10 | 5 | 41 | | 4K tok |
| Nemotron Nano 9B V2 | Open | NVIDIA | 2026-03 | 9B | 9B | 5.5 | Free | — | 12 | 6 | 12 | 2 | 38 | | 4K tok |
| Qwen 3.5 9B Q4 | Open | Qwen | 2026-03 | 9.7B | 9.7B | 6.6 | $0.05/0.15 | 14 | 12 | 7 | 6(1f) | 5 | 37 | | 2K tok, think:false |
| Gemma 3n E4B | Open | Google | 2025-06 | 8B | 4B eff | 7.5 | Free | 24 | 12 | 7 | 10(1f) | 1 | 37 | | 2K tok, think:false |
| OLMo 3 7B | Open | Allen AI | 2026-02 | 7B | 7B | 4.5 | ? | 32 | 12 | 4 | 10(1f) | 5 | 35 | All output in thinking field | 2K tok, think:false (model ignored) |
| Gemma 3n E2B | Open | Google | 2025-06 | 8B | 2B eff | 5.6 | Free | 55 | 12 | 6 | 7(2f) | 3 | 34 | | 2K tok, think:false |
| ALLaM-2 7B | Open | SDAIA | 2025-08 | 7B | 7B | 4.3 | Free | — | 8 | 2 | 9 | 8 | 29 | Arabic-focused | 4K tok |
| 3–6B (10 models) | |||||||||||||||
| Ministral 3B | Open | Mistral | 2025-12 | 5.78B | 5.78B | 3.5 | $0.1/0.1 | — | 12 | 4 | 12 | 5 | 37 | | 4K tok |
| Gemma 3 4B | Open | Google | 2025-03 | 4B | 4B | 3.3 | Free | 50 | 12 | 4 | 8(1f) | 8 | 36 | | 2K tok, think:false |
| Granite 4.0 H Micro | Open | IBM | 2026-03 | 3B | 3B | 1.8 | $0.02/0.11 | — | 12 | 7 | 7 | 3 | 36 | | 4K tok |
| Ministral 3 3B (local) | Open | Mistral | 2025-12 | 5.78B | 5.78B | 3.5 | $0.1/0.1 | 60 | 12 | 5 | 9(1f) | 3 | 34 | Tobias: fabricated Revenge X variants to fill 20 slots | 2K tok, think:false |
| Qwen 3.5 4B Q4 | Open | Qwen | 2026-03 | 4.7B | 4.7B | 3.4 | $0.02/0.1 | 35 | 12 | 3 | 7(1f) | 0 | 25 | | 2K tok, think:false |
| Nemotron 3 Nano 4B | Open | NVIDIA | 2026-03 | 4B | 4B | 2.8 | $0.04/0.16 | 51 | 7 | 5 | 6(1f) | 1 | 24 | | 2K tok, think:false |
| Phi-4 Mini | Open | Microsoft | 2025-01 | 3.8B | 3.8B | 2.5 | $0.07/0.14 | 56 | 5 | 4 | 8 | 0 | 21 | Looping on long outputs | 2K tok, think:false |
| Trinity Nano | Open | Arcee | 2026-01 | 6B | 1B | 3.8 | ? | 68 | 0 | 3 | 0 | 0 | 6 | Broken — looped on every test | 2K tok, think:false |
| <1.5B (3 models) | |||||||||||||||
| Gemma 3 1B | Open | Google | 2025-03 | 1B | 1B | 0.82 | $0.01/0.02 | 105 | 12 | 2 | 7(1f) | 0 | 23 | | 2K tok, think:false |
| Liquid LFM-2.5 1.2B | Open | Liquid | 2026-01 | 1.2B | 1.2B | 0.8 | Free | — | 11 | ? | 7 | 1 | 19 | | 4K tok |
| Gemma 3 270M | Open | Google | 2025-03 | 270M | 270M | 0.29 | $0.01/0.02 | 200 | 0 | 0 | 4 | 0 | 4 | Incoherent | 2K tok, think:false |
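The audit never states how the composite Score is computed, but every fully-scored row is consistent with Score = Campbell + 2×Booker + Jung + Tobias, i.e. each Booker plot apparently counting for two points. A minimal Python sketch that re-derives the Score for three rows from the table (the doubling of Booker is an inference from the data, not a documented weighting):

```python
def composite(campbell, booker, jung, tobias):
    """Recompute the composite Score.

    Assumption (inferred, not stated in the audit): each of the seven
    Booker plots is worth 2 points, so Booker contributes double its
    listed /7 count.
    """
    return campbell + 2 * booker + jung + tobias

# Three sample rows taken directly from the table:
# (Campbell, Booker, Jung, Tobias) -> published Score
rows = {
    "Mistral Large": ((12, 7, 14, 20), 60),
    "GPT-OSS 120B":  ((12, 6, 13, 15), 52),
    "Gemma 3 1B":    ((12, 2, 7, 0),   23),
}
for name, (parts, published) in rows.items():
    assert composite(*parts) == published, name
```

Rows with a `?` component (e.g. Seed 2.0 Lite, Liquid LFM-2.5 1.2B) match the same formula if the missing score is treated as 0.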