3mo ago

I am working on an "AI Efficiency Index". Current scores:

DeepSeek V3.2-Exp (September 2025) — 80.00
Kimi K2 Thinking (November 2025) — 72.59
Gemini 2.5 Flash (May 2025) — 68.79
GPT-5 (August 2025) — 23.44
o3 (April 2025) — 22.57
Gemini 2.5 Pro (March 2025) — 21.37
Gemini 3 Pro (November 2025) — 16.93
Claude 3.5 Sonnet (August 2025) — 11.67
GPT-5 Pro (August 2025) — 1.96

The goal is to find out which models offer the best practical performance per dollar at real-world, publicly available API pricing. Right now it seems like DeepSeek and Kimi are winning, and Google has lost its way since Gemini 2.5 Flash.

This interests me because LLMs are global, especially if they're open source like Kimi K2 Thinking. Anyone can switch to the cheaper one just as easily, and no tariff or policy can stop it.