AI Models Rate GPT-5.1 as the Leading Language Model
Andrej Karpathy, the AI researcher who founded Eureka Labs, recently conducted an experiment called “LLM-Council.” The project evaluates how various AI models respond to user queries by having the models assess each other’s answers. In his tests, OpenAI’s GPT-5.1 consistently earned the highest rankings, despite earlier claims that Google’s Gemini 3.0 had surpassed it in capabilities and reasoning.
Karpathy found that the participating models often rated their peers’ answers higher than their own. In one test, in which the models analyzed book chapters, GPT-5.1 was judged the most insightful while Claude was ranked the least effective. This collective evaluation offers an unusual window into each model’s strengths and weaknesses, and into how the models differentiate themselves from one another.
The experiment followed a structured three-step format: first, the user’s query was sent to each model independently, and the anonymized responses were displayed side by side. Next, each model ranked the others’ answers for quality and depth, still without knowing which model had written which response. Finally, a designated “chairman model” compiled the rankings and feedback into a refined consensus response. Karpathy cautioned, however, that these assessments are inherently subjective and may not reflect every user’s experience or opinion. He personally found GPT-5.1 somewhat verbose compared to the more succinct Gemini 3.0, while he considered Claude too direct.
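The three-step flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Karpathy’s actual implementation: the model names, the `ask_model` helper (a stand-in for a real LLM API call), and the random ranking stub are all assumptions made for the sake of the example.

```python
import random

# Hypothetical council members and chairman; in a real system these
# would be distinct LLM backends (GPT-5.1, Gemini 3.0, Claude, ...).
MODELS = ["model_a", "model_b", "model_c"]
CHAIRMAN = "model_a"

def ask_model(model: str, prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a placeholder string.
    return f"{model}'s answer to: {prompt}"

def council(query: str) -> str:
    # Step 1: send the query to each model independently and
    # anonymize the responses ("Response 1", "Response 2", ...).
    answers = {m: ask_model(m, query) for m in MODELS}
    anonymized = {f"Response {i + 1}": text
                  for i, text in enumerate(answers.values())}

    # Step 2: each model ranks the anonymized responses. Stubbed here
    # as a random shuffle; a real system would prompt the model for a
    # ranking and parse its output.
    rankings = {}
    for m in MODELS:
        order = list(anonymized)
        random.shuffle(order)
        rankings[m] = order

    # Step 3: the chairman model synthesizes a consensus answer from
    # the anonymized responses and the collected rankings.
    synthesis_prompt = (f"Query: {query}\n"
                        f"Responses: {anonymized}\n"
                        f"Rankings: {rankings}")
    return ask_model(CHAIRMAN, synthesis_prompt)
```

The anonymization in step 1 is the key design choice: because rankers never see which model produced which answer, the peer scores cannot simply favor a famous name.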
In a fascinating twist, Vasuman M, the CEO of Varick AI Agents, shared on social media that he had conducted a similar analysis months earlier, and that GPT-5.1 consistently came out on top even when competitors were included in the tests. He also noted an amusing reaction: when the other models were told that GPT-5.1 had authored a response, they often adjusted their own answers to align with it.
The insights from this experiment highlight the competitive nature of AI development and the importance of understanding how different models perform under scrutiny. As businesses continue to adopt AI solutions, knowing which models excel can guide their investments in technology. For developers and AI engineers, recognizing these trends helps shape future improvements and innovations in the field.
Content generated using AI.
