Tag: benchmarks
All articles tagged "benchmarks". Browse 2 articles on AristoAiStack.
-
Gemini 3.1 Pro: What Developers Need to Know
Google just launched Gemini 3.1 Pro with a 77.1% score on ARC-AGI-2, double that of the previous version. Here's what changed, how it compares to GPT-5.3 and Claude Opus 4.6, and whether developers should switch.
-
GPT-5.2 Solved a 15-Year Physics Mystery — Then Scored 0%
GPT-5.2 derived a new gluon formula that stumped physicists for 15 years, then scored 0% on a research physics benchmark. Here's what the AI reasoning paradox means for you.