Tag: benchmarks

All the articles with the tag "benchmarks". Browse 2 articles on AristoAiStack.

Gemini 3.1 Pro: What Developers Need to Know

19 Feb, 2026

Google just launched Gemini 3.1 Pro with a 77.1% ARC-AGI-2 score, double the previous version's score. Here's what changed, how it compares to GPT-5.3 and Claude Opus 4.6, and whether developers should switch.

GPT-5.2 Solved a 15-Year Physics Mystery — Then Scored 0%

16 Feb, 2026

GPT-5.2 derived a new gluon formula that stumped physicists for 15 years, then scored 0% on a research physics benchmark. Here's what the AI reasoning paradox means for you.
