OpenAI's newly released GPT-4o mini dominates the Chatbot Arena. Here's why.
Ventris/Science Photo Library/Getty Images

One week ago, OpenAI released GPT-4o mini. In that short time, the model has already been updated and has climbed the leaderboard of the Large Model Systems Organization (LMSYS) Chatbot Arena, pulling ahead of giants such as Claude 3.5 Sonnet and Gemini Advanced.

The LMSYS Chatbot Arena is a crowdsourced platform where users evaluate large language models (LLMs) by chatting with two LLMs side by side, comparing their responses, and voting for the better one, all without knowing the models' names.

Also: Want to try GPT-4o mini? 3 ways to access the smarter, cheaper AI model – and 2 are free

Immediately after its unveiling, GPT-4o mini was added to the Arena, where it quickly climbed to the top of the leaderboard, tying GPT-4o for first place. This is especially notable because GPT-4o mini is 20 times cheaper than the full-sized GPT-4o.

Exciting Chatbot Arena Update — GPT-4o mini’s result is out!
With 4K+ user votes, GPT-4o mini climbs to the top of the leaderboard, now joint #1 with GPT-4o while being 20x cheaper! Significantly better than its early version (“upcoming-gpt-mini”) in Arena across the boards.… pic.twitter.com/xanm2Bqtg9

— lmsys.org (@lmsysorg) July 23, 2024

As the results came out, some users took to social media to express concerns about how such a new mini model could rank higher than more established, robust, and capable models such as Claude 3.5 Sonnet. To address those concerns, LMSYS explained in a post on X the factors contributing to GPT-4o mini's high placement, highlighting that Chatbot Arena positions are determined by human preference, as expressed through user votes.
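
To make that mechanism concrete, here is a minimal sketch in Python of how pairwise votes can be turned into a ranking. LMSYS originally used an online Elo update along these lines (it has since moved to fitting a Bradley-Terry model over all votes); the model names, starting ratings, and K-factor below are illustrative, not LMSYS's actual code.

    # Elo-style rating update from pairwise "A beats B" votes.
    def expected_score(r_a, r_b):
        # Probability that model A beats model B under the Elo model.
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    def update(r_a, r_b, a_won, k=32):
        # Shift both ratings toward the observed outcome.
        e_a = expected_score(r_a, r_b)
        score = 1.0 if a_won else 0.0
        return r_a + k * (score - e_a), r_b + k * (e_a - score)

    ratings = {"gpt-4o-mini": 1000.0, "claude-3.5-sonnet": 1000.0}
    # Each vote is (model_a, model_b, did_a_win); one illustrative vote here.
    for a, b, a_won in [("gpt-4o-mini", "claude-3.5-sonnet", True)]:
        ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)
    print(ratings)  # leaderboard order = ratings sorted descending

Sorted by rating, this dictionary is the leaderboard: thousands of such votes, not any benchmark score, are what put GPT-4o mini at the top.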

For users interested in which model works better for a given task, LMSYS encourages looking at the per-category breakdowns to understand each model's technical capabilities. These can be accessed by clicking the Category dropdown (which says "Overall" by default) and selecting a different category. Visiting the various category breakdowns, such as coding, hard prompts, and longer queries, reveals that the results vary from category to category.

Also: OpenAI launches SearchGPT – here’s what it can do and how to access it

In the coding category, GPT-4o mini ranks third, behind GPT-4o and first-place Claude 3.5 Sonnet. However, GPT-4o mini is No. 1 in other categories, such as multi-turn (conversations of two or more turns) and longer query (queries of 500 or more tokens).

Chatbot Arena results in the "coding" category (LMSYS). Screenshot by Sabrina Ortiz/ZDNET

If you want to try GPT-4o mini, visit the ChatGPT site and log into your OpenAI account. If you would rather participate in the Chatbot Arena and leave it to chance whether GPT-4o mini appears, start by visiting the website, clicking Arena (side-by-side), and entering a sample prompt.
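
Developers can also skip the web interfaces entirely: GPT-4o mini is available through the OpenAI API under the model name "gpt-4o-mini". Here is a minimal sketch using the official Python SDK, assuming the openai package is installed and an OPENAI_API_KEY environment variable is set; the prompt text is just an example.

    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    # Send a single-turn chat request to GPT-4o mini.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "In one sentence, what is the LMSYS Chatbot Arena?"}
        ],
    )
    print(response.choices[0].message.content)

The API is also where the model's lower price matters most, since costs scale with the volume of tokens you send and receive.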
