Microsoft Copilot among the worst language models according to an IQ test

Artificial intelligence is impressive, but not all models are created equal. A recently published IQ benchmark evaluated several language models on their ability to reason logically, understand analogies, and solve abstract problems. The surprise? Microsoft Copilot ranked near the bottom, a disappointing performance that raises questions about the tool’s true maturity despite its deep integration into the Windows ecosystem.

AI IQ Benchmark: an Internet-free test based on reasoning

To assess the true cognitive capabilities of language models, the researchers behind the test deliberately excluded any questions based on indexable facts or memorized data. Out go encyclopedic knowledge and answers taken from the web: this benchmark relies solely on logical reasoning questions, designed to measure an AI’s ability to deduce, anticipate, or complete a chain of abstract reasoning.

The format is directly inspired by Mensa or SAT-type admission tests, with logical sequences, analogies, and verbal or mathematical puzzles, but without any possible recourse to the Internet or external databases. The aim is not to judge what the AI knows, but what it understands and infers in real time.

A rating grid between 55 and 145

The score assigned to each model follows an IQ scale modelled on that of humans, ranging from 55 (low) to 145 (highly superior). A score of 100 corresponds to the average level expected of a human adult.
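To make the scale more concrete, here is a minimal sketch that converts an IQ score into a percentile of the human reference population. It assumes the usual IQ convention of a normal distribution with mean 100 and standard deviation 15; the benchmark itself only states the 55 to 145 range and the average of 100, so the standard deviation is an assumption, and the function name `iq_to_percentile` is purely illustrative.

```python
from statistics import NormalDist

# Assumption: mean 100 and standard deviation 15, the usual IQ convention.
# The article gives only the 55-145 range and the average of 100.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_percentile(score: float) -> float:
    """Return the percentage of the reference population scoring at or below `score`."""
    return 100 * IQ_SCALE.cdf(score)


if __name__ == "__main__":
    # The two ends of the grid and its midpoint.
    for score in (55, 100, 145):
        print(f"IQ {score}: percentile {iq_to_percentile(score):.1f}")
```

Under this assumption, the two ends of the grid sit three standard deviations below and above the mean, which is why scores outside 55 to 145 are so rare that the scale does not bother to distinguish them.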

Each AI has been tested under identical conditions, without context-sensitive assistance or external Internet access. This provides a reliable basis for comparison between the different models, whether they are consumer AIs like ChatGPT, open source models like Mistral, or integrated solutions like Microsoft Copilot.

This strict protocol highlights profound differences between AI architectures, in particular their ability to simulate autonomous logical reasoning, without external support or context.

Test results: Copilot at the back of the pack

The verdict is clear: Microsoft Copilot ranks 25th out of the 26 artificial intelligence models tested. In stand-alone (offline) mode, it scored just 67, well below the human average of 100. Even on the Norwegian Mensa test, its score plateaued at 84, far behind the leading contenders.

[Image: Copilot finished 25th out of 26. © trackingai.org]

By way of comparison:

  • Grok-4 (xAI/Elon Musk) reaches 136,
  • Claude 3 Opus (Anthropic) peaks at 131,
  • OpenAI o3 Pro is positioned at 117.

This ranking is all the more surprising given that Copilot is based, at least in part, on OpenAI’s GPT-4 models. How can such poor performance be explained when the tool is supposed to benefit from the best engines available?

Several hypotheses emerge:

  • In offline or enterprise mode, Copilot doesn’t seem to use GPT-4 in its entirety, but rather a lightened or restricted version.
  • Its integration with Microsoft 365 prioritizes office tasks and practical answers, to the detriment of abstract or logical reasoning.
  • Certain technical limitations (filtering, latency, security priority) may alter its raw performance on this type of test.

This underperformance doesn’t mean Copilot is useless, but it does underline the fact that, outside Microsoft-specific scenarios, the tool struggles to compete with the leaders in general intelligence.
