Using ChatGPT-OSS locally with OpenAI's open-weight models

For the first time since GPT-2, OpenAI is making two open-weight models available that rival today's best systems. They are surprisingly easy to get started with: a novice user can download, load and begin using them without any complex configuration. They do, however, require a computer powerful enough to handle their size. This article will help you discover these models and get them running locally on Windows.

OpenAI publishes its first open-weight models since GPT-2

OpenAI ushers in a new era with the release of gpt-oss, a family of open-weight models whose weights are freely downloadable. This choice contrasts with the company's strategy of recent years, which has favored models accessible only through its API. An open-weight model is not an open-source model in the strict sense, as the training code and datasets remain private. The weights, on the other hand, are made available to the public, enabling local execution, quantization, adaptation to internal use cases and deployment without any cloud dependence.

This is a major event, as OpenAI has not offered this level of access since GPT-2. It finally lets developers, researchers and advanced users work with an OpenAI model like any other LLM in the open-source ecosystem, including integrating it into private, fully offline workflows.

| Model | Size | Recommended hardware | Ideal use |
|---|---|---|---|
| gpt-oss-20B | ≈ 21 billion parameters | Consumer GPU | Runs locally on Windows or Linux PCs |
| gpt-oss-120B | ≈ 117 billion parameters | Multi-GPU servers | Professional environments |

Initial results and feedback indicate performance close to that of o3-mini on standard reasoning tasks. Both models support a very large context window of up to 128,000 tokens, making them suitable for long analyses, complex documents and autonomous agents.

Inference speed is also a strong point of the 20B model when it is quantized and run on a modern GPU. Compared with reference open models such as LLaMA 3.1, Mixtral or Phi-3.5, gpt-oss is one of the most advanced options in the open-weight category, while offering very broad technical compatibility.

Using OpenAI's models locally with Ollama on Windows

Today, Ollama is the simplest and most reliable tool for running an OpenAI model locally on Windows. The application features a graphical interface that lets you choose, download and run a model without any technical fiddling. You simply select a model from the list, click to install it and start chatting with it in a ChatGPT-style window.

Scripts, applications and tools designed for ChatGPT can work in exactly the same way, simply by pointing them at the local address served by Ollama, which exposes an OpenAI-compatible API. There is therefore no need to modify your code or automations. For everyday use, no command line is required, making the experience accessible even to a novice user.
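As an illustration, here is a minimal sketch of that redirection using the official openai Python package. It assumes Ollama is running with its default OpenAI-compatible endpoint at http://localhost:11434/v1 and that the model was downloaded under the tag gpt-oss:20b; verify both against your local setup.

```python
# Minimal sketch: reuse an OpenAI-style client against a local Ollama server.
# Assumes Ollama is running, exposing its OpenAI-compatible API at the default
# address http://localhost:11434/v1, and that the model was pulled under the
# tag "gpt-oss:20b" (run `ollama list` to see the exact tags on your machine).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama endpoint, not api.openai.com
    api_key="ollama",  # any non-empty string works; Ollama ignores the key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Summarize what an open-weight model is."}],
)
print(response.choices[0].message.content)
```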

Using ChatGPT-OSS on Windows
  1. To use it, simply visit the official Ollama website and download the Windows installer.
  2. Once launched, the wizard automatically configures everything needed and installs the background service that will load models on demand.

After installation, Ollama creates a dedicated folder to store downloaded models, located by default in the .ollama directory of the user profile, which makes the weights easy to manage or delete if necessary. Ollama also integrates a catalog of models directly accessible from its interface.

  1. To download gpt-oss, simply open the application and select the desired model from the list.

The tool automatically retrieves the necessary files and installs them without manual intervention. Once the model is present on the machine, it immediately appears as available in the interface.
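If you prefer to script this step, the ollama Python package (installed with pip install ollama) offers a rough equivalent; the tag gpt-oss:20b is the one currently listed in the Ollama catalog, but verify it before running:

```python
# Sketch: scripted download, equivalent to picking the model in the UI.
# Assumes the `ollama` Python package is installed (pip install ollama)
# and that the Ollama background service is running.
import ollama

# Pull the 20B variant; check the Ollama catalog for the exact tag.
ollama.pull("gpt-oss:20b")

# Print the locally installed models to confirm the download.
print(ollama.list())
```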

You use the same kind of input field as with a traditional chatbot: you can type a question or adjust certain parameters. The experience is very similar to ChatGPT, but entirely offline.
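The same conversation can also be driven from code. Here is a minimal sketch with the ollama package, again assuming the gpt-oss:20b tag:

```python
# Sketch: the same chat interaction, driven from Python instead of the UI.
# Assumes the model was pulled under the tag "gpt-oss:20b".
import ollama

reply = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain quantization in one paragraph."}],
    options={"temperature": 0.7},  # generation parameters, like the UI settings
)
print(reply["message"]["content"])
```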

Local execution of gpt-oss models depends heavily on the hardware configuration and the amount of VRAM available. The more VRAM there is, the more of the model can be loaded directly onto the GPU, which considerably improves inference speed and stability. When a model does not fit entirely on the GPU, Ollama automatically offloads part of it to system RAM. This makes it possible to run models larger than the available video memory, but it increases response time and can lead to errors or severe slowdowns when memory pressure becomes too high.
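To see how a loaded model is actually split between VRAM and system RAM, recent versions of the ollama package expose a ps() call mirroring the ollama ps command; the field names below are an assumption based on recent releases of the library:

```python
# Sketch: inspect how much of each running model sits in VRAM vs system RAM.
# Assumes a recent ollama Python package exposing ps() (mirrors `ollama ps`);
# field names may differ slightly between library versions.
import ollama

for m in ollama.ps().models:
    in_vram = m.size_vram  # bytes resident on the GPU
    total = m.size         # total bytes occupied by the loaded model
    pct = 100 * in_vram / total if total else 0
    print(f"{m.model}: {pct:.0f}% of the model in VRAM")
```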

Requirements vary widely between gpt-oss-20B and gpt-oss-120B. The former can run on a recent gaming PC thanks to quantization, while the latter is reserved for professional infrastructures with several tens of gigabytes of VRAM.

| Model | FP16 | Q8 | Q5 | Q4 |
|---|---|---|---|---|
| gpt-oss-20B | 30-40 GB | 18-22 GB | 10-12 GB | 8 GB |
| gpt-oss-120B | 150+ GB | 90 GB | n/a | n/a |
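These orders of magnitude follow from a simple rule of thumb: parameter count times bits per weight. The sketch below is a naive estimate, not the source of the table; actual figures diverge because quantization schemes such as Q4_K mix precisions and because runtimes add overhead for activations and the KV cache:

```python
# Naive back-of-the-envelope estimate of a model's weight footprint.
# Real quantization schemes mix precisions, so treat this as a rough
# order-of-magnitude check, not a substitute for measured figures.

def estimate_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights only: parameters * bits per weight / 8, in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("gpt-oss-20B", 21), ("gpt-oss-120B", 117)]:
    for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{name} @ {label}: ~{estimate_gb(params, bits):.0f} GB of weights")
```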

If your PC isn't powerful enough, there are more suitable models: see our guide to the best models and software for running a local AI on Windows.

Local execution can be significantly improved by applying a few best practices, especially when VRAM is limited.

Choosing Q4 quantization on a small GPU

This format offers one of the best compromises between speed and quality. It allows gpt-oss-20B to fit on a mid-range graphics card while maintaining an acceptable level of accuracy. Below Q4, errors become more frequent and the model may produce inconsistencies.
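In Ollama, the quantization level is usually selected through the model tag. The tag below is hypothetical, shown only to illustrate the pattern; check the model's page in the Ollama catalog for the tags actually published:

```python
# Sketch: selecting a specific quantization via the model tag.
# "gpt-oss:20b-q4_K_M" is a HYPOTHETICAL tag given for illustration;
# list the tags published for the model before running this.
import ollama

ollama.pull("gpt-oss:20b-q4_K_M")
```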

Favoring the GPU over the CPU

Although Ollama can fall back on RAM and CPU to complete the load, performance drops off rapidly as soon as the model spills out of VRAM. The more of the weights the GPU holds, the faster, more stable and more reliable generation will be.
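You can nudge this behavior from the API: Ollama accepts a num_gpu option that sets how many model layers are offloaded to the GPU. In the sketch below, a large value simply means "as many layers as fit"; lower it if you hit out-of-memory errors:

```python
# Sketch: ask Ollama to keep as many layers as possible on the GPU.
# num_gpu is the number of layers offloaded to VRAM; a large value means
# "everything that fits". Reduce it if generation becomes unstable.
import ollama

reply = ollama.chat(
    model="gpt-oss:20b",  # assumed tag, as above
    messages=[{"role": "user", "content": "Hello"}],
    options={"num_gpu": 999},
)
print(reply["message"]["content"])
```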

Using a multi-GPU setup under Linux for giant models

The gpt-oss-120B model requires a multi-GPU environment or a server with extremely high VRAM. Under Linux, the automatic distribution of model blocks between several graphics cards enables this type of model to be used in professional environments.
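At the process level, this often comes down to environment variables set when starting the Ollama server. The sketch below assumes the OLLAMA_SCHED_SPREAD variable supported by recent Ollama builds, which asks the scheduler to spread a single model across all visible GPUs:

```python
# Sketch: launch the Ollama server on Linux with multi-GPU settings.
# OLLAMA_SCHED_SPREAD (assumed to be supported by your Ollama build) spreads
# one model's layers across all visible GPUs; CUDA_VISIBLE_DEVICES picks them.
import os
import subprocess

env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose two GPUs to the server
env["OLLAMA_SCHED_SPREAD"] = "1"     # spread layers across both cards

subprocess.run(["ollama", "serve"], env=env)
```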
