Llama.cpp published its own gpt-oss-20b tests, showing that the GeForce RTX 5090 led the way with an impressive 282 tok/s. This compared to the Mac M3 Ultra (116 tok/s) and the AMD 7900 XTX (102 tok/s).
This happens because the GeForce RTX 5090 includes Tensor Cores designed to accelerate AI tasks, which maximizes performance when running gpt-oss-20b locally.
The “tok/s” figure, or tokens per second, measures how quickly the model processes tokens: the pieces of text it reads or generates in a single step.
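As a minimal sketch, a tokens-per-second figure is just generated tokens divided by elapsed wall-clock time (the function name and sample numbers below are illustrative, not from the benchmark):

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: tokens generated divided by wall-clock time."""
    return num_tokens / elapsed_seconds

# Illustrative numbers: 1410 tokens in 5 seconds matches the
# 282 tok/s reported for the RTX 5090.
print(tokens_per_second(1410, 5.0))  # 282.0
```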

Llama.cpp is an open source framework for running LLMs (Large Language Models) with high performance. It runs especially well on RTX GPUs thanks to optimizations made in collaboration with NVIDIA.
For AI enthusiasts who just want to use local LLMs with these NVIDIA optimizations, consider the LM Studio application, which is built on top of Llama.cpp. The program adds support for RAG (retrieval-augmented generation) and is designed to make it easier to run and experiment with LLMs.
The main advantage is that it removes the need to deal with command-line tools or complex technical configurations.
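For those who do want the command line, a local run with llama.cpp looks roughly like the sketch below. The model file path is a placeholder for wherever you downloaded the GGUF weights; `-ngl 99` offloads all layers to the GPU and `-p` sets the prompt:

```shell
# Hedged sketch of a llama.cpp command-line run; the GGUF path is a placeholder.
# -ngl 99 offloads all model layers to the GPU, -p provides the prompt.
./llama-cli -m models/gpt-oss-20b.gguf -ngl 99 \
  -p "Explain tokens per second in one sentence."
```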
Local AIs

Developers and creators seeking greater control and privacy in their use of AI are turning to locally running models, such as OpenAI’s new gpt-oss model family. These models are lightweight and run remarkably well on home-user hardware.
This means they can run on GPUs with as little as 16 GB of memory. In other words, a wide range of hardware can handle them, with NVIDIA GPUs emerging as the best way to run these types of models.
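A back-of-the-envelope check shows why 16 GB is enough. The parameter count and quantization width below are assumptions for illustration (roughly a 21-billion-parameter model at about 4.25 bits per weight, i.e. 4-bit quantization plus per-block scale overhead), not figures from the article:

```python
# Rough VRAM estimate for the model weights alone.
# Assumptions (illustrative, not from the article):
#   ~21e9 parameters, ~4.25 effective bits per weight.
params = 21e9
bits_per_weight = 4.25
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"{weight_gb:.1f} GB")  # about 11.2 GB, leaving headroom in 16 GB
```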

As countries and companies rush to develop their own bespoke AI solutions for a variety of large and complex tasks, open source models like OpenAI’s new gpt-oss-20b are finding much wider adoption.
And this latest release delivers performance roughly comparable to the GPT-4o mini model.
The model also features chain-of-thought reasoning for deeper problem analysis, adjustable reasoning effort levels, an expanded context length, and efficiency tweaks that help it run on local hardware.
Other options

Another popular open source framework for AI testing and experimentation is Ollama. It is great for trying out different AI models, including the OpenAI gpt-oss models, and NVIDIA worked closely with the Ollama team to optimize its performance.
Ollama manages model downloads, environment configuration, and GPU acceleration automatically. It also provides integrated model management to support multiple models simultaneously, integrating easily with local applications and workflows.
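A minimal Ollama session might look like the sketch below (assuming `gpt-oss:20b` is the tag under which Ollama publishes OpenAI’s 20B open-weight model; Ollama fetches the weights on first run and enables GPU acceleration automatically):

```shell
# Hedged sketch: pull and query the model in one step.
# Ollama downloads the weights on first use and configures the GPU itself.
ollama run gpt-oss:20b "Summarize what tokens per second measures."
```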
As with llama.cpp, other applications build on Ollama to run LLMs. One example is AnythingLLM, whose straightforward local interface makes it a good starting point for benchmarking LLMs.
Cost
Regardless of the application used to test gpt-oss-20b, the latest NVIDIA Blackwell GPUs appear to offer the best performance. The main problem is the cost, as an RTX 5090 costs up to R$ 26,773.51 at Kabum.
RTX 5080 models can also be expensive, with MSI’s Gaming Trio OC costing R$ 24,499.00, but there are more affordable options such as the ASUS ROG Astral at R$ 14,999.99.
RTX 5070 models cost half that price, with Gigabyte’s GAMING OC at R$ 7,099.99. RTX 5070 Ti models are pricier, reaching R$ 10,101.66 (Zotac’s Solid OC).
Things are more affordable among RTX 5060 Ti models, which cost up to R$ 3,499.99 (ASUS DUAL). An RTX 5050 can be had for half that, R$ 1,759.99 (Palit), but performance-wise a larger investment is worthwhile.
Source: GitHub.
*The sale of products indicated on this page may generate commission for Adrenaline.
Source: https://www.adrenaline.com.br/nvidia/nvidia-rtx-5090-supera-amd-e-apple-rodando-modelos-locais-de-linguagem-da-openai/
