    Why VRAM matters most for running Ollama on Windows PC

By techupdateadmin · August 25, 2025 · 5 min read

Ollama is one of the easiest ways to experiment with LLMs for local AI tasks on your own PC. But to get the best out of it, you'll want a dedicated GPU.

This is where your hardware priorities differ a little from gaming. For example, you may actually have a better time with local AI on an older RTX 3090 than on, say, a newer RTX 5080.

For gaming, the newer card is the better option. But for AI workloads, the old workhorse has one clear edge: it has more memory.



    VRAM is king if you want to run LLMs on your PC

    The RTX 5090 is extremely expensive and overkill for most gamers, but with 32GB VRAM it’s got a lot in the tank for AI workloads. (Image credit: Windows Central | Ben Wilson)

Later-generation GPUs may well have more compute power, but if you don't have the VRAM to back it up, that power is wasted on local AI.

An RTX 5060 will be a better GPU for gaming than the old RTX 3060, but with 8GB of VRAM versus the older card's 12GB, it's less useful for AI.

    When you’re using Ollama, you want to be able to load the LLM entirely into that pool of deliciously fast VRAM to get the best performance. If it doesn’t fit, it’ll spill out into your system memory, and the CPU will start to take up some of the load. When that happens, performance starts to dive.
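
You can see whether this has happened on your own machine: Ollama's local API reports how much of each running model is resident in GPU memory. Here's a minimal sketch, assuming a running Ollama server on its default endpoint (http://localhost:11434):

```python
# Ask a locally running Ollama server which models are loaded and how
# much of each sits in VRAM versus system RAM.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    running = json.load(resp)

for model in running.get("models", []):
    total = model["size"]          # total bytes the loaded model occupies
    in_vram = model["size_vram"]   # bytes resident in GPU memory
    pct_gpu = 100 * in_vram / total if total else 0
    print(f'{model["name"]}: {pct_gpu:.0f}% in VRAM '
          f'({in_vram / 2**30:.1f} of {total / 2**30:.1f} GiB)')
```

Anything short of 100% in VRAM means part of the model has spilled into system memory, and the CPU is sharing the work.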

The same principle applies if you're using LM Studio instead of Ollama, even with an integrated GPU: you want as much memory reserved for the GPU as possible, so the LLM can load fully and the CPU doesn't have to kick in.


GPU compute plus plenty of GPU memory is the key recipe for maximum performance.

The easiest way to get an idea of how much VRAM you need is to check how large the model is on disk. On Ollama, gpt-oss:20b, for example, is a 13GB install. As a bare minimum, you want enough memory available to load all of that.

Ideally, you want a buffer, too, as larger tasks and the context windows that fit them also need somewhere to load (the KV cache). Many recommend multiplying the size of the model by 1.2 to get a better ballpark figure for the VRAM you should be shooting for.

    This same principle applies if you’re reserving memory for the iGPU to use in LM Studio, too.
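
To put the 1.2x rule of thumb into numbers, here's a minimal sketch that asks a locally running Ollama server for its installed models and prints a ballpark VRAM target for each. It assumes Ollama's default endpoint (http://localhost:11434) and the multiplier mentioned above:

```python
# List installed models and a rough VRAM target for each.
import json
import urllib.request

HEADROOM = 1.2  # rough multiplier for the KV cache / context buffer

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    size_gib = model["size"] / 2**30
    print(f'{model["name"]}: {size_gib:.1f} GiB on disk, '
          f'plan for ~{size_gib * HEADROOM:.1f} GiB of free VRAM')
```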

    Without enough VRAM, performance will tank

Deepseek-r1 performance in Ollama on an RTX 5080: the ideal scenario is 100% GPU usage and a good tokens-per-second count. (Image credit: Windows Central)

Time for some examples! This is an extremely simple test, but it illustrates the point I’m trying to make. I ran a simple “tell me a short story” prompt on a range of models using my RTX 5080. Each model can be loaded entirely into the 16GB of VRAM available.

The rest of the system comprises an Intel Core i7-14700K and 32GB of DDR5-6600 RAM.

However, I then increased the context window on each model to force Ollama to spread the load over to the system RAM and CPU as well, to show just how far the performance drops off. So, to the numbers (a sketch for reproducing this kind of test follows the list).

    • Deepseek-r1 14b (9GB): With a context window of up to 16k, GPU usage is 100% at around 70 tokens per second. At 32k, the split is 21% CPU to 79% GPU, with performance dropping to 19.2 tokens per second.
    • gpt-oss 20b (13GB): With a context window of up to 8k, GPU usage is 100% at around 128 tokens per second. At 16k, the split is 7% CPU to 93% GPU, with performance dropping to 50.5 tokens per second.
    • Gemma 3 12b (8.1GB): With a context window of up to 32k, GPU usage is 100% at around 71 tokens per second. At 32k, the split is 16% CPU to 84% GPU, with performance dropping to 39 tokens per second.
    • Llama 3.2 Vision (7.8GB): With a context window of up to 16k, GPU usage is 100% at around 120 tokens per second. At 32k, the split is 29% CPU to 71% GPU, with performance dropping to 68 tokens per second.
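
For anyone who wants to run this kind of test themselves, here's a minimal sketch that fires a single prompt at a chosen context window and works out tokens per second from the timing fields Ollama returns (eval_duration is reported in nanoseconds). The endpoint is Ollama's default, and the model name is just one example:

```python
# Measure generation speed at different context window sizes.
import json
import urllib.request

OLLAMA = "http://localhost:11434"

def tokens_per_second(model: str, num_ctx: int) -> float:
    """Run one prompt at the given context window; return eval tokens/s."""
    body = json.dumps({
        "model": model,
        "prompt": "Tell me a short story.",
        "stream": False,
        "options": {"num_ctx": num_ctx},  # widen the context window
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # eval_count = tokens generated; eval_duration is in nanoseconds
    return result["eval_count"] / (result["eval_duration"] / 1e9)

# Example: the model from the first bullet above
for ctx in (16_384, 32_768):
    print(f"num_ctx={ctx}: {tokens_per_second('deepseek-r1:14b', ctx):.1f} tok/s")
```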

I’ll be the first to say this is not the most in-depth scientific testing of these models; it’s merely to illustrate a point. When your GPU cannot do all of the work itself and the rest of your PC has to come into play, the performance of your LLMs drops quite dramatically.


The takeaway: make sure there’s enough available GPU memory for your LLMs to perform at their best, and wherever possible don’t leave your CPU and RAM picking up the slack.

    The performance here was still decent when that happened, at least. But that’s with some solid hardware to back up the GPU.

The quick recommendation would be at least 16GB of VRAM if you want to run models up to gpt-oss:20b. 24GB is better if you need space for more intensive workloads, and you could even get there with a pair of well-priced RTX 3060s. You just need to work out what’s best for you.
