Hands-On Windows 189 transcript
Please be advised that this transcript is AI-generated and may not be word-for-word. Time codes refer to the approximate times in the ad-free version of the show.
Paul Thurrott [00:00:00]:
Coming up next on Hands-On Windows, we're going to take a look at a topic I've wanted to cover for quite some time: local AI. Podcasts you love, from people you trust. This is TWiT. Hello, everybody, and welcome back to Hands-On Windows. I'm Paul Thurrott, and this week we're going to take a look at local AI. This is running smaller AI models on your computer instead of bigger AI models up in the cloud, which is the more typical situation. So if you are familiar with something like Claude here (Anthropic's Claude, which I have running in a browser), or ChatGPT, Copilot, etc.: those things are typically running in the cloud and they're very big AI models.
Paul Thurrott [00:00:50]:
They're expensive to run, and they're incredibly powerful. But there's another type of AI model, small language models, that can run locally on your device. There are models small enough to run on a phone, models that can run on computers, et cetera. And I've looked at this again and again and again. Obviously there's a big gap between big LLMs in the cloud and small SLMs on your device, but the gap is shrinking as the quality goes up in both cases. And there are some big advantages to running AI locally. The most obvious one, well, I guess maybe it's not the most obvious one, but one of them is that it can work offline.
Paul Thurrott [00:01:30]:
It doesn't cost anything, right? Depending on the capabilities of your computer, you might be running against the CPU, which can be resource intensive. You might be running against the GPU, which could be fantastic if you have a good one. Or you could be running against the NPU if you have a Copilot+ PC or an AI PC, and that can also work very well. And, you know, there's some complexity to it, but it's actually very, very interesting. So there are different applications you can use to do this.
Paul Thurrott [00:01:58]:
Some of them are very technical. For this I'm going to use something called LM Studio, but this is one of probably hundreds of choices. It's cross-platform: you can get it on the Mac, and I think it's on Linux as well. And when you first run it, it asks you if you want to download the model it recommends, which is actually what I did download, but it's the one I wanted to download anyway, which is this Gemma 3n E4B. And yes, that's kind of a strange name, but Gemma is the SLM, or local AI, version of Gemini, the big Google models up in the cloud. It comes in different versions. This one, if you look at the name of the thing, is from the third generation of this model family (the 3n line), E4B.
Paul Thurrott [00:02:43]:
As for the E4B: the E is for effective, and there are non-effective versions as well, where everything is active. It has to do with how many parameters, which I'll get into in a moment (sorry, there's a lot of terminology here), can be running at the same time. And then the 4B is 4 billion, and that's for parameters. A parameter count is sort of a sizing system for the number of internal variables that the model can consider as it's doing its work. It's kind of like synapses in your brain or whatever.
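To make "4 billion parameters" a bit more concrete, here is a back-of-the-envelope sketch of how parameter count translates into the memory a model needs. The bytes-per-parameter figures are typical precision levels in general, not specifics of this model:

```python
# Rough memory needed to load a model: the weights dominate, so
# memory is approximately parameter count times bytes per parameter.
def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # 1e9 parameters times bytes each, divided by 1e9 bytes per GB
    return params_billions * bytes_per_param

print(model_memory_gb(4, 2.0))  # 16-bit weights: 8.0 GB
print(model_memory_gb(4, 0.5))  # 4-bit quantized: 2.0 GB
```

This arithmetic is why quantized 4B-class models fit comfortably on an ordinary laptop while the big cloud models do not.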
Paul Thurrott [00:03:18]:
So I've already loaded this, I believe. Yes, because this eject button is how I would get rid of it. I can look over here at the models I have installed on this computer; I just have the one. I should say, Gemma is a multi-model, sorry, multimodal model, meaning it can work with different inputs, including images and text. It only outputs text.
Paul Thurrott [00:03:40]:
There are local models that can output images and other types of content, and I've been experimenting with those as well. But I think for this first look, I'm going to stick to text. Let's keep it simple, and we'll look at images and other things later. You can also go in and just see what they have for models. The one thing I'm not 100% sure of in this particular application is whether you can search for models that specifically run across the GPU or the NPU. I assume it says it somewhere here, but I don't see it.
Paul Thurrott [00:04:14]:
But this can run at least partially against your GPU. Some of these models will literally say they can run 100% against the GPU, and that's actually very efficient in terms of time, meaning it will be faster, because it can run entirely off the GPU. I'm actually not clear where this one is running, but it's going to be CPU, GPU, or NPU, or some combination of those things. So when we bring this thing up, it looks, you know (again, we can go back to Claude here), very similar, right? You've got this kind of bar over on the side with different options. You've got your previous conversations; you can start a new conversation like this.
Paul Thurrott [00:04:53]:
It's set up to do thinking by default. This is reasoning. This is kind of an interesting thing, because this is the type of AI model where you can see its internal reasoning, if you will, as it tries to build the answer to whatever it is you've asked it. Some people don't like that. To me, it feels a little bit like stalling for time, because it does take some time. You know, these models are slower and they're not as powerful, obviously, as the things in the cloud, but there are still some really neat things you can do here. So let me go through some simple prompts, things I've tried in the past a little bit, so I have some idea of what's going to happen, and I'll show you how this thing works. But by and large, you know, this should be very familiar.
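As an aside for the developer-minded: LM Studio can also expose the loaded model through a local OpenAI-compatible HTTP server (by default on localhost port 1234), so you can script against it instead of using the chat window. A minimal sketch, assuming that server is running; the model identifier here is an assumption, so use whatever name LM Studio lists for your download. Only the payload construction runs below; the actual request is left commented out:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

# "gemma-3n-e4b" is a placeholder; check LM Studio for the real identifier.
payload = build_chat_request("Where should I start reading Tolkien?", "gemma-3n-e4b")
print(json.dumps(payload, indent=2))

# To actually send it to a running LM Studio server (assumed default port):
# req = urllib.request.Request(
#     "http://localhost:1234/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The point is that the same local model behind the chat UI can also serve your own scripts, with no cloud account involved.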
Paul Thurrott [00:05:37]:
This is a lot like Copilot, ChatGPT, Claude, whatever. So for the first one, I'll just type this: can you tell me how many people used each major version of Windows over time? I'll eliminate the suspense and tell you it cannot tell me this, but we'll see what it does. And this is where the thinking starts, so you can see it. This is it sort of talking to itself, right? You can click on this to expand it so you can see more of it.
Paul Thurrott [00:06:14]:
And as it says here, it's difficult because we really don't have good data for a lot of this. It's not clear how many hundreds of thousands of people used, say, Windows 2.0 or whatever. Microsoft at some points in time has, you know, provided data for the number of people using Windows, etc. But sometimes not so much. So this is thinking through the problem, so to speak. If you look at something like where it says "focus on the revolutionary nature, not massive numbers," it's not really talking to me there, it's talking to itself.
Paul Thurrott [00:06:47]:
That makes sense. But now you can see this part out here: this is it supplying the answer. The other thing to look at, as it does that, is this little number in a circle down here. If you hover over it, it tells you this is the number of tokens this thing is expending. There are input tokens and then output tokens. The way to think of this is that a token is a unit of work, essentially. It's kind of the currency of AI, input and output. In the cloud, it's literally the currency, because that's how you get charged now for using AI.
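If you want a feel for token counts without running the real tokenizer, a common rule of thumb for English text is roughly four characters (about three-quarters of a word) per token. The exact numbers always come from the model's own tokenizer, so treat this as an estimate only:

```python
# Crude token estimate: roughly 4 characters per token for typical English.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Can you tell me how many people used each major version of Windows?"
print(estimate_tokens(prompt))  # about 16 tokens for this 67-character prompt
```

That rough ratio is enough to sanity-check the token counter an app shows you, or to guess whether a long document will fit in a model's context window.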
Paul Thurrott [00:07:26]:
And then there's this, let me hover over that again, this notion of... well, it actually doesn't say context here. Oh no, it does, yeah: total loaded context, 4096. So this particular model has a context window.
Paul Thurrott [00:07:40]:
That's another great bit of terminology: 256,000 tokens here. And that's what the AI can essentially remember in a given conversation. If this thing goes over 100% (and it won't in this query), it will actually just forget the parts of the conversation that go past that number, right? Most of the things I'm going to do today will stay within about 50% of the context window. What that means is, if you keep asking it questions, it will remember what it already did, but once it exceeds 100%, it will start forgetting. Kind of a Flowers for Algernon kind of a thing there. Okay. So it has provided this information.
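That "forgetting" is usually just the oldest messages being dropped once the conversation no longer fits the context window. Here is a minimal sketch of the idea, with made-up token counts; real apps measure them with the model's tokenizer:

```python
CONTEXT_WINDOW = 4096  # tokens the model can attend to at once (illustrative)

def trim_history(messages: list, budget: int = CONTEXT_WINDOW) -> list:
    """Drop the oldest messages until the conversation fits the budget."""
    kept = list(messages)
    while len(kept) > 1 and sum(m["tokens"] for m in kept) > budget:
        kept.pop(0)  # the earliest turn is the first thing "forgotten"
    return kept

history = [
    {"role": "user", "tokens": 2000},
    {"role": "assistant", "tokens": 1500},
    {"role": "user", "tokens": 1200},  # total 4700: over budget
]
print(len(trim_history(history)))  # 2: the first turn was dropped
```

That is why, past 100%, the model can contradict things it said earlier in the same chat: those turns were simply never sent back to it.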
Paul Thurrott [00:08:24]:
It's nicely formatted. I don't think it literally says anywhere that X million or billion people used a particular version of Windows. But I do know that for the recent versions, like 10 and 11, Microsoft has said explicitly over a billion in each case. And it doesn't really have that, but I already knew it wasn't going to do that. Still, very interesting. And you could dive in further here if you wanted to. I'm not going to bother with that. This box up here is the developer interface, and that's an optional feature you can enable when you install the app.
Paul Thurrott [00:08:58]:
I did enable that. You don't need that to ask it developer questions. But instead of an actual code question (in other words, here's some software code, tell me what's wrong with it, or how could I make it more efficient, etc.), I'll just ask it a coding question. So: can you tell me what a singleton is in programming and whether it's... well, I will go with that. I'm going to say whether it's good or bad to use one. Generally speaking it's bad to use one, but we'll watch it kind of think through this. A singleton, if you're not a programmer, is sort of like a global variable or static variable, depending on the language. It kind of breaks the whole object-oriented programming paradigm, but they're super convenient, so people use them all the time.
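For reference, the pattern being asked about looks like this in Python; a minimal sketch, and the "bad to use" argument is that it is shared, hidden global state, which makes code harder to test and reason about:

```python
class Config:
    """A minimal singleton: constructing it always yields the same instance."""
    _instance = None

    def __new__(cls):
        # Create the one instance lazily, on first construction.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Config()
b = Config()
print(a is b)  # True: both names refer to one shared object
```

Anything stored on `a` shows up on `b` too, which is exactly the convenience and exactly the danger.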
Paul Thurrott [00:09:50]:
Developers do, I should say. But again, this is it thinking through the problem. It's going back and forth with itself, essentially, with the data it has access to, and now it is providing the answer. So yes, it is often debated in object-oriented programming; it's explaining what a singleton is, etc. So you get the idea. This past weekend we were away, and I was experimenting with different models and different apps, and I was trying to think of something complex but not technical, you know. And if you watch Windows Weekly, you'll know that J.R.R. Tolkien has come up a couple of times recently for whatever reason.
Paul Thurrott [00:10:28]:
And I jokingly often refer to myself as a Tolkien scholar. And so I thought, I'll ask it about Tolkien. (In fact, what I should be doing here is a new conversation for each of these things.) You know, I know a lot about Tolkien. I've read those books dozens and dozens of times. There was a period of time where I read The Lord of the Rings every single year, for example. So I'm going to pretend I don't know anything about it: I want to read Tolkien.
Paul Thurrott [00:10:56]:
Where should I start? And are some of the books better than others, for lack of a better way to frame this? So once again, I'll just expand this so you can see it kind of thinking through it. If you're familiar with Tolkien at all, you'll know that The Hobbit is the obvious place to start. It's almost a children's book; it's very short and easy to digest. The Lord of the Rings, of course, is the most famous one, a trilogy of books that everyone has heard of and probably seen the movies. And then there's The Silmarillion and all the background works that have been published since his death.
Paul Thurrott [00:11:37]:
The Silmarillion is like reading the Bible, meaning it's not a good read; it's more of a kind of mythology-history work or whatever. So let's see what it says here. Oh, I've already turned it into a human being for some reason. But yes, so where you should start: obviously The Hobbit, you know, lighter, quicker, etc., then The Lord of the Rings. I assume The Silmarillion will be third or whatever. It doesn't even get into...
Paul Thurrott [00:12:03]:
Oh, it's going through all those books. Yep. That's interesting. It doesn't even mention that. That's fine, we'll just move along from that. But when I was doing this over the weekend, I kept going with it, you know: well, what about this? What are the major themes? What's the point of these stories, et cetera. And it's fascinating to watch, the local AI especially, because with something like this, it can quickly blow through its context window, and then it just forgets everything it ever said to you. I kind of want to see where it's at.
Paul Thurrott [00:12:36]:
I'm going to guess this is going to be a little over 50%, just for this little report-style answer it gave. Yeah, so this thing will eventually catch up. It's clearly not 1%... or maybe it is 1%. Oh, there we go: 44%. Okay. You can do other things with this, obviously. You could upload a document.
Paul Thurrott [00:12:59]:
That has not worked for me on this computer, so I don't want to embarrass it or me, but you can see the various qualifications here. When I was doing this on a more powerful computer, I originally tried to upload a PDF and ran into size issues. I uploaded the smallest PDF I had for one of my books, and it still was not happy with that. But I could go into my book folder here, where I have the source files for those books. So, you know, an individual chapter, which is a Markdown file for the book, should work.
Paul Thurrott [00:13:33]:
It's not going to. It's going to fail with some kind of a Node.js kind of a problem. But I'll just ask it for a summary. Yeah, so normally what would happen is it would provide that kind of report like you saw before. I don't understand why this is failing on this particular computer; maybe I don't have Node.js or whatever, but I feel like that's something another app would have handled a little better. Similarly, you could use this to analyze an image. Again, it can't output an image.
Paul Thurrott [00:14:02]:
In fact, I could just say: can you create an image? And... this could be the same error. That's interesting. Normally what it would do is just come up with an error message and say, this is the type of model I am, I can't do that type of thing. And you could go into here and then maybe search for a model that can do images. And interestingly, this one does come up, but that's because it can process images in that direction. So you want to find one that works the other way and outputs images. But again, that's something we will look at later.
Paul Thurrott [00:14:35]:
So this to me is incredibly useful. You might still want to turn to cloud-based AI for many things, but I do feel like this is a good place to start: this or another app, and/or another model, right? Microsoft, OpenAI (sorry, I was going to say ChatGPT), etc.: these companies all have local models. DeepSeek, many, many of them. And then there's the cloud-hosted stuff, which, you know, is free and paid, right? Even if you pay for something like Claude or ChatGPT or Copilot or whatever, it almost always makes sense just to start here, especially if you're looking for help writing or want the answer to some complicated question, etc. And if you kind of go past the capabilities of this thing, you could then move on to the paid or bigger cloud-based AI.
Paul Thurrott [00:15:27]:
Plus, if you're on a plane or you're just offline, it's kind of nice to be able to go back and forth and have these conversations, because within the context of any of these things (literally, and also in the language of AI), you can see in this case it had used 63%. You could keep going until you get to where you want to be. And if you exceed the context, you might have to give it back some information that it provided previously, which is kind of curious, but you can keep going. And this is free, like literally free. So depending on what you're doing, I think this is pretty much just as good as most of the more famous, you know, cloud-based options. So it's something to look at, and we will look at this again in the future.
Paul Thurrott [00:16:12]:
I want to do an episode at some point where we use local AI to create images and other content, you know, things like videos and so forth. So we will get there, but this is the first step, and then, you know, we'll take it from there. So hopefully you found this useful. We will have a new episode of Hands-On Windows every Thursday. You can find out more about the show at twit.tv/how. Thank you so much for watching. Thank you especially to our Club TWiT members. Love you.