My opinion about the AI bubble (and there's definitely a bubble) is that it pops when people can run open-source models locally and they're awesome. For general AI stuff that isn't coding, the stuff consumers do, we're already there. When the developer use case gets there, it'll be nuts. We're close.
About a year and a half ago, I bought a gaming PC from Lenovo. It's the first one I didn't build myself, and it's pretty great. It has an RTX 4080 Super in it, with 16 gigs of VRAM. Boy can it sling polygons. I got a stupid good deal, and it's fun to play The Last of Us. It also could, presumably, run a decent model.
So I tried the usual chat stuff with small models, and was surprised at how decent it was. I even asked it a programming question not specific to my code, and it gave me a correct answer quickly. Neat. Getting models to work right with Claude Code was janky, but I figured it out. The gold standard seems to be Qwen3.6, the 27B-parameter model. I figured it would be too large for 16 gigs of VRAM, and it was. But I let it churn on a task for an hour anyway, and the results were every bit as solid as what I'd get from Sonnet.
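For the curious, the local chat side is roughly this shape. This is a sketch, not my exact setup: it assumes Ollama as the model runner and the OpenAI Python client as the caller, and the model name is just a placeholder for whatever small model fits on your card.

```python
# A sketch, not my exact setup: assumes Ollama is running locally and
# serving its OpenAI-compatible API on the default port. The model name
# is a placeholder for whatever small model fits in 16 GB of VRAM.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # ignored by a local server, but the client requires one
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:7b",  # placeholder small model
    messages=[{"role": "user", "content": "Explain Python's GIL in two sentences."}],
)
print(resp.choices[0].message.content)
```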
Smaller models performed fairly well, but getting the plumbing to work between Claude Code and the local model is not straightforward. When I could get one working, the results were just OK, and a little slow. Tasks with larger scope either failed to complete or ignored conventions I'd expect it to follow. I'd rather shell out the money than deal with that.
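To give a sense of why the plumbing is janky: Claude Code speaks Anthropic's Messages API, while most local servers speak OpenAI's, so you end up putting a translation proxy in between and pointing Claude Code at it (its ANTHROPIC_BASE_URL setting is the hook for that). Here's a rough sketch of what the client side of that arrangement looks like, with the proxy port and model name made up; LiteLLM is one proxy people use for this, but I'm not endorsing a specific stack.

```python
# A sketch of the plumbing, under these assumptions: a translation proxy
# (LiteLLM is one option) is listening on localhost:4000, speaking
# Anthropic's Messages API on the front and forwarding to a local model
# on the back. Claude Code can be pointed at the same proxy via its
# ANTHROPIC_BASE_URL setting; this script just makes the same call directly.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000",  # the local proxy, not api.anthropic.com
    api_key="local-placeholder",       # a local proxy typically doesn't check this
)

msg = client.messages.create(
    model="qwen2.5-coder:14b",  # whatever name the proxy maps to the local model
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a unit test for a FIFO queue."}],
)
print(msg.content[0].text)
```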
But I can see a legit future. If someone can make a model that works well on an Apple silicon MacBook Pro with 16 gigs of RAM, it's bad news for the companies spending gratuitously on data centers. And honestly, I think that's a better future.