Origin

How this started

Summer 2025. I’d been using Claude Code daily for about five months, building things faster than I ever thought possible. I can’t remember exactly what triggered it (maybe boredom, maybe a YouTube video about running Llama 70B locally), but a thought kept creeping in: what if I owned the model instead of renting it?

I pulled the trigger and bought my first RTX 6000 Pro. The system itself was an Intel consumer flagship build — 128GB of RAM, an Asus ProArt motherboard with two x16 slots. The parts arrived, I assembled them, and I plugged in the card.

I got Llama 70B running, played with it for a while, and then… didn’t really use it. I read a lot, experimented with smaller projects, but no real use case clicked. It sat there.

What followed was a short period of quiet, of buyer’s remorse and regret. I had dropped €9,000 on what was, viewed cynically, just a gamer card with extra memory. I knew the memory was the entire point, that it was the key to unlocking the AI world, but knowing that didn’t lessen the sting of the receipt sitting in my inbox.

Regardless, it didn’t stop me. I kept going and kept tinkering.

Things shifted more at the end of 2025. Everything I’d touched thus far culminated in a more serious project: an agentic, multi-turn RAG system that indexed 100GB of Estonian legal text with embeddings. Agents could query and cross-reference the corpus, hit public services like the property registry and court records, and synthesize answers to real legal questions. It worked. Building and serving that system finally put my single GPU to real work, and taught me how the whole stack fit together.

But there was a problem. Every interaction cost real money. The heavy lifting relied on the Anthropic API, and I quickly burned through €2,000 just on testing and debugging.

That project was the wake-up call. It showed me exactly what these systems can do when you build them properly — and exactly what they cost when you don’t own the infrastructure. Suddenly, the decision wasn’t difficult anymore.

In December 2025, I said eff it and bought my second RTX 6000 Pro. The goal was to localize the entire build of my project and stop bleeding cash on API calls.

Limited understanding, limited capability to imagine what’s possible — the only way to find out is to arrive.

Today is June 1st, 2026. I’m now running four RTX 6000 Pro cards — 384GB of Blackwell VRAM in one box. The early regret? Poof, long gone. If I had the means, I’d build a second identical system and cluster them together.