Three ways to run AI. One wins per task.
The hardware in my office can run a 120-billion-parameter model. A $20/month subscription gives me Claude Opus in a browser. An API key lets me embed any model inside my own tools. They're not competing — they're three different answers to three different questions. Pick wrong and you overpay, leak data, or hit a capability wall. Two live demos of the API route are right below; the deeper discussion of when each approach wins is further down.
Talk to Claude Haiku 4.5, live.
This chat goes through claude-proxy.php on this server, which holds the API key and counts tokens per visitor. A few exchanges are open to everyone. After that, you'll be asked for a password — the demo budget is shared and I'd rather not have it eaten by one curious visitor. The assistant is loaded with my professional background and a map of this site, so ask it anything — where I've worked, what each page shows, how the solar simulation was built. It'll answer directly.
Hi — I'm Razvan's portfolio assistant, running on Claude Haiku 4.5. I'm tuned to answer any question about Razvan Gheorghies — his work, his projects, his background, what's on this site — honestly and directly, no hedging. If I don't know something, I'll say so. Just ask.
Two models, same prompt.
Image generation doesn't need a proxy or a gate — Pollinations.ai hosts image models behind keyless URL endpoints. Type a prompt below and the same text goes to FLUX Schnell (Black Forest Labs, fast and stylistic) and Z-Image (Pollinations' native model, a different aesthetic). Running two outputs in parallel shows how much the underlying model matters for the same words. Why only two? Pollinations offers more models, but the others hit IP-level rate limits when fired in bursts — rather than show a grid where four slots fail every time, the demo sticks to what works reliably.
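The keyless endpoint pattern is simple enough to sketch: one GET URL per model, same prompt. A minimal Python sketch — the URL shape, model slugs, and query parameters are assumptions based on Pollinations' public endpoints, so verify them before relying on this:

```python
from urllib.parse import quote

# Assumed Pollinations-style keyless endpoint; no API key, just a GET URL.
BASE = "https://image.pollinations.ai/prompt/"

def image_urls(prompt: str, models=("flux", "zimage"),
               width: int = 768, height: int = 768) -> dict:
    """Build one image URL per model for the same prompt text.
    Model slugs and size parameters are assumptions, not confirmed values."""
    encoded = quote(prompt, safe="")  # prompt travels in the URL path
    return {
        m: f"{BASE}{encoded}?model={m}&width={width}&height={height}"
        for m in models
    }

urls = image_urls("a red fox, watercolor")
```

Each URL can be dropped straight into an `<img src>` tag, which is why no proxy is needed on this route.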
Why three ways, and when each wins.
The demos above run on the route that ships software — API AI. But there are two other routes: running models on your own hardware, and paying a subscription for frontier web access. Each wins different tasks. Click any of the four sections below to read how I think about the tradeoff and what's in my actual local stack.
The discipline ▾
Every AI decision is a tradeoff triangle.
The three axes are privacy, cost, and capability. No single way of running AI wins on all three. Local models give absolute privacy and zero per-query cost, but even at the practical local ceiling of 120 billion parameters they still lose to Claude Opus on hard reasoning. Web subscriptions deliver frontier capability for a flat fee, but every keystroke goes to a third-party log. API keys give you frontier capability and the ability to build it into your own products, but you're billed per token and your data flows through the vendor's pipes.
The discipline isn't picking one — it's knowing which to use when. Regulated document review? Local. Quick reasoning lookups during work? Web subscription. A field tool that parses service reports and runs on a technician's laptop? API, because the output has to flow through code. The rest of this page is my actual stack, model by model; the live demos at the top of the page run on the API route.
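That routing rule can be written down. A toy Python sketch — the predicates and their priority order are my reading of the triangle, not a universal policy:

```python
def pick_route(regulated: bool, needs_code: bool, frontier_reasoning: bool) -> str:
    """Toy routing heuristic: privacy first, then integration, then capability.
    The three flags and their ordering are illustrative assumptions."""
    if regulated:
        return "local"   # data must never hit a cloud API
    if needs_code:
        return "api"     # output has to flow through software
    if frontier_reasoning:
        return "web"     # flat-fee frontier chat is the cheap option
    return "local"       # default: free, private, good enough
```

Regulated document review routes local even if it also needs code — privacy outranks integration in this ordering, which is the whole point of putting it first.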
Local AI ▾
The models on my own hardware.
LM Studio running on a Ryzen 9 9950X3D / 96 GB DDR5 / RTX 5090 desktop. Nine models across four architecture families, ranging from 3B to 120B parameters. Everything below runs offline, for free, with zero data leaving the machine. The tradeoffs are speed (the 120B model is slower than Claude Haiku) and a capability ceiling (Opus-class reasoning is still out of reach at home). But for sensitive work, for experimentation, and for the category of "this must never hit a cloud API", local is the right answer.
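Local doesn't mean chat-only: LM Studio can expose whatever model is loaded through an OpenAI-compatible server on localhost, so local models plug into code the same way cloud ones do. A minimal sketch, assuming the default port 1234 and a hypothetical model identifier — check LM Studio's server tab for the real values:

```python
import json
import urllib.request

# Assumed default LM Studio endpoint; nothing here leaves the machine.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt: str, model: str = "local-120b") -> urllib.request.Request:
    """Assemble an OpenAI-style chat request for the local server.
    The model identifier is a placeholder, not a real slug."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode()
    return urllib.request.Request(
        LMSTUDIO_URL, data=body,
        headers={"Content-Type": "application/json"},
    )

def ask_local(prompt: str) -> str:
    """Blocking call against the local server — free, private, offline."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape matches the OpenAI chat format, the same client code can swap between local and cloud by changing one URL.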
Web AI ▾
Paying for a chat box, not for code.
This is ChatGPT, Claude.ai, Gemini web, Grok on x.com. You pay a flat monthly fee and get the vendor's best model in a browser. No API key, no code, no integration. What you get is the best reasoning available anywhere for $20 a month — frontier models behind a text box. What you don't get is the ability to build anything on top of it.
Use it for
- Daily thinking partner, research, writing
- Debugging hard problems where capability > privacy
- One-off tasks that don't justify API setup
- Exploring what a frontier model can actually do
Don't use it for
- Regulated data · PII · client IP · health records
- Anything that needs to run inside your own software
- Automation — there's nothing to call programmatically
- Workflows where repeatable output matters (prompts drift)
API AI ▾
When the model has to live inside your code.
An API key lets you call a frontier model from your own software. Tools like my DALUM commissioning report generator, the document extractor on this site, the field-tech chat assistant — none of those work with a web subscription, because they need to run inside code I wrote. You pay per token instead of per month, which is cheaper if you use it rarely and more expensive if you hammer it. The vendor still sees the data, but only the specific tokens your code sends — no browsing history, no account, just a keyed request.
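The shape of a keyed request is small. A Python sketch of a direct call to Anthropic's Messages API — the model slug and `max_tokens` value are assumptions, so check the current model list before using them:

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def message_body(prompt: str, model: str = "claude-haiku-4-5",
                 max_tokens: int = 512) -> dict:
    """Request payload — you're billed per input and output token,
    and the vendor sees exactly these tokens, nothing more."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_claude(prompt: str) -> str:
    """One keyed request from your own code."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(message_body(prompt)).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # never ship this to a browser
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```

The key lives in an environment variable on the server, which is exactly why the browser demo below needs a proxy in between.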
The chat demo at the top of this page runs through this route. Your questions go to api.anthropic.com via a small PHP proxy on this server. The proxy holds the key so it never appears in the page, rate-limits per visitor, and prompts for a password after a few exchanges so the demo budget doesn't evaporate on anyone who wants to chat for an hour.
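The real proxy is PHP, but the gate logic is easy to sketch in Python. The free-exchange limit, the in-memory counter, and the function names here are illustrative assumptions — claude-proxy.php keeps its own budget:

```python
from collections import defaultdict

FREE_EXCHANGES = 5                      # assumed limit; the real value lives in claude-proxy.php
_counts: dict = defaultdict(int)        # per-visitor exchange counter (in-memory sketch)

def allow(visitor_id: str, password_ok: bool = False) -> bool:
    """Per-visitor gate: count each exchange, and past the free budget
    require the shared demo password. The API key is added server-side
    when the request is forwarded, so it never reaches the browser."""
    _counts[visitor_id] += 1
    return _counts[visitor_id] <= FREE_EXCHANGES or password_ok
```

Counting on the server rather than in the page matters: anything enforced in the browser can be edited away by the visitor it's meant to limit.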
Use it for
- Tools and products — embed the model in software
- Batch jobs, pipelines, scheduled automation
- Any task where output must flow through code
- Apps where end-users don't have their own accounts
Don't use it for
- Daily personal chat — web subscription is cheaper
- Regulated data that must not leave your infrastructure
- Prototyping a single prompt — the web UI is faster
- Uncapped spending — always set monthly budget caps