
Running LLMs locally vs using APIs — what I learned

Most of my early experience with LLMs came from APIs.

You send a prompt, get a response, and everything feels instant. There’s no need to think about how the model is running, what resources it’s using, or what’s happening behind the scenes.

For a while, I assumed that was the whole picture.


What I thought before

I treated LLMs like simple tools:

  • send input
  • get output
  • move on

The hard part seemed to be prompting, not the system behind it.

I didn’t think about:

  • model size
  • memory usage
  • hardware constraints
  • latency

Everything just worked.


What changed

Running a model locally changed that completely.

Instead of calling an endpoint, I had to load the model into memory and serve it myself.

That meant dealing with:

  • model loading time
  • CPU and RAM limits
  • slow inference
  • performance trade-offs

It became obvious that an LLM is not “magic” — it’s just a heavy process running on a machine.

APIs hide that.

Local setups expose it.
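Just how heavy that process is becomes obvious with a back-of-envelope estimate: the weights alone take roughly parameters times bytes per weight. A minimal sketch (the 7B figure and precisions are illustrative, not tied to any specific model):

```python
def model_memory_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Rough weight-only memory estimate: parameters x bytes per weight.
    Ignores the KV cache and runtime overhead, which add more on top."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

# A 7B-parameter model at a few common precisions:
for name, bpw in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"7B @ {name}: ~{model_memory_gb(7, bpw):.1f} GB")
# 7B @ fp16: ~14.0 GB
# 7B @ 8-bit: ~7.0 GB
# 7B @ 4-bit: ~3.5 GB
```

That is why quantization matters so much locally: the same model that needs 14 GB in fp16 can fit in a 4 GB budget at 4-bit, at some cost in quality.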


Local vs API

The difference is not just technical — it’s about how you think.

APIs

  • easy to use
  • fast
  • no setup
  • scales automatically

But:

  • per-request costs
  • no control over the model
  • everything is a black box

Local models

  • full control
  • no per-request cost
  • can run offline
  • customizable

But:

  • slower (especially on CPU)
  • dependent on hardware
  • more setup and tuning required

The real difference

APIs optimize for convenience.

Local models optimize for control and understanding.


What was actually hard

The challenge wasn’t writing code.

It was understanding constraints.

  • choosing the right model size
  • dealing with slow responses
  • managing memory and performance
  • structuring a clean API layer
  • debugging unexpected outputs

These are not problems you see when using APIs.
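On the "clean API layer" point, one structure that helped me think about it: code the application against a tiny interface, so the rest of the app never knows whether generation happens locally or over HTTP. All names here are hypothetical, and the stand-in backend exists only to show the shape:

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that turns a prompt into text; the app codes against this."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend. A real one would wrap a local model runtime
    or an HTTP client for a hosted API, behind the same method."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(backend: TextGenerator, question: str) -> str:
    # The caller never knows (or cares) what is behind `backend`.
    return backend.generate(question)


print(answer(EchoBackend(), "hello"))  # echo: hello
```

Swapping the local model for an API (or back) then touches one class, not the whole codebase.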


What I learned

A few things became clear very quickly:

  • LLMs are not just “AI tools” — they are infrastructure problems
  • Performance trade-offs are real (model size vs latency)
  • Running models locally forces you to think like a system builder

The biggest shift was this:

I stopped thinking like a user and started thinking like someone running the system.


Closing

APIs made me comfortable with AI.

Running models locally made me understand it.