
Running LLMs locally vs using APIs — what I learned

Most of my early experience with LLMs came from APIs.

You send a prompt, get a response, and everything feels instant. There’s no need to think about how the model is running, what resources it’s using, or what’s happening behind the scenes.

For a while, I assumed that was the whole picture.


What I thought before

I treated LLMs like simple tools:

  • send input
  • get output
  • move on

The hard part seemed to be prompting, not the system behind it.

I didn’t think about:

  • model size
  • memory usage
  • hardware constraints
  • latency

Everything just worked.


What changed

Running a model locally changed that completely.

Instead of calling an endpoint, I had to load the model into memory and serve it myself.

That meant dealing with:

  • model loading time
  • CPU and RAM limits
  • slow inference
  • performance trade-offs

It became obvious that an LLM is not “magic” — it’s just a heavy process running on a machine.

APIs hide that.

Local setups expose it.
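Just how heavy that process is becomes obvious with a back-of-envelope estimate: the weights alone take roughly parameters times bytes per weight. A minimal sketch (the 7B figure and precisions are illustrative, not tied to any specific model):

```python
def model_memory_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Rough weight-only memory estimate: parameters x bytes per weight.
    Ignores the KV cache and runtime overhead, which add more on top."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

# A 7B-parameter model at a few common precisions:
for name, bpw in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"7B @ {name}: ~{model_memory_gb(7, bpw):.1f} GB")
# 7B @ fp16: ~14.0 GB
# 7B @ 8-bit: ~7.0 GB
# 7B @ 4-bit: ~3.5 GB
```

That is why quantization matters so much locally: the same model that needs 14 GB in fp16 can fit in a 4 GB budget at 4-bit, at some cost in quality.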


Local vs API

The difference is not just technical — it’s about how you think.

APIs

  • easy to use
  • fast
  • no setup
  • scales automatically

But:

  • per-request costs
  • no control over the model
  • everything is a black box

Local models

  • full control
  • no per-request cost
  • can run offline
  • customizable

But:

  • slower (especially on CPU)
  • dependent on hardware
  • more setup and tuning required

The real difference

APIs optimize for convenience.

Local models optimize for control and understanding.


What was actually hard

The challenge wasn’t writing code.

It was understanding constraints.

  • choosing the right model size
  • dealing with slow responses
  • managing memory and performance
  • structuring a clean API layer
  • debugging unexpected outputs

These are not problems you see when using APIs.
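On the "clean API layer" point, one structure that helped me think about it: code the application against a tiny interface, so the rest of the app never knows whether generation happens locally or over HTTP. All names here are hypothetical, and the stand-in backend exists only to show the shape:

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that turns a prompt into text; the app codes against this."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend. A real one would wrap a local model runtime
    or an HTTP client for a hosted API, behind the same method."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(backend: TextGenerator, question: str) -> str:
    # The caller never knows (or cares) what is behind `backend`.
    return backend.generate(question)


print(answer(EchoBackend(), "hello"))  # echo: hello
```

Swapping the local model for an API (or back) then touches one class, not the whole codebase.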


What I learned

A few things became clear very quickly:

  • LLMs are not just “AI tools” — they are infrastructure problems
  • Performance trade-offs are real (model size vs latency)
  • Running models locally forces you to think like a system builder

The biggest shift was this:

I stopped thinking like a user and started thinking like someone running the system.


Closing

APIs made me comfortable with AI.

Running models locally made me understand it.