Ollama API Quickstart: How to Run a Local Model and Call It From Python
A practical Ollama guide showing how to start the local server, pull a model, call the HTTP API, and use a local LLM from Python without overcomplicating the stack.
Ollama became popular quickly because it gives developers a much shorter path from “I want to try a local model” to “I have a local API endpoint.” That speed is useful, but the value comes from using it in a disciplined way rather than turning local LLM work into random shell experiments.
What Ollama gives you
Ollama helps you download and run local models with a simpler command-line workflow than many lower-level inference setups. It also exposes a local HTTP API, which is what makes it useful beyond toy terminal chats.
That means you can:
- run a model locally
- call it from scripts
- plug it into prototypes
- experiment without immediately depending on a hosted API
Step 1: install Ollama
On macOS, install it from the official site or app bundle. After installation, verify:
ollama --version
If the command is missing, fix PATH or the installation before doing anything else.
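If you would rather run the same check from a script (for example, at the start of a setup routine), a minimal sketch could look like the following. The helper name `ollama_version` is hypothetical, and the sketch assumes only that the binary is called `ollama` and accepts `--version`, as in the command above.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version(binary: str = "ollama") -> Optional[str]:
    """Return the text printed by `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        # Same situation as "command is missing": fix PATH or the installation first.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    # Some tools print version info to stderr, so fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()
```

Calling `ollama_version()` and stopping early when it returns `None` keeps later failures from being misread as model or API problems.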
Step 2: pull a model
Example:
ollama pull llama3.1
This downloads the model so it is available locally.
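Once the server is running, you can also confirm from code that the pull succeeded. The sketch below assumes the `GET /api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` such as `llama3.1:latest`; the helper names are hypothetical, and the functions only inspect an already-fetched payload.

```python
from typing import Dict, List


def installed_models(tags_payload: Dict) -> List[str]:
    """Extract model names from a payload shaped like the /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload: Dict, wanted: str) -> bool:
    """True if `wanted` matches a full name or the part before the ':' tag."""
    for name in installed_models(tags_payload):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


# Possible wiring (assumes the `requests` package and a running local server):
#   payload = requests.get("http://localhost:11434/api/tags", timeout=10).json()
#   if not has_model(payload, "llama3.1"):
#       raise SystemExit("Run `ollama pull llama3.1` first")
```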
Step 3: run a quick local chat in the terminal
ollama run llama3.1
That confirms the local runtime works at the most basic level.
Step 4: use the HTTP API
Ollama documents a local API. A simple request with curl looks like this:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain what a Docker healthcheck does in one paragraph.",
  "stream": false
}'
That is the point where Ollama becomes more than a CLI novelty.
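The request above sets `"stream": false` to get one JSON object back. With streaming enabled, the API is documented to return newline-delimited JSON objects instead, each carrying a fragment in `response` and a final object with `done` set to true. A minimal sketch for reassembling such a stream follows; `collect_stream` is a hypothetical helper, and the chunk shape is an assumption to check against the API reference for your version.

```python
import json
from typing import Iterable, Union


def collect_stream(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the `response` fragments of a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip empty keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the generation
    return "".join(parts)


# Possible wiring (assumes the `requests` package and a running local server):
#   with requests.post(
#       "http://localhost:11434/api/generate",
#       json={"model": "llama3.1", "prompt": "Hello", "stream": True},
#       stream=True,
#       timeout=120,
#   ) as r:
#       r.raise_for_status()
#       text = collect_stream(r.iter_lines())
```

Streaming is mostly worth the extra code when a human is watching the output appear; for batch scripts the non-streaming form is simpler.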
Python example
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Write a Python function that retries a request three times.",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
data = response.json()
print(data["response"])
That is enough to start wiring a local model into internal tools or experiments.
When the chat endpoint is a better fit
Some teams begin with /api/generate, but for multi-turn workflows the chat-style API is often easier to reason about because the message history is explicit.
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1",
        "messages": [
            {"role": "system", "content": "You explain infrastructure clearly."},
            {"role": "user", "content": "What does a reverse proxy do?"},
        ],
        "stream": False,
    },
    timeout=120,
)
# Fail loudly on HTTP errors instead of hitting a confusing KeyError below.
response.raise_for_status()
print(response.json()["message"]["content"])
That shape tends to age better once your prototype becomes a real assistant or internal tool.
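To make the "explicit message history" point concrete, here is a minimal sketch of a multi-turn loop. The function `chat_turn` and the `send` callable are hypothetical names; `send` is assumed to take the full message list and return the `message` object from an `/api/chat` response, which keeps the history logic separate from the HTTP call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "...", "content": "..."}


def chat_turn(
    history: List[Message],
    user_text: str,
    send: Callable[[List[Message]], Message],
) -> str:
    """Append the user message, call the model, record the reply, and return its text."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    # Store the reply so the next turn sees the whole conversation.
    history.append(
        {"role": reply.get("role", "assistant"), "content": reply["content"]}
    )
    return reply["content"]


# Possible `send` implementation (assumes `requests` and a running local server):
#   def send(messages):
#       r = requests.post(
#           "http://localhost:11434/api/chat",
#           json={"model": "llama3.1", "messages": messages, "stream": False},
#           timeout=120,
#       )
#       r.raise_for_status()
#       return r.json()["message"]
```

Because the history is just a list you own, trimming old turns or inspecting exactly what was sent stays a plain-Python problem.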
Why developers still get tripped up
They forget local models are still infrastructure
Local does not mean free of constraints. CPU vs GPU, RAM, model size, and latency still matter.
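One way to put a number on those constraints is to read the timing metadata in a non-streaming response. The sketch below assumes the response JSON includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which is how the API reference describes them; treat the field names as something to verify against your installed version. `tokens_per_second` is a hypothetical helper.

```python
from typing import Dict, Optional


def tokens_per_second(data: Dict) -> Optional[float]:
    """Compute generation speed from a response dict, or None if fields are missing."""
    count = data.get("eval_count")
    duration_ns = data.get("eval_duration")
    if not count or not duration_ns:
        return None
    return count / (duration_ns / 1e9)  # nanoseconds -> seconds


# Example with made-up numbers: 100 tokens generated in 2 seconds.
sample = {"eval_count": 100, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 50.0
```

Comparing this figure across model sizes on the same machine gives a quick sense of whether a model is practical there.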
They assume local privacy means zero design work
Running the model locally is only one part. You still need to think about logging, prompt handling, timeouts, and error behavior.
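As an illustration of that design work, the sketch below wraps a call in bounded retries and logs only the failure type, not the prompt. All names (`call_with_retries`, `OllamaCallError`, the `send` callable) are hypothetical, and the retry counts and backoff are placeholder values rather than recommendations.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger("local_llm")


class OllamaCallError(RuntimeError):
    """Raised when every attempt to reach the local model has failed."""


def call_with_retries(
    send: Callable[[Dict], Dict],
    payload: Dict,
    attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `send(payload)` up to `attempts` times with linear backoff."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except Exception as exc:  # in real code, narrow this to network/timeout errors
            last_error = exc
            # Log the error type only; prompts may contain sensitive text.
            logger.warning(
                "attempt %d/%d failed: %s", attempt, attempts, type(exc).__name__
            )
            if attempt < attempts:
                sleep(backoff_seconds * attempt)
    raise OllamaCallError("giving up after %d attempts" % attempts) from last_error


# Possible `send` (assumes `requests` and a running local server):
#   def send(payload):
#       r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
#       r.raise_for_status()
#       return r.json()
```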
They keep switching models without defining the job
A local model experiment becomes much more useful when the task is concrete:
- summarize logs
- rewrite docs
- classify support tickets
- draft code comments
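Taking "classify support tickets" from the list above as an example, a concrete task can be pinned down as a prompt builder plus a strict parser for the model's answer. The label set, prompt wording, and function names below are illustrative assumptions, not anything Ollama prescribes.

```python
# Hypothetical label set for illustration; replace with your own categories.
LABELS = ("billing", "bug", "feature_request", "other")


def build_prompt(ticket: str) -> str:
    """Build a classification prompt that asks for exactly one label."""
    return (
        "Classify the support ticket into exactly one of these labels: "
        + ", ".join(LABELS)
        + ".\nAnswer with the label only.\n\nTicket:\n"
        + ticket.strip()
    )


def parse_label(raw: str) -> str:
    """Normalize the model output; anything unexpected falls back to 'other'."""
    cleaned = raw.strip().lower().strip(".\"'` ")
    return cleaned if cleaned in LABELS else "other"
```

With the job defined this tightly, swapping models becomes a measurable comparison: the same tickets go in, and you count how many labels match.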
When Ollama is the right tool
It is excellent when you want:
- fast local experimentation
- an easy API surface
- lower-friction demos for internal tooling
- local testing before deciding on a heavier stack
Common first-week mistakes
The most common mistake is downloading a model that is too heavy for the machine and then assuming the whole local-LLM idea is bad. Another is treating every model swap like progress instead of first defining the task. A better workflow is to pick one narrow job, measure response quality and latency, and only then decide whether you need a larger model.
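The "pick one narrow job and measure" workflow can be sketched as a tiny harness. Here `generate` stands for any function that sends a prompt to your local model and returns text, and each case pairs a prompt with a pass/fail check; the names and the report fields are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, Callable[[str], bool]]  # (prompt, check(output) -> bool)


def measure(
    generate: Callable[[str], str],
    cases: List[Case],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Run each case once and report pass rate and latency statistics."""
    if not cases:
        raise ValueError("need at least one case")
    latencies = []
    passed = 0
    for prompt, check in cases:
        start = clock()
        output = generate(prompt)
        latencies.append(clock() - start)
        if check(output):
            passed += 1
    return {
        "pass_rate": passed / len(cases),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }
```

Running the same cases against a smaller and a larger model turns "do we need a bigger model?" into a comparison of two small reports.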
Final recommendation
Do not judge Ollama by the first funny chatbot output it gives you. Judge it by whether the local API helps you prototype a real developer workflow faster.
That is the real reason to use it. Not local-model theater. Faster iteration with more control.