Agent Framework
Ollama API
Build your own AI-powered applications
Ollama exposes a local REST API that mirrors OpenAI's API format. If you have an application that currently calls OpenAI's API, you can point it at your local Ollama instance instead — often with just a URL change. Build custom integrations, run AI in your own scripts, or connect any tool that supports an OpenAI-compatible endpoint.
Book a consultation to set up the Ollama API →What it is
OpenAI-compatible. Local hardware.
Ollama runs a REST API on your Mac that speaks the same language as OpenAI's API. Applications built to call api.openai.com can often be redirected to your local instance by changing one line — the base URL.
This makes local AI accessible to developers and technical users who already have tools that call OpenAI, as well as anyone building new applications who wants AI inference that never touches the cloud.
The Ollama API is installed in every setup. This configuration tier focuses on custom integration work: helping you connect your specific application, script, or tool to your local AI instance.
How it works
Local endpoint, familiar interface
Ollama listens on localhost:11434 by default and exposes endpoints for chat completions, completions, and embeddings — the same endpoints OpenAI uses. Your application sends a request; Ollama routes it to the appropriate model and returns the response.
External access is handled via Tailscale, allowing your applications on other devices to reach the API securely without opening a public port. We configure authentication to prevent unauthorized access.
Who it's for
Developers and technical users building on top of AI
- ✓Developers building applications that use OpenAI's API and want to self-host
- ✓Technical users running AI in their own scripts and automation
- ✓Teams with existing OpenAI integrations they want to move to local hardware
- ✓Anyone who wants programmatic AI access without per-token costs
- ✓Developers prototyping AI features without sending data to the cloud
Note
The Ollama API is installed and available in every setup. If you're choosing this tier specifically, it means you want help with the integration work — connecting your application or scripts to your local instance and configuring it for your use case.
Full stack
What gets installed
| Layer | Component | Purpose |
|---|---|---|
| AI Engine | Ollama (MLX backend) | Runs models on Apple Silicon with REST API |
| API Interface | OpenAI-compatible endpoint | Drop-in replacement for OpenAI API calls |
| Chat UI | Open WebUI | Browser chat, always available |
| Networking | Tailscale | Secure API access from anywhere |
| Security | Hardened config | Loopback binding, optional API key auth |
| Integration | Custom configuration | Configured for your specific integration target |
Security
Loopback by default, Tailscale for remote
Ollama binds to loopback by default — the API is not accessible from the network without explicit configuration. Remote access is handled through Tailscale's encrypted tunnel. We configure API key authentication for any external access and limit the API to specific model versions to prevent unauthorized model loading.
Recommended models
Models that pair well
GPT-4 equivalent — use as a drop-in replacement for OpenAI's most capable models
Versatile and strong — works well across a wide range of integration use cases
Code-focused — ideal if your integration is a coding tool or developer workflow
Ready to build with local AI?
Book a consultation. We'll configure the Ollama API on your Mac and help you connect your application or integration.
Book a consultation to set up the Ollama API →