# Ollama
## Installation

Install Vitamem and the OpenAI peer dependency. The Ollama adapter is a thin wrapper over the OpenAI adapter that points at Ollama’s OpenAI-compatible API.

```sh
npm install vitamem openai
```

Then install Ollama itself from ollama.com.
## Quick Setup

If Ollama is running locally with default settings, zero configuration is needed:

```ts
import { createVitamem } from "vitamem";

const mem = await createVitamem({
  provider: "ollama",
  storage: "ephemeral",
});
```

No API key is required. This uses the default models: `llama3.2` for chat and extraction, `nomic-embed-text` for embeddings, connecting to `http://localhost:11434/v1`.
## Pull the Models

Before using Vitamem with Ollama, pull the default models:

```sh
ollama pull llama3.2
ollama pull nomic-embed-text
```

Ollama must be running when your application starts. Verify it is available:
```sh
ollama list
```

## Adapter Factory

For full control, use `createOllamaAdapter`:

```ts
import { createOllamaAdapter, createVitamem } from "vitamem";

const llm = createOllamaAdapter({
  chatModel: "llama3.2",
  embeddingModel: "nomic-embed-text",
  baseUrl: "http://localhost:11434/v1",
});

const mem = await createVitamem({
  llm,
  storage: "ephemeral",
});
```

## Options
All options are optional. The adapter works with no arguments at all.

| Option | Type | Default | Description |
|---|---|---|---|
| `chatModel` | `string` | `"llama3.2"` | Ollama model for chat and memory extraction. |
| `embeddingModel` | `string` | `"nomic-embed-text"` | Ollama model for text embeddings. |
| `baseUrl` | `string` | `"http://localhost:11434/v1"` | Ollama server URL. |
| `extractionPrompt` | `string` | Built-in prompt | Custom prompt for memory extraction. Must include a `{conversation}` placeholder. |
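As a concrete picture of the defaulting behavior described in the table, here is a sketch of how the options could resolve. This is an illustration of the table only, not Vitamem’s actual source; `resolveOptions` is a hypothetical helper name.

```typescript
interface OllamaAdapterOptions {
  chatModel?: string;
  embeddingModel?: string;
  baseUrl?: string;
  extractionPrompt?: string;
}

// Illustrative sketch of the defaults in the table above -- every option
// falls back to the Ollama default when omitted.
function resolveOptions(opts: OllamaAdapterOptions = {}) {
  return {
    chatModel: opts.chatModel ?? "llama3.2",
    embeddingModel: opts.embeddingModel ?? "nomic-embed-text",
    baseUrl: opts.baseUrl ?? "http://localhost:11434/v1",
    extractionPrompt: opts.extractionPrompt, // undefined -> built-in prompt
  };
}
```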
## Recommended Models

### Chat Models

| Model | Size | Notes |
|---|---|---|
| `llama3.2` | 3B | Default. Fast, good general quality. |
| `llama3.2:1b` | 1B | Smallest and fastest, lower extraction accuracy. |
| `llama3.3` | 70B | High quality, requires significant RAM. |
| `mistral` | 7B | Good balance of speed and capability. |
| `gemma2` | 9B | Strong instruction following. |
| `phi3` | 3.8B | Compact, good at structured output (JSON extraction). |
### Embedding Models

| Model | Dimensions | Notes |
|---|---|---|
| `nomic-embed-text` | 768 | Default. Purpose-built embedding model, good quality. |
| `all-minilm` | 384 | Smaller vectors, faster, slightly lower accuracy. |
| `mxbai-embed-large` | 1024 | Higher accuracy, larger vectors. |
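The Dimensions column matters when switching models: vectors produced by different embedding models live in different spaces and are not mutually comparable, so changing `embeddingModel` generally means regenerating any stored embeddings. For intuition, retrieval over such vectors is typically scored with cosine similarity; here is a minimal sketch of that scoring (illustrative only, not Vitamem’s retrieval code):

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // e.g. comparing a 768-dim nomic vector with a 384-dim all-minilm vector
    throw new Error("vectors must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```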
To use different models, pass them to the factory:

```ts
const llm = createOllamaAdapter({
  chatModel: "mistral",
  embeddingModel: "mxbai-embed-large",
});
```

## Performance: Embedding Concurrency
By default, Vitamem runs up to 5 embedding requests in parallel during the dormant transition pipeline. Ollama processes requests sequentially on most hardware, so parallel requests queue up without a speed benefit and can increase memory pressure.
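To see why the limit helps, here is a minimal sketch of concurrency-limited mapping. It is illustrative only; `mapWithConcurrency` is not Vitamem’s internal implementation, just the general pattern. With a backend that serves one request at a time, a limit above 1 only builds a local queue.

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim an index (safe: JS is single-threaded)
      results[i] = await fn(items[i]);
    }
  }
  // Spawn min(limit, items.length) workers, at least one.
  const workers = Array.from(
    { length: Math.max(1, Math.min(limit, items.length)) },
    worker,
  );
  await Promise.all(workers);
  return results;
}
```

With a limit of 1, this degenerates to a plain sequential loop, which is the behavior `embeddingConcurrency: 1` requests.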
Set `embeddingConcurrency: 1` when using Ollama:

```ts
const mem = await createVitamem({
  provider: "ollama",
  storage: "ephemeral",
  embeddingConcurrency: 1,
});
```

Or when using the adapter factory:

```ts
import { createOllamaAdapter, createVitamem } from "vitamem";

const mem = await createVitamem({
  llm: createOllamaAdapter(),
  storage: "ephemeral",
  embeddingConcurrency: 1,
});
```

## Offline Usage
Ollama runs entirely on your machine. Once models are pulled, no internet connection is required. This makes it ideal for:
- Development and testing — no API keys, no costs, no rate limits.
- Privacy-sensitive deployments — health data never leaves the device.
- Air-gapped environments — works behind firewalls with no external access.
- CI/CD pipelines — deterministic tests without API dependencies.
```ts
// Complete offline setup
import { createOllamaAdapter, createVitamem } from "vitamem";

const mem = await createVitamem({
  llm: createOllamaAdapter(),
  storage: "ephemeral",
  embeddingConcurrency: 1,
});

// Everything runs locally -- no network calls
const thread = await mem.createThread({ userId: "local-user" });
const { reply } = await mem.chat({
  threadId: thread.id,
  message: "I started taking vitamin D supplements last week.",
});
```

## Remote Ollama Server
If Ollama is running on a different machine (e.g., a GPU server on your network), point `baseUrl` at it:

```ts
const llm = createOllamaAdapter({
  baseUrl: "http://192.168.1.50:11434/v1",
});
```

## How It Works
Under the hood, `createOllamaAdapter` delegates to `createOpenAIAdapter` with Ollama-specific defaults:

```ts
// This:
createOllamaAdapter({ chatModel: "mistral" });

// Is equivalent to:
createOpenAIAdapter({
  apiKey: "ollama", // Ollama ignores the API key
  chatModel: "mistral",
  embeddingModel: "nomic-embed-text",
  baseUrl: "http://localhost:11434/v1",
});
```

This works because Ollama implements the OpenAI-compatible chat completions and embeddings endpoints.
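Concretely, requests go to the standard OpenAI-style paths under the configured base URL. A small sketch of how those two endpoint URLs derive from `baseUrl` (`ollamaEndpoints` is an illustrative helper, not part of the Vitamem API):

```typescript
// Build the OpenAI-compatible endpoint URLs Ollama serves under /v1.
function ollamaEndpoints(baseUrl: string) {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    chat: `${base}/chat/completions`,
    embeddings: `${base}/embeddings`,
  };
}
```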
## Streaming

The Ollama adapter inherits streaming support from the OpenAI adapter. Use `chatStream()` or `chatWithUserStream()` on the Vitamem instance:

```ts
const { stream } = await mem.chatStream({
  threadId: thread.id,
  message: "How is my sleep tracking looking?",
});

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```

See Streaming Output for the full guide.
## Troubleshooting

**“Connection refused” errors** — Make sure Ollama is running. Start it with `ollama serve` or check that the Ollama desktop app is open.
**“Model not found” errors** — Pull the model first with `ollama pull <model-name>`.

**Slow embedding pipeline** — Set `embeddingConcurrency: 1` and consider using a smaller embedding model like `all-minilm`.

**Out of memory** — Use smaller models (`llama3.2:1b` for chat, `all-minilm` for embeddings) or increase your system swap space.
## Next Steps

- OpenAI Provider — cloud-hosted models with OpenAI
- Anthropic Provider — use Claude for chat
- Custom LLM Adapter — implement the interface for any provider