# Custom LLM Adapter
## Overview

Vitamem communicates with language models through the `LLMAdapter` interface. The built-in adapters (OpenAI, Anthropic, Ollama) cover common providers, but you can implement the interface yourself to use any LLM or embedding service.
## The LLMAdapter Interface

The interface has three required methods and one optional method:
```ts
import type { LLMAdapter, Message, MemorySource } from "vitamem";

interface LLMAdapter {
  /** Send a chat completion request and return the response text. */
  chat(messages: Array<{ role: string; content: string }>): Promise<string>;

  /** Stream a chat completion, yielding response tokens as they arrive. Optional. */
  chatStream?(
    messages: Array<{ role: string; content: string }>,
  ): AsyncGenerator<string, void, unknown>;

  /** Extract memorable facts from a conversation as structured data. */
  extractMemories(
    messages: Message[],
    sessionDate?: string,
  ): Promise<
    Array<{
      content: string;
      source: MemorySource;
      tags?: string[];
      profileField?: "conditions" | "medications" | "allergies" | "vitals" | "goals" | "none";
      profileKey?: string;
      profileValue?: string | number | { name: string; dosage?: string; frequency?: string };
      profileUnit?: string;
    }>
  >;

  /** Return a vector embedding for the given text. */
  embed(text: string): Promise<number[]>;
}
```

## Method Details
### chat(messages) (required)
Receives an array of messages with role ("user", "assistant", or "system") and content fields. Returns the model’s response as a plain string.
Vitamem calls this method during mem.chat() to generate replies. The messages array includes conversation history and, when autoRetrieve is enabled, a system message with relevant memories from previous sessions.
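To make that concrete, here is the rough shape of the array `chat()` receives when retrieval kicks in. This is illustrative only; the wording of the memory preamble and the date annotation are assumptions, not Vitamem's exact internal format.

```typescript
// Illustrative only: a messages array as chat() might receive it when
// autoRetrieve is enabled — a system message carrying retrieved memories,
// followed by the conversation history. The preamble text is an assumption.
const messages: Array<{ role: string; content: string }> = [
  {
    role: "system",
    content:
      "Relevant memories from previous sessions:\n" +
      "- User takes metformin 500mg twice daily.",
  },
  { role: "user", content: "Should I take my medication with food?" },
];
```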
### chatStream(messages) (optional)
Streaming variant of chat(). Returns an AsyncGenerator that yields response tokens as they are produced. If not implemented, Vitamem’s chatStream() and chatWithUserStream() fall back to non-streaming chat() and yield the full reply as a single chunk.
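The same fallback can be sketched in caller code. In this sketch, `streamOrFallback` and `MinimalAdapter` are names invented for illustration, not Vitamem APIs:

```typescript
// Hypothetical helper mirroring the documented fallback: stream tokens if the
// adapter implements chatStream, otherwise yield the full chat() reply as a
// single chunk.
type ChatMessages = Array<{ role: string; content: string }>;

interface MinimalAdapter {
  chat(messages: ChatMessages): Promise<string>;
  chatStream?(messages: ChatMessages): AsyncGenerator<string, void, unknown>;
}

async function* streamOrFallback(
  adapter: MinimalAdapter,
  messages: ChatMessages,
): AsyncGenerator<string, void, unknown> {
  if (adapter.chatStream) {
    yield* adapter.chatStream(messages);
  } else {
    // No streaming support: emit the whole reply as one chunk.
    yield await adapter.chat(messages);
  }
}
```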
### extractMemories(messages, sessionDate?) (required)
Receives the full Message[] array from a thread (with id, threadId, role, content, and createdAt fields) and an optional sessionDate string (YYYY-MM-DD). Returns an array of extracted facts, each with:
- `content` — a brief factual statement
- `source` — either `"confirmed"` (user stated it directly) or `"inferred"` (derived from context)
- `tags` — (optional) category tags for filtering (e.g., `"preference"`, `"goal"`)
- `profileField` — (optional) structured profile field mapping for health/domain apps
- `profileKey`, `profileValue`, `profileUnit` — (optional) structured value details for profile fields
Vitamem calls this during the dormant transition pipeline to identify facts worth remembering. The implementation typically formats the messages into a prompt and parses the LLM’s JSON response. The sessionDate is appended to facts for temporal encoding.
At minimum, returning { content, source } per fact is sufficient. The profile fields are only needed if you use structuredExtractionRules with LLM-first classification.
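Because the returned facts usually come from parsing untrusted LLM output, it can pay to validate the shape before handing it back. The `sanitizeFacts` helper below is an illustration of that idea, not part of Vitamem:

```typescript
// Hypothetical validation helper: keep only entries satisfying the minimal
// { content, source } contract, silently dropping anything malformed that
// the LLM produced.
type Source = "confirmed" | "inferred";

interface ExtractedFact {
  content: string;
  source: Source;
  tags?: string[];
}

function sanitizeFacts(parsed: unknown): ExtractedFact[] {
  if (!Array.isArray(parsed)) return [];
  return parsed.filter((f): f is ExtractedFact => {
    const fact = f as Partial<ExtractedFact> | null;
    return (
      typeof fact === "object" &&
      fact !== null &&
      typeof fact.content === "string" &&
      (fact.source === "confirmed" || fact.source === "inferred")
    );
  });
}
```

Calling this on the result of `JSON.parse` keeps a single hallucinated field or truncated array entry from poisoning the memory store.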
### embed(text) (required)
Receives a text string and returns its vector embedding as a number[] array. The dimensionality depends on your embedding model (e.g., 1536 for OpenAI’s text-embedding-3-small, 768 for nomic-embed-text).
Vitamem calls this to embed extracted memories and to embed search queries during mem.retrieve().
## Complete Example: Google Gemini

Here is a full adapter implementation for Google's Gemini API:
```ts
import type { LLMAdapter, Message, MemorySource } from "vitamem";

interface GeminiAdapterOptions {
  apiKey: string;
  chatModel?: string;
  embeddingModel?: string;
}

function createGeminiAdapter(opts: GeminiAdapterOptions): LLMAdapter {
  const chatModel = opts.chatModel ?? "gemini-2.5-flash";
  const embeddingModel = opts.embeddingModel ?? "text-embedding-004";
  const baseUrl = "https://generativelanguage.googleapis.com/v1beta";

  return {
    async chat(messages) {
      // Convert messages to Gemini format
      const systemInstruction = messages
        .filter((m) => m.role === "system")
        .map((m) => m.content)
        .join("\n");

      const contents = messages
        .filter((m) => m.role !== "system")
        .map((m) => ({
          role: m.role === "assistant" ? "model" : "user",
          parts: [{ text: m.content }],
        }));

      const res = await fetch(
        `${baseUrl}/models/${chatModel}:generateContent?key=${opts.apiKey}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            ...(systemInstruction && {
              system_instruction: { parts: [{ text: systemInstruction }] },
            }),
            contents,
          }),
        },
      );

      const data = await res.json();
      return data.candidates[0].content.parts[0].text;
    },

    async extractMemories(messages) {
      const conversation = messages
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");

      const prompt = `Extract key facts from this conversation that are worth remembering long-term.
Focus on: health conditions, medications, lifestyle habits, goals, preferences, and personal context.

Conversation:
${conversation}

Return a JSON array only (no markdown, no explanation):
[{ "content": "brief factual statement", "source": "confirmed" | "inferred" }]

Guidelines:
- "confirmed" = user directly stated this fact
- "inferred" = you derived this from context
- Skip greetings, questions, and one-time events
- Be specific (include numbers, dosages, dates when mentioned)`;

      const raw = await this.chat([{ role: "user", content: prompt }]);
      // Strip markdown fences if the model wraps the JSON
      const cleaned = raw
        .replace(/^```(?:json)?\n?/, "")
        .replace(/\n?```$/, "")
        .trim();
      return JSON.parse(cleaned);
    },

    async embed(text) {
      const res = await fetch(
        `${baseUrl}/models/${embeddingModel}:embedContent?key=${opts.apiKey}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            content: { parts: [{ text }] },
          }),
        },
      );

      const data = await res.json();
      return data.embedding.values;
    },
  };
}
```

Use it with Vitamem:

```ts
import { createVitamem } from "vitamem";

const mem = await createVitamem({
  llm: createGeminiAdapter({
    apiKey: process.env.GEMINI_API_KEY!,
  }),
  storage: "ephemeral",
});
```

## Example: Separate Chat and Embedding Providers
You can mix providers, using one service for chat and another for embeddings:
```ts
import type { LLMAdapter, Message, MemorySource } from "vitamem";

function createHybridAdapter(): LLMAdapter {
  return {
    async chat(messages) {
      // Use a local model via fetch
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "local-model", messages }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    },

    async extractMemories(messages) {
      // Reuse the chat method for extraction
      const conversation = messages
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");

      const prompt = `Extract health facts as JSON:
[{ "content": "...", "source": "confirmed" | "inferred" }]

Conversation:
${conversation}`;

      const raw = await this.chat([{ role: "user", content: prompt }]);
      return JSON.parse(
        raw.replace(/^```(?:json)?\n?/, "").replace(/\n?```$/, "").trim(),
      );
    },

    async embed(text) {
      // Use OpenAI for embeddings even though chat is local
      const res = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
      });
      const data = await res.json();
      return data.data[0].embedding;
    },
  };
}
```

## Implementation Tips
### JSON Parsing
LLMs sometimes wrap JSON responses in markdown code fences. Always strip them before parsing:

```ts
function cleanJson(raw: string): string {
  return raw
    .replace(/^```(?:json)?\n?/, "")
    .replace(/\n?```$/, "")
    .trim();
}
```

### Embedding Dimensions
Keep your embedding dimensions consistent. If you store memories with a 1536-dimension model and later switch to a 768-dimension model, cosine similarity searches will fail. Choose your embedding model early and stick with it, or re-embed all existing memories when you change.
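One cheap safeguard is to pin the expected dimensionality and fail fast when a vector of the wrong length comes back. The `assertDimensions` function below is a hypothetical helper, not a Vitamem API:

```typescript
// Hypothetical guard: throw immediately if an embedding's length does not
// match the dimensionality the memory store was created with.
function assertDimensions(vec: number[], expected: number): number[] {
  if (vec.length !== expected) {
    throw new Error(
      `Embedding dimension mismatch: expected ${expected}, got ${vec.length}`,
    );
  }
  return vec;
}
```

Wrapping your `embed()` return value in a check like this turns a silent similarity-search failure into an immediate, debuggable error.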
### The extractMemories Pattern

The simplest approach is to delegate to `chat()` with an extraction prompt, then parse the JSON. This is what all built-in adapters do:
```ts
async extractMemories(messages) {
  const conversation = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");

  const prompt = `...your extraction prompt with ${conversation}...`;
  const raw = await this.chat([{ role: "user", content: prompt }]);
  return JSON.parse(cleanJson(raw));
},
```

### Error Handling
Vitamem does not add retry logic around adapter calls. If your provider has rate limits or transient failures, handle retries within your adapter:
```ts
async chat(messages) {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await callMyLLM(messages);
    } catch (err) {
      if (attempt === 2) throw err;
      // Linear backoff: wait 1s after the first failure, 2s after the second
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
    }
  }
  throw new Error("Unreachable");
},
```

## Testing Your Adapter
Before plugging a custom adapter into Vitamem, verify each method independently:
```ts
const adapter = createMyAdapter({ /* ... */ });

// Test chat
const reply = await adapter.chat([
  { role: "user", content: "Hello, how are you?" },
]);
console.log("Chat:", reply);

// Test extraction
const facts = await adapter.extractMemories([
  {
    id: "1",
    threadId: "t1",
    role: "user",
    content: "I take metformin 500mg twice daily.",
    createdAt: new Date(),
  },
  {
    id: "2",
    threadId: "t1",
    role: "assistant",
    content: "Got it, I'll remember that.",
    createdAt: new Date(),
  },
]);
console.log("Facts:", facts);

// Test embedding
const vec = await adapter.embed("Type 2 diabetes management");
console.log("Embedding dimensions:", vec.length);
```

## TypeScript Types
For reference, here are the types used by the adapter interface:
```ts
// Message passed to extractMemories
interface Message {
  id: string;
  threadId: string;
  role: "user" | "assistant" | "system";
  content: string;
  createdAt: Date;
}

// Source classification for extracted facts
type MemorySource = "confirmed" | "inferred";
```

Import them from the package:

```ts
import type { LLMAdapter, Message, MemorySource } from "vitamem";
```

## Next Steps
- OpenAI Provider — reference implementation using the OpenAI SDK
- Anthropic Provider — example of multi-provider adapter (Anthropic chat + OpenAI embeddings)
- Custom Storage Guide — building custom storage adapters