Custom LLM Adapter

Vitamem communicates with language models through the LLMAdapter interface. The built-in adapters (OpenAI, Anthropic, Ollama) cover common providers, but you can implement the interface yourself to use any LLM or embedding service.

The interface has three required methods and one optional method:

import type { Message, MemorySource } from "vitamem";

interface LLMAdapter {
  /** Send a chat completion request and return the response text. */
  chat(messages: Array<{ role: string; content: string }>): Promise<string>;

  /** Stream a chat completion, yielding response tokens as they arrive. Optional. */
  chatStream?(
    messages: Array<{ role: string; content: string }>,
  ): AsyncGenerator<string, void, unknown>;

  /** Extract memorable facts from a conversation as structured data. */
  extractMemories(
    messages: Message[],
    sessionDate?: string,
  ): Promise<Array<{
    content: string;
    source: MemorySource;
    tags?: string[];
    profileField?: 'conditions' | 'medications' | 'allergies' | 'vitals' | 'goals' | 'none';
    profileKey?: string;
    profileValue?: string | number | { name: string; dosage?: string; frequency?: string };
    profileUnit?: string;
  }>>;

  /** Return a vector embedding for the given text. */
  embed(text: string): Promise<number[]>;
}

chat(messages) (required)

Receives an array of messages with role ("user", "assistant", or "system") and content fields. Returns the model’s response as a plain string.

Vitamem calls this method during mem.chat() to generate replies. The messages array includes conversation history and, when autoRetrieve is enabled, a system message with relevant memories from previous sessions.
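For illustration, a call made during mem.chat() with autoRetrieve enabled might receive something like the following. The exact wording of the memory preamble is an assumption here, not Vitamem's literal output:

```typescript
// Hypothetical messages payload passed to chat() — the system-message
// wording is illustrative, not Vitamem's exact preamble.
const messages = [
  {
    role: "system",
    content: "Relevant memories:\n- User takes metformin 500mg twice daily.",
  },
  { role: "user", content: "What dosage am I on again?" },
];

// An adapter only needs to forward these to its provider and return a string;
// it does not need to distinguish retrieved memories from other system messages.
```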

chatStream(messages) (optional)

Streaming variant of chat(). Returns an AsyncGenerator that yields response tokens as they are produced. If not implemented, Vitamem’s chatStream() and chatWithUserStream() fall back to non-streaming chat() and yield the full reply as a single chunk.
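As a sketch of what an implementation can look like, here is a chatStream for an OpenAI-compatible SSE endpoint. The URL, model name, stream flag, and chunk shape (choices[0].delta.content) are assumptions for illustration; adjust them for your provider:

```typescript
// Parse one SSE "data:" line from an OpenAI-compatible stream into a
// token, or null for comments, keep-alives, and the [DONE] sentinel.
function parseSseLine(line: string): string | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("data:")) return null;
  const payload = trimmed.slice(5).trim();
  if (payload === "[DONE]") return null;
  const token = JSON.parse(payload).choices?.[0]?.delta?.content;
  return typeof token === "string" ? token : null;
}

// Assumed endpoint and payload shape — swap in your provider's details.
async function* chatStream(
  messages: Array<{ role: string; content: string }>,
): AsyncGenerator<string, void, unknown> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "local-model", messages, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next read
    for (const line of lines) {
      const token = parseSseLine(line);
      if (token !== null) yield token;
    }
  }
}
```

Keeping the line parsing in a separate pure function makes the fiddly part (partial chunks, keep-alive comments, the [DONE] sentinel) easy to unit-test without a live server.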

extractMemories(messages, sessionDate?) (required)

Receives the full Message[] array from a thread (with id, threadId, role, content, and createdAt fields) and an optional sessionDate string (YYYY-MM-DD). Returns an array of extracted facts, each with:

  • content — a brief factual statement
  • source — either "confirmed" (user stated it directly) or "inferred" (derived from context)
  • tags (optional) — category tags for filtering (e.g., "preference", "goal")
  • profileField (optional) — structured profile field mapping for health/domain apps
  • profileKey, profileValue, profileUnit (optional) — structured value details for profile fields

Vitamem calls this during the dormant transition pipeline to identify facts worth remembering. The implementation typically formats the messages into a prompt and parses the LLM’s JSON response. The sessionDate is appended to facts for temporal encoding.

At minimum, returning { content, source } per fact is sufficient. The profile fields are only needed if you use structuredExtractionRules with LLM-first classification.
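For example, a minimal result that satisfies the interface looks like this (the facts themselves are made up for illustration):

```typescript
// Minimal extractMemories return value: content and source only.
const facts = [
  { content: "User takes metformin 500mg twice daily", source: "confirmed" as const },
  { content: "User is likely managing type 2 diabetes", source: "inferred" as const },
];
```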

embed(text) (required)

Receives a text string and returns its vector embedding as a number[] array. The dimensionality depends on your embedding model (e.g., 1536 for OpenAI’s text-embedding-3-small, 768 for nomic-embed-text).

Vitamem calls this to embed extracted memories and to embed search queries during mem.retrieve().

Here is a full adapter implementation for Google’s Gemini API:

import type { LLMAdapter, Message, MemorySource } from "vitamem";

interface GeminiAdapterOptions {
  apiKey: string;
  chatModel?: string;
  embeddingModel?: string;
}

function createGeminiAdapter(opts: GeminiAdapterOptions): LLMAdapter {
  const chatModel = opts.chatModel ?? "gemini-2.5-flash";
  const embeddingModel = opts.embeddingModel ?? "text-embedding-004";
  const baseUrl = "https://generativelanguage.googleapis.com/v1beta";

  return {
    async chat(messages) {
      // Convert messages to Gemini format: Gemini has no "system" role,
      // so system messages become a system_instruction, and "assistant"
      // maps to "model".
      const systemInstruction = messages
        .filter((m) => m.role === "system")
        .map((m) => m.content)
        .join("\n");
      const contents = messages
        .filter((m) => m.role !== "system")
        .map((m) => ({
          role: m.role === "assistant" ? "model" : "user",
          parts: [{ text: m.content }],
        }));

      const res = await fetch(
        `${baseUrl}/models/${chatModel}:generateContent?key=${opts.apiKey}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            ...(systemInstruction && {
              system_instruction: { parts: [{ text: systemInstruction }] },
            }),
            contents,
          }),
        },
      );
      if (!res.ok) {
        throw new Error(`Gemini chat request failed: ${res.status} ${res.statusText}`);
      }
      const data = await res.json();
      return data.candidates[0].content.parts[0].text;
    },

    async extractMemories(messages) {
      const conversation = messages
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");
      const prompt = `Extract key facts from this conversation that are worth remembering long-term.
Focus on: health conditions, medications, lifestyle habits, goals, preferences, and personal context.

Conversation:
${conversation}

Return a JSON array only (no markdown, no explanation):
[{ "content": "brief factual statement", "source": "confirmed" | "inferred" }]

Guidelines:
- "confirmed" = user directly stated this fact
- "inferred" = you derived this from context
- Skip greetings, questions, and one-time events
- Be specific (include numbers, dosages, dates when mentioned)`;

      const raw = await this.chat([{ role: "user", content: prompt }]);
      // Strip markdown fences if the model wraps the JSON
      const cleaned = raw
        .replace(/^```(?:json)?\n?/, "")
        .replace(/\n?```$/, "")
        .trim();
      return JSON.parse(cleaned);
    },

    async embed(text) {
      const res = await fetch(
        `${baseUrl}/models/${embeddingModel}:embedContent?key=${opts.apiKey}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            content: { parts: [{ text }] },
          }),
        },
      );
      if (!res.ok) {
        throw new Error(`Gemini embed request failed: ${res.status} ${res.statusText}`);
      }
      const data = await res.json();
      return data.embedding.values;
    },
  };
}

Use it with Vitamem:

import { createVitamem } from "vitamem";

const mem = await createVitamem({
  llm: createGeminiAdapter({
    apiKey: process.env.GEMINI_API_KEY!,
  }),
  storage: "ephemeral",
});

Example: Separate Chat and Embedding Providers


You can mix providers, using one service for chat and another for embeddings:

import type { LLMAdapter, Message, MemorySource } from "vitamem";

function createHybridAdapter(): LLMAdapter {
  return {
    async chat(messages) {
      // Use a local model via fetch
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "local-model", messages }),
      });
      if (!res.ok) {
        throw new Error(`Local chat request failed: ${res.status}`);
      }
      const data = await res.json();
      return data.choices[0].message.content;
    },

    async extractMemories(messages) {
      // Reuse the chat method for extraction
      const conversation = messages
        .map((m) => `${m.role}: ${m.content}`)
        .join("\n");
      const prompt = `Extract health facts as JSON: [{ "content": "...", "source": "confirmed" | "inferred" }]
Conversation:
${conversation}`;
      const raw = await this.chat([{ role: "user", content: prompt }]);
      return JSON.parse(raw.replace(/^```(?:json)?\n?/, "").replace(/\n?```$/, "").trim());
    },

    async embed(text) {
      // Use OpenAI for embeddings even though chat is local
      const res = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
      });
      if (!res.ok) {
        throw new Error(`OpenAI embeddings request failed: ${res.status}`);
      }
      const data = await res.json();
      return data.data[0].embedding;
    },
  };
}

LLMs sometimes wrap JSON responses in markdown code fences. Always strip them before parsing:

function cleanJson(raw: string): string {
  return raw
    .replace(/^```(?:json)?\n?/, "")
    .replace(/\n?```$/, "")
    .trim();
}

Keep your embedding dimensions consistent. If you store memories with a 1536-dimension model and later switch to a 768-dimension model, cosine similarity searches will fail. Choose your embedding model early and stick with it, or re-embed all existing memories when you change.
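One way to catch drift early is a startup guard that embeds a probe string once and checks the dimension before any memories are written. This is a sketch; the helper name and the expected value of 1536 are examples, not part of the Vitamem API:

```typescript
// Fail fast if the adapter's embedding dimension does not match what
// the existing memory store was built with.
async function assertEmbeddingDim(
  embed: (text: string) => Promise<number[]>,
  expected: number,
): Promise<void> {
  const probe = await embed("dimension check");
  if (probe.length !== expected) {
    throw new Error(
      `Embedding dimension mismatch: got ${probe.length}, expected ${expected}`,
    );
  }
}
```

Call it once with your adapter's embed method when your application boots, before passing the adapter to createVitamem.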

The simplest approach is to delegate to chat() with an extraction prompt, then parse the JSON. This is what all built-in adapters do:

async extractMemories(messages) {
  const conversation = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  const prompt = `...your extraction prompt with ${conversation}...`;
  const raw = await this.chat([{ role: "user", content: prompt }]);
  return JSON.parse(cleanJson(raw));
},

Vitamem does not add retry logic around adapter calls. If your provider has rate limits or transient failures, handle retries within your adapter:

async chat(messages) {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await callMyLLM(messages);
    } catch (err) {
      if (attempt === 2) throw err;
      // Linear backoff: wait 1s, then 2s, before retrying
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1)));
    }
  }
  throw new Error("Unreachable");
},

Before plugging a custom adapter into Vitamem, verify each method independently:

const adapter = createMyAdapter({ /* ... */ });

// Test chat
const reply = await adapter.chat([
  { role: "user", content: "Hello, how are you?" },
]);
console.log("Chat:", reply);

// Test extraction
const facts = await adapter.extractMemories([
  { id: "1", threadId: "t1", role: "user", content: "I take metformin 500mg twice daily.", createdAt: new Date() },
  { id: "2", threadId: "t1", role: "assistant", content: "Got it, I'll remember that.", createdAt: new Date() },
]);
console.log("Facts:", facts);

// Test embedding
const vec = await adapter.embed("Type 2 diabetes management");
console.log("Embedding dimensions:", vec.length);

For reference, here are the types used by the adapter interface:

// Message passed to extractMemories
interface Message {
  id: string;
  threadId: string;
  role: "user" | "assistant" | "system";
  content: string;
  createdAt: Date;
}

// Source classification for extracted facts
type MemorySource = "confirmed" | "inferred";

Import them from the package:

import type { LLMAdapter, Message, MemorySource } from "vitamem";