Memory Extraction

When a thread transitions to dormant, Vitamem extracts the meaningful facts from the conversation using your LLM adapter. This is the core operation that turns raw conversation history into a searchable memory store.

Extraction converts raw conversation into structured, searchable facts. Instead of storing entire message histories verbatim, Vitamem distills each session down to the handful of statements that matter — stripping away greetings, filler, and back-and-forth clarification. Each fact is classified as confirmed or inferred, giving downstream systems a confidence signal they wouldn’t have with raw messages.

The real power of extraction compounds over time. As sessions accumulate, deduplication and supersede keep the memory store lean instead of growing linearly:

| Session | Raw conversation | Memory store result |
| --- | --- | --- |
| 1 | "I'm learning Rust and want to ship my first CLI by March" | New facts saved: "Learning Rust", "Goal: ship CLI by March" |
| 5 | User mentions Rust again in passing | Deduplicated — already stored, nothing added |
| 12 | "Finished the CLI, now building a web API in Rust" | Supersedes the old goal in-place |

After 30 daily sessions, a user’s memory store might contain ~20 refined facts rather than thousands of raw messages. Without extraction, memory grows with every conversation; with it, the store converges on a concise, current knowledge base that is fast to search, cheap to embed, and keeps LLM context windows lean — only extracted facts are searched and retrieved, not raw conversation history.

The extraction pipeline calls llm.extractMemories(messages) with the full conversation history. Your LLM adapter is responsible for returning an array of structured facts:

```typescript
// What the LLM adapter returns
[
  { content: "Prefers TypeScript over JavaScript", source: "confirmed" },
  { content: "Deploys on Vercel", source: "confirmed" },
  { content: "Working on a React Native mobile app", source: "confirmed" },
  { content: "Prefers concise explanations", source: "inferred" },
]
```

Facts are classified by how confident we are they represent ground truth:

| Source | Meaning | Example |
| --- | --- | --- |
| `confirmed` | User explicitly stated this | "I prefer TypeScript" |
| `inferred` | Derived from context or assistant messages | User seems to prefer concise explanations |

This classification helps downstream systems apply appropriate confidence when using memories to personalize responses.

When using the built-in adapter factories (createOpenAIAdapter, createAnthropicAdapter, createOllamaAdapter), Vitamem includes a domain-agnostic default extraction prompt that works across all use cases.

The default prompt extracts personal details, preferences, goals, habits, and important context — anything the user would expect to be remembered next time. Facts are tagged with categories like "preference", "goal", "personal", "professional", "lifestyle", or "general".

For health-specific use cases, Vitamem exports HEALTH_EXTRACTION_PROMPT and HEALTH_EXTRACTION_PROMPT_ANTHROPIC — these include structured profile field support for conditions, medications, allergies, vitals, and goals. See the Health Companion Guide for details.

The {conversation} placeholder is replaced with the formatted conversation history at extraction time.
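To picture that substitution, here is a minimal sketch. The `Message` type and `buildExtractionPrompt` helper are hypothetical illustrations, not part of Vitamem's API:

```typescript
// Hypothetical sketch: format the message history and substitute it into
// the prompt template at the {conversation} placeholder.
type Message = { role: "user" | "assistant"; content: string };

function buildExtractionPrompt(template: string, messages: Message[]): string {
  const conversation = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  // String.prototype.replace with a string pattern replaces the first
  // occurrence, which is enough since the placeholder appears once.
  return template.replace("{conversation}", conversation);
}
```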

You can use a cheaper or faster model for memory extraction while keeping a more capable model for chat responses. Set extractionModel in your config:

```typescript
const vm = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o", // capable model for chat
  extractionModel: "gpt-4o-mini", // cheaper model for extraction
  storage: "ephemeral",
});
```

This is useful when extraction doesn’t require the full reasoning power of your chat model. Since extraction happens in the background during dormant transitions, using a cheaper model can significantly reduce costs without impacting chat quality.

You can set extractionPrompt directly in the top-level VitamemConfig. When using the provider shortcut, this is forwarded to the built-in adapter automatically:

```typescript
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.
Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans
Conversation:
{conversation}
Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]`,
});
```

Alternatively, you can pass extractionPrompt to the adapter factory directly:

```typescript
import { createVitamem, createOpenAIAdapter } from "vitamem";

const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  chatModel: "gpt-4o",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.
Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans
- Competition goals and timelines
Conversation:
{conversation}
Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]
- "confirmed" = user directly stated this
- "inferred" = derived from context
- Skip greetings and small talk
- Include specific numbers (weights, reps, times) when mentioned`,
});

const mem = await createVitamem({
  llm,
  storage: "ephemeral",
});
```

Your custom extraction prompt must:

  1. Include {conversation} where the formatted messages should be inserted
  2. Instruct the LLM to return a JSON array of { content, source } objects
  3. Define what "confirmed" and "inferred" mean for your domain

Beyond those requirements, a few practices improve extraction quality:

  • Be domain-specific. A fitness app should extract different facts than a tutoring assistant.
  • List categories. Explicitly tell the LLM what kinds of facts to look for.
  • Include negative examples. Tell the LLM what to skip (greetings, questions, opinions).
  • Request specificity. Ask for numbers, dates, and dosages instead of vague statements.

If you need complete control over extraction logic (not just the prompt), implement the LLMAdapter interface directly:

```typescript
import { createVitamem } from "vitamem";
import type { LLMAdapter } from "vitamem";

const customLLM: LLMAdapter = {
  async chat(messages) {
    // Your chat implementation
  },
  async extractMemories(messages) {
    // Your custom extraction logic
    // Can use multiple LLM calls, rule-based filtering, etc.
  },
  async embed(text) {
    // Your embedding implementation
  },
};

const mem = await createVitamem({
  llm: customLLM,
  storage: "ephemeral",
});
```

Vitamem also exports a simple rule-based extractor (extractFactsSimple) that does not require an LLM. It classifies user messages as confirmed and assistant messages as inferred, looking for sentences containing personal pronouns and action verbs.

```typescript
import { extractFactsSimple } from "vitamem";

const facts = extractFactsSimple(messages);
// [{ content: "I prefer TypeScript", source: "confirmed" }, ...]
```

This is useful for testing but produces lower-quality memories than LLM extraction.

Vitamem uses structured output when available to ensure reliable JSON parsing from LLM responses during memory extraction.

When extracting memories, Vitamem requests JSON-formatted responses from the LLM provider:

  • OpenAI / DashScope / Ollama: Uses response_format: { type: "json_object" } to guarantee valid JSON output
  • Anthropic: Uses prompt-based JSON guidance with post-validation

All responses are parsed through a validation layer that:

  1. Accepts both wrapper objects ({ "memories": [...] }) and bare arrays for backward compatibility
  2. Validates each entry has a content string and a valid source (“confirmed” or “inferred”)
  3. Strips invalid entries silently — partial extraction is better than total failure
| Provider | JSON Mode | Schema Enforcement |
| --- | --- | --- |
| OpenAI | `json_object` | `json_schema` available for gpt-4o+ |
| DashScope | `json_object` | Not supported — prompt-based |
| Anthropic | Prompt-based | `output_config.format` available (GA) |
| Ollama | `json_object` | Native `format` supports schemas |
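The validation steps described above can be sketched roughly as follows. This is a hypothetical helper for illustration, not Vitamem's actual implementation:

```typescript
type ExtractedFact = { content: string; source: "confirmed" | "inferred" };

// Sketch of the validation layer: accept either a bare array or a
// { memories: [...] } wrapper, keep only well-formed entries, and
// silently drop the rest (partial extraction beats total failure).
function parseExtractionResponse(raw: string): ExtractedFact[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // unparseable response: extract nothing rather than throw
  }
  // Unwrap { memories: [...] } wrapper objects; otherwise expect a bare array.
  const items = Array.isArray(parsed)
    ? parsed
    : (parsed as { memories?: unknown }).memories;
  if (!Array.isArray(items)) return [];
  return items.filter(
    (m): m is ExtractedFact =>
      typeof m === "object" &&
      m !== null &&
      typeof (m as any).content === "string" &&
      ((m as any).source === "confirmed" || (m as any).source === "inferred"),
  );
}
```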

Vitamem uses a two-table architecture to handle different types of user data optimally:

Critical structured data (like health metrics, medications, or allergies in a health app) can be extracted into a structured profile with direct field updates. This means:

  • Value updates are immediate — no embedding comparison needed
  • Entries are deduplicated by field — adding the same item twice results in one entry
  • Profile data is always available — injected into every conversation automatically

Contextual information like lifestyle habits, preferences, and conversation insights continue through the embedding pipeline with vector-based deduplication and supersede.

During extraction, each fact is classified by pattern matching:

  1. Facts matching structured data patterns → Profile update (no embedding needed)
  2. All other facts → Embedding pipeline (existing dedup + supersede logic)
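The routing step above can be sketched like this. The simplified `Rule` shape and `routeFact` helper are illustrative only; Vitamem's actual `StructuredExtractionRule` interface may differ:

```typescript
// Hypothetical sketch of the routing step: a fact matching a structured
// rule becomes a direct profile-field update (no embedding needed);
// everything else enters the embedding pipeline for dedup + supersede.
type Rule = { field: string; pattern: RegExp };
type Fact = { content: string };

function routeFact(
  fact: Fact,
  rules: Rule[],
): { route: "profile"; field: string } | { route: "embedding" } {
  for (const rule of rules) {
    if (rule.pattern.test(fact.content)) {
      return { route: "profile", field: rule.field };
    }
  }
  return { route: "embedding" };
}
```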

Configure with structuredExtractionRules in your Vitamem config:

```typescript
import { createVitamem, HEALTH_STRUCTURED_RULES } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  structuredExtractionRules: HEALTH_STRUCTURED_RULES,
});
```

The HEALTH_STRUCTURED_RULES preset includes patterns for:

  • Vitals: A1C, blood pressure, weight, blood glucose
  • Allergies: any allergy mention
  • Medications: medication with dosage
  • Conditions: diagnosed conditions
  • Goals: health and wellness goals

You can also define custom rules using the StructuredExtractionRule interface.

Critical information can be automatically pinned during extraction so it always appears first in retrieval results. This ensures important facts (allergies, constraints, key preferences) are never buried by less important memories.

Pass autoPinRules in your VitamemConfig. Each rule is an AutoPinRule — either a regex pattern or a custom test function:

```typescript
import { createVitamem, HEALTH_AUTO_PIN_RULES } from "vitamem";
import type { AutoPinRule } from "vitamem";

// Use built-in health rules
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  autoPinRules: HEALTH_AUTO_PIN_RULES,
});
```

HEALTH_AUTO_PIN_RULES includes 7 rules that cover the most common safety-critical categories:

| Pattern | Reason | Example Match |
| --- | --- | --- |
| `\ballerg(y\|ic\|ies)\b` | allergy | "Allergic to penicillin" |
| `\banaphyla(xis\|ctic)\b` | anaphylaxis-risk | "History of anaphylaxis" |
| `\b(drug\|medication)\s*(interaction\|contraindication)\b` | drug-interaction | "Known drug interaction with warfarin" |
| `\bdo\s*not\s*(take\|use\|prescribe)\b` | contraindication | "Do not prescribe NSAIDs" |
| `\bemergency\s*contact\b` | emergency-contact | "Emergency contact: spouse at 555-1234" |
| `\bblood\s*type\b` | blood-type | "Blood type A+" |
| `\b\d+\s*(mg\|mcg\|ml\|units?)\b` | medication-dosage | "Takes Metformin 1000mg" |

You can define your own rules using regex patterns or test functions:

```typescript
const customRules: AutoPinRule[] = [
  { pattern: /\bDNR\b/i, reason: "do-not-resuscitate" },
  {
    test: (memory) =>
      memory.source === "confirmed" && /\bimplant\b/i.test(memory.content),
    reason: "medical-implant",
  },
];
```
  • Auto-pinned memories always appear first in retrieval results
  • Superseded memories can also be auto-pinned — pinning is additive and never removes existing pins
  • Pin status persists across memory updates

The extraction pipeline treats user messages and AI/assistant messages differently to avoid redundant writes:

  • User messages: All stated facts are extracted as new data. If the user says “I switched to Vercel for hosting”, that is always extracted.
  • AI/assistant messages: Only genuinely new information is extracted — recommendations, advice, or new facts the assistant introduces.

Critically, when the AI echoes back known data (e.g., repeating a user’s A1C value from memory context), that echo is not re-extracted. This prevents redundant writes and preserves data history.

| Message | Role | Extracted? | Reason |
| --- | --- | --- | --- |
| "I switched to Vercel for hosting" | User | Yes | New fact stated by user |
| "Since you're on Vercel, here are some deployment tips…" | Assistant | No (Vercel fact) | AI is echoing back a known value |
| "I recommend using Edge Functions for lower latency" | Assistant | Yes | Genuinely new recommendation |

Without this distinction, every time the AI acknowledged a fact from memory, the pipeline would re-extract and re-process it — triggering unnecessary deduplication checks.

Facts tagged as "inferred" are held to a high bar: they require specific, unambiguous evidence from the conversation. The extraction pipeline does not extract generalizations, assumptions, or speculations.

An inference is extracted only when:

  • The user says something that directly implies a specific fact (e.g., "I take my pills with breakfast" → inferred: takes medication in the morning)
  • The assistant provides a concrete recommendation that represents new information

It is never extracted for:

  • Broad generalizations unsupported by specific statements
  • Speculative conclusions about the user's intent or motivation
  • Re-statements of existing knowledge in different words

If a user says: “I exercise Monday, Wednesday, and Friday, and I’m cutting carbs.”

| Extracted | Source | Valid? |
| --- | --- | --- |
| "Exercises on Monday, Wednesday, and Friday" | confirmed | Yes — directly stated |
| "Reducing carbohydrate intake" | confirmed | Yes — directly stated |
| "Implementing changes to exercise routine" | inferred | No — unsupported generalization |
| "Motivated to improve health" | inferred | No — speculation |

When in doubt, the system errs on the side of not extracting. A missing inference can be captured in a future conversation, but a false inference pollutes the memory store and is harder to correct.

Extracted facts are automatically enriched with a temporal suffix showing when they were learned. For example, a fact extracted from a session on March 15, 2026 would be stored as:

"Prefers TypeScript over JavaScript (mentioned 2026-03-15)"

The pipeline derives the date from thread.lastMessageAt, which reflects the timestamp of the most recent message in the session. This temporal annotation enables:

  • Temporal reasoning — the LLM can understand when facts were learned and reason about time-sensitive information (e.g., “you mentioned this 3 months ago”)
  • Chronological retrieval — memories can be sorted by date, giving users and AI a timeline of how information evolved

Temporal encoding is applied automatically during the extraction pipeline. No configuration is needed.
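The annotation step can be sketched as a one-line suffix append. The helper below is hypothetical; Vitamem's internal implementation may differ:

```typescript
// Hypothetical sketch: derive a YYYY-MM-DD date from the session's last
// message timestamp and append it as a "(mentioned ...)" suffix.
function withTemporalSuffix(content: string, lastMessageAt: Date): string {
  const date = lastMessageAt.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${content} (mentioned ${date})`;
}
```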

See Temporal Encoding for a deeper look at how dates are derived and used in retrieval.

Vitamem supports an optional reflection pass — a second LLM call that validates and enriches the facts produced by the initial extraction. When enabled, the reflection pass:

  • Catches contradictions — flags facts that conflict with each other or with existing memories
  • Enriches vague facts — adds specificity to facts that are too general (e.g., “exercises sometimes” becomes “exercises 3 times per week”)
  • Detects missed information — identifies important facts that the initial extraction overlooked

Enable reflection in your config:

```typescript
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  enableReflection: true,
});
```

Reflection adds one additional LLM call per dormant transition. It is especially valuable for any domain where accuracy matters (health, finance, legal, education) and less critical for casual applications.

See Reflection for details on how the reflection prompt works and how to customize it.

After extraction, each fact goes through the deduplication pipeline before being saved. Only facts that are genuinely new (not already represented in the user’s memory store) are persisted.
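A common way to implement such a "genuinely new" check is cosine similarity between the candidate's embedding and existing memory embeddings. The sketch below is illustrative only, and the 0.9 threshold is an assumption, not Vitamem's actual cutoff:

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A candidate fact is treated as a duplicate when any existing memory's
// embedding is above the similarity threshold (0.9 here is illustrative).
function isDuplicate(
  candidate: number[],
  existing: number[][],
  threshold = 0.9,
): boolean {
  return existing.some((e) => cosine(candidate, e) >= threshold);
}
```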

The final stored memory includes:

  • content — the fact text
  • source — confirmed or inferred
  • embedding — the vector representation (e.g., 1536 dimensions for text-embedding-3-small)
  • userId — for retrieval scoping
  • threadId — for traceability
  • pinned — true if the memory matched an auto-pin rule (see above)
  • tags — string array for categorization (e.g. the auto-pin rule name)
  • lastRetrievedAt — timestamp of the last time this memory was returned by a retrieval query (used by active forgetting)
  • retrievalCount — how many times this memory has been retrieved (helps distinguish frequently-accessed facts from stale ones)
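Put together, a stored record might look roughly like the TypeScript shape below. Field names are taken from the list above; the exact type Vitamem exports may differ:

```typescript
// Hypothetical shape of a stored memory record, assembled from the
// fields documented above.
interface StoredMemory {
  content: string;              // fact text, including any temporal suffix
  source: "confirmed" | "inferred";
  embedding: number[];          // e.g. 1536 floats for text-embedding-3-small
  userId: string;               // retrieval scoping
  threadId: string;             // traceability back to the session
  pinned: boolean;              // true if an auto-pin rule matched
  tags: string[];               // categorization, e.g. the auto-pin rule name
  lastRetrievedAt: Date | null; // used by active forgetting
  retrievalCount: number;       // frequently-accessed vs stale facts
}
```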