Memory Extraction
When a thread transitions to dormant, Vitamem extracts the meaningful facts from the conversation using your LLM adapter. This is the core operation that turns raw conversation history into a searchable memory store.
Why Extraction Matters
Extraction converts raw conversation into structured, searchable facts. Instead of storing entire message histories verbatim, Vitamem distills each session down to the handful of statements that matter — stripping away greetings, filler, and back-and-forth clarification. Each fact is classified as confirmed or inferred, giving downstream systems a confidence signal they wouldn’t have with raw messages.
The real power of extraction compounds over time. As sessions accumulate, deduplication and supersede keep the memory store lean instead of growing linearly:
| Session | Raw conversation | Memory store result |
|---|---|---|
| 1 | "I'm learning Rust and want to ship my first CLI by March" | New facts saved: "Learning Rust", "Goal: ship CLI by March" |
| 5 | User mentions Rust again in passing | Deduplicated — already stored, nothing added |
| 12 | "Finished the CLI, now building a web API in Rust" | Supersedes the old goal in-place |
After 30 daily sessions, a user’s memory store might contain ~20 refined facts rather than thousands of raw messages. Without extraction, memory grows with every conversation; with it, the store converges on a concise, current knowledge base that is fast to search, cheap to embed, and keeps LLM context windows lean — only extracted facts are searched and retrieved, not raw conversation history.
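The convergence described above hinges on embedding-based duplicate detection. As a rough sketch (not Vitamem's internals: the helper names and the 0.9 threshold are assumptions), deduplication can be thought of as a cosine-similarity check of each candidate fact's embedding against the stored vectors:

```ts
// Illustrative sketch of embedding-based deduplication, NOT Vitamem's
// actual internals: helper names and the 0.9 threshold are assumptions.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A candidate fact is "already stored" if any existing embedding is close enough.
function isDuplicate(candidate: number[], stored: number[][], threshold = 0.9): boolean {
  return stored.some((existing) => cosineSimilarity(candidate, existing) >= threshold);
}
```

Only candidates that fail this check would proceed to storage, which is why a fact mentioned in passing across many sessions is saved once.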
How Extraction Works
The extraction pipeline calls llm.extractMemories(messages) with the full conversation history. Your LLM adapter is responsible for returning an array of structured facts:
```ts
// What the LLM adapter returns
[
  { content: "Prefers TypeScript over JavaScript", source: "confirmed" },
  { content: "Deploys on Vercel", source: "confirmed" },
  { content: "Working on a React Native mobile app", source: "confirmed" },
  { content: "Prefers concise explanations", source: "inferred" },
]
```

Memory Sources
Facts are classified by how confident we are they represent ground truth:
| Source | Meaning | Example |
|---|---|---|
| confirmed | User explicitly stated this | "I prefer TypeScript" |
| inferred | Derived from context or assistant messages | User seems to prefer concise explanations |
This classification helps downstream systems apply appropriate confidence when using memories to personalize responses.
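The { content, source } shape can be written down as a small TypeScript type. This is an illustrative sketch; the type names here are assumptions, not necessarily Vitamem's exported names:

```ts
// Illustrative types for the objects the adapter returns; Vitamem's
// actual exported type names may differ.
type MemorySource = "confirmed" | "inferred";

interface ExtractedFact {
  content: string;      // the fact itself, as a standalone statement
  source: MemorySource; // confidence signal for downstream systems
}

const fact: ExtractedFact = {
  content: "Prefers TypeScript over JavaScript",
  source: "confirmed",
};
```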
The Built-in Extraction Prompt
When using the built-in adapter factories (createOpenAIAdapter, createAnthropicAdapter, createOllamaAdapter), Vitamem includes a domain-agnostic default extraction prompt that works across all use cases.
The default prompt extracts personal details, preferences, goals, habits, and important context — anything the user would expect to be remembered next time. Facts are tagged with categories like "preference", "goal", "personal", "professional", "lifestyle", or "general".
For health-specific use cases, Vitamem exports HEALTH_EXTRACTION_PROMPT and HEALTH_EXTRACTION_PROMPT_ANTHROPIC — these include structured profile field support for conditions, medications, allergies, vitals, and goals. See the Health Companion Guide for details.
The {conversation} placeholder is replaced with the formatted conversation history at extraction time.
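Mechanically, that substitution is just a string replace on the template. A minimal sketch, assuming a simple `role: content` line format (the exact formatting Vitamem uses is not specified here):

```ts
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Sketch of how the {conversation} placeholder might be filled in;
// the line format below is an assumption, not Vitamem's exact one.
function buildExtractionPrompt(template: string, messages: Message[]): string {
  const conversation = messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return template.replace("{conversation}", conversation);
}
```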
Custom Extraction Prompt
Using a Different Model for Extraction
You can use a cheaper or faster model for memory extraction while keeping a more capable model for chat responses. Set extractionModel in your config:
```ts
const vm = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",                // capable model for chat
  extractionModel: "gpt-4o-mini", // cheaper model for extraction
  storage: "ephemeral",
});
```

This is useful when extraction doesn’t require the full reasoning power of your chat model. Since extraction happens in the background during dormant transitions, using a cheaper model can significantly reduce costs without impacting chat quality.
You can set extractionPrompt directly in the top-level VitamemConfig. When using the provider shortcut, this is forwarded to the built-in adapter automatically:
```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.

Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans

Conversation:
{conversation}

Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]`,
});
```

Alternatively, you can pass extractionPrompt to the adapter factory directly:
```ts
import { createVitamem, createOpenAIAdapter } from "vitamem";

const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  chatModel: "gpt-4o",
  extractionPrompt: `You are a fitness coach assistant. Extract facts about this person's
fitness journey from the conversation.

Focus on:
- Exercise routines and frequency
- Personal records and progress
- Injuries or physical limitations
- Nutritional preferences and diet plans
- Competition goals and timelines

Conversation:
{conversation}

Return a JSON array only:
[{ "content": "specific factual statement", "source": "confirmed" | "inferred" }]

- "confirmed" = user directly stated this
- "inferred" = derived from context
- Skip greetings and small talk
- Include specific numbers (weights, reps, times) when mentioned`,
});

const mem = await createVitamem({
  llm,
  storage: "ephemeral",
});
```

Prompt Requirements
Your custom extraction prompt must:

- Include {conversation} where the formatted messages should be inserted
- Instruct the LLM to return a JSON array of { content, source } objects
- Define what "confirmed" and "inferred" mean for your domain
Tips for Effective Extraction Prompts
- Be domain-specific. A fitness app should extract different facts than a tutoring assistant.
- List categories. Explicitly tell the LLM what kinds of facts to look for.
- Include negative examples. Tell the LLM what to skip (greetings, questions, opinions).
- Request specificity. Ask for numbers, dates, and dosages instead of vague statements.
Writing a Fully Custom Adapter
If you need complete control over extraction logic (not just the prompt), implement the LLMAdapter interface directly:
```ts
const customLLM: LLMAdapter = {
  async chat(messages) {
    // Your chat implementation
  },
  async extractMemories(messages) {
    // Your custom extraction logic
    // Can use multiple LLM calls, rule-based filtering, etc.
  },
  async embed(text) {
    // Your embedding implementation
  },
};

const mem = await createVitamem({
  llm: customLLM,
  storage: "ephemeral",
});
```

Fallback: Rule-Based Extraction
Vitamem also exports a simple rule-based extractor (extractFactsSimple) that does not require an LLM. It classifies user messages as confirmed and assistant messages as inferred, looking for sentences containing personal pronouns and action verbs.
```ts
import { extractFactsSimple } from "vitamem";

const facts = extractFactsSimple(messages);
// [{ content: "I prefer TypeScript", source: "confirmed" }, ...]
```

This is useful for testing but produces lower-quality memories than LLM extraction.
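In spirit, a rule-based fallback like this can be approximated in a few lines. The sketch below is purely illustrative and far cruder than extractFactsSimple itself; the regex and sentence splitting are assumptions:

```ts
// Naive rule-based extraction sketch: user sentences -> "confirmed",
// assistant sentences -> "inferred". Illustrative only.
interface Msg { role: "user" | "assistant"; content: string }
interface Fact { content: string; source: "confirmed" | "inferred" }

// Crude stand-in for "personal pronoun + action verb" detection
const FACT_PATTERN = /\bI\s+(prefer|like|use|take|want|am|have)\b/i;

function extractFactsNaive(messages: Msg[]): Fact[] {
  const facts: Fact[] = [];
  for (const m of messages) {
    for (const sentence of m.content.split(/(?<=[.!?])\s+/)) {
      if (FACT_PATTERN.test(sentence)) {
        facts.push({
          content: sentence.trim(),
          source: m.role === "user" ? "confirmed" : "inferred",
        });
      }
    }
  }
  return facts;
}
```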
Structured Output
Vitamem uses structured output when available to ensure reliable JSON parsing from LLM responses during memory extraction.
How It Works
When extracting memories, Vitamem requests JSON-formatted responses from the LLM provider:
- OpenAI / DashScope / Ollama: Uses response_format: { type: "json_object" } to guarantee valid JSON output
- Anthropic: Uses prompt-based JSON guidance with post-validation
All responses are parsed through a validation layer that:
- Accepts both wrapper objects ({ "memories": [...] }) and bare arrays for backward compatibility
- Validates each entry has a content string and a valid source ("confirmed" or "inferred")
- Strips invalid entries silently — partial extraction is better than total failure
Provider Support
| Provider | JSON Mode | Schema Enforcement |
|---|---|---|
| OpenAI | json_object | json_schema available for gpt-4o+ |
| DashScope | json_object | Not supported — prompt-based |
| Anthropic | Prompt-based | output_config.format available (GA) |
| Ollama | json_object | Native format supports schemas |
Hybrid Memory Architecture
Vitamem uses a two-table architecture to handle different types of user data optimally:
Structured Data (User Profile)
Critical structured data (like health metrics, medications, or allergies in a health app) can be extracted into a structured profile with direct field updates. This means:
- Value updates are immediate — no embedding comparison needed
- Entries are deduplicated by field — adding the same item twice results in one entry
- Profile data is always available — injected into every conversation automatically
Semantic Data (Memory Store)
Contextual information like lifestyle habits, preferences, and conversation insights continues through the embedding pipeline with vector-based deduplication and supersede.
How It Works
During extraction, each fact is classified by pattern matching:
- Facts matching structured data patterns → Profile update (no embedding needed)
- All other facts → Embedding pipeline (existing dedup + supersede logic)
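The routing step can be pictured like this; the rules below are illustrative stand-ins, not the real HEALTH_STRUCTURED_RULES preset:

```ts
// Illustrative routing: structured-data patterns go to the profile,
// everything else to the embedding pipeline. Rules are stand-ins.
interface StructuredRule { field: string; pattern: RegExp }

const rules: StructuredRule[] = [
  { field: "vitals.a1c", pattern: /\bA1C\b/i },
  { field: "allergies", pattern: /\ballerg(y|ic|ies)\b/i },
];

function routeFact(content: string): { route: "profile" | "embedding"; field?: string } {
  const match = rules.find((r) => r.pattern.test(content));
  return match
    ? { route: "profile", field: match.field } // direct field update, no embedding
    : { route: "embedding" };                  // dedup + supersede pipeline
}
```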
Configure with structuredExtractionRules in your Vitamem config:
```ts
import { createVitamem, HEALTH_STRUCTURED_RULES } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  structuredExtractionRules: HEALTH_STRUCTURED_RULES,
});
```

The HEALTH_STRUCTURED_RULES preset includes patterns for:
- Vitals: A1C, blood pressure, weight, blood glucose
- Allergies: any allergy mention
- Medications: medication with dosage
- Conditions: diagnosed conditions
- Goals: health and wellness goals
You can also define custom rules using the StructuredExtractionRule interface.
Auto-Pinning
Critical information can be automatically pinned during extraction so it always appears first in retrieval results. This ensures important facts (allergies, constraints, key preferences) are never buried by less important memories.
Configuring Auto-Pin Rules
Pass autoPinRules in your VitamemConfig. Each rule is an AutoPinRule — either a regex pattern or a custom test function:
```ts
import { createVitamem, HEALTH_AUTO_PIN_RULES } from "vitamem";
import type { AutoPinRule } from "vitamem";

// Use built-in health rules
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  autoPinRules: HEALTH_AUTO_PIN_RULES,
});
```

Built-in Health Auto-Pin Rules
HEALTH_AUTO_PIN_RULES includes 7 rules that cover the most common safety-critical categories:
| Pattern | Reason | Example Match |
|---|---|---|
| `\ballerg(y\|ic\|ies)\b` | allergy | "Allergic to penicillin" |
| `\banaphyla(xis\|ctic)\b` | anaphylaxis-risk | "History of anaphylaxis" |
| `\b(drug\|medication)\s*(interaction\|contraindication)\b` | drug-interaction | "Known drug interaction with warfarin" |
| `\bdo\s*not\s*(take\|use\|prescribe)\b` | contraindication | "Do not prescribe NSAIDs" |
| `\bemergency\s*contact\b` | emergency-contact | "Emergency contact: spouse at 555-1234" |
| `\bblood\s*type\b` | blood-type | "Blood type A+" |
| `\b\d+\s*(mg\|mcg\|ml\|units?)\b` | medication-dosage | "Takes Metformin 1000mg" |
Custom Auto-Pin Rules
You can define your own rules using regex patterns or test functions:
```ts
const customRules: AutoPinRule[] = [
  { pattern: /\bDNR\b/i, reason: "do-not-resuscitate" },
  {
    test: (memory) =>
      memory.source === "confirmed" && /\bimplant\b/i.test(memory.content),
    reason: "medical-implant",
  },
];
```

Behavior Notes
- Auto-pinned memories always appear first in retrieval results
- Superseded memories can also be auto-pinned — pinning is additive and never removes existing pins
- Pin status persists across memory updates
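Applying a rule set to a freshly extracted memory amounts to checking each rule's pattern or test in turn. A sketch using the AutoPinRule shape shown in the docs (the helper name is an assumption):

```ts
// Illustrative rule matching: returns the first matching rule's reason
// (used as a tag), or null if the memory should not be pinned.
interface Memory { content: string; source: "confirmed" | "inferred" }
interface AutoPinRule {
  pattern?: RegExp;
  test?: (memory: Memory) => boolean;
  reason: string;
}

function matchAutoPin(memory: Memory, rules: AutoPinRule[]): string | null {
  for (const rule of rules) {
    if (rule.pattern?.test(memory.content)) return rule.reason;
    if (rule.test?.(memory)) return rule.reason;
  }
  return null; // no rule matched -> not auto-pinned
}
```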
Role-Aware Extraction
The extraction pipeline treats user messages and AI/assistant messages differently to avoid redundant writes:
- User messages: All stated facts are extracted as new data. If the user says “I switched to Vercel for hosting”, that is always extracted.
- AI/assistant messages: Only genuinely new information is extracted — recommendations, advice, or new facts the assistant introduces.
Critically, when the AI echoes back known data (e.g., repeating a user’s A1C value from memory context), that echo is not re-extracted. This prevents redundant writes and preserves data history.
Example
| Message | Role | Extracted? | Reason |
|---|---|---|---|
| "I switched to Vercel for hosting" | User | Yes | New fact stated by user |
| "Since you're on Vercel, here are some deployment tips…" | Assistant | No (Vercel fact) | AI is echoing back a known value |
| "I recommend using Edge Functions for lower latency" | Assistant | Yes | Genuinely new recommendation |
Without this distinction, every time the AI acknowledged a fact from memory, the pipeline would re-extract and re-process it — triggering unnecessary deduplication checks.
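How the echo check is implemented isn't specified here; a crude normalised-substring version, purely for illustration (the real pipeline likely relies on the LLM and existing memory context rather than string matching):

```ts
// Illustrative echo check: skip an assistant fact that merely restates
// something already in the store. Normalisation details are assumptions.
function normalize(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
}

function isEcho(assistantFact: string, knownFacts: string[]): boolean {
  const n = normalize(assistantFact);
  return knownFacts.some(
    (k) => n.includes(normalize(k)) || normalize(k).includes(n),
  );
}
```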
Tighter Inference Rules
Facts tagged as "inferred" are held to a high bar: they require specific, unambiguous evidence from the conversation. The extraction pipeline does not extract generalizations, assumptions, or speculations.
What qualifies as a valid inference
- The user says something that directly implies a specific fact (e.g., "I take my pills with breakfast" → inferred: takes medication in the morning)
- The assistant provides a concrete recommendation that represents new information
What does NOT qualify
- Broad generalizations unsupported by specific statements
- Speculative conclusions about the user’s intent or motivation
- Re-statements of existing knowledge in different words
Example
If a user says: "I exercise Monday, Wednesday, and Friday, and I'm cutting carbs."
| Extracted | Source | Valid? |
|---|---|---|
| "Exercises on Monday, Wednesday, and Friday" | confirmed | Yes — directly stated |
| "Reducing carbohydrate intake" | confirmed | Yes — directly stated |
| "Implementing changes to exercise routine" | inferred | No — unsupported generalization |
| "Motivated to improve health" | inferred | No — speculation |
When in doubt, the system errs on the side of not extracting. A missing inference can be captured in a future conversation, but a false inference pollutes the memory store and is harder to correct.
Temporal Encoding
Extracted facts are automatically enriched with a temporal suffix showing when they were learned. For example, a fact extracted from a session on March 15, 2026 would be stored as:
```ts
"Prefers TypeScript over JavaScript (mentioned 2026-03-15)"
```

The pipeline derives the date from thread.lastMessageAt, which reflects the timestamp of the most recent message in the session. This temporal annotation enables:
- Temporal reasoning — the LLM can understand when facts were learned and reason about time-sensitive information (e.g., “you mentioned this 3 months ago”)
- Chronological retrieval — memories can be sorted by date, giving users and AI a timeline of how information evolved
Temporal encoding is applied automatically during the extraction pipeline. No configuration is needed.
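The enrichment step itself is tiny. A sketch, assuming an ISO date derived from the thread's lastMessageAt timestamp (the helper name is illustrative):

```ts
// Illustrative temporal-suffix step; the date comes from the thread's
// lastMessageAt timestamp. Helper name is an assumption.
function encodeTemporal(content: string, lastMessageAt: Date): string {
  const date = lastMessageAt.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${content} (mentioned ${date})`;
}
```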
See Temporal Encoding for a deeper look at how dates are derived and used in retrieval.
Reflection Pass
Vitamem supports an optional reflection pass — a second LLM call that validates and enriches the facts produced by the initial extraction. When enabled, the reflection pass:
- Catches contradictions — flags facts that conflict with each other or with existing memories
- Enriches vague facts — adds specificity to facts that are too general (e.g., “exercises sometimes” becomes “exercises 3 times per week”)
- Detects missed information — identifies important facts that the initial extraction overlooked
Enable reflection in your config:
```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
  enableReflection: true,
});
```

Reflection adds one additional LLM call per dormant transition. It is especially valuable for any domain where accuracy matters (health, finance, legal, education) and less critical for casual applications.
See Reflection for details on how the reflection prompt works and how to customize it.
What Gets Stored
After extraction, each fact goes through the deduplication pipeline before being saved. Only facts that are genuinely new (not already represented in the user’s memory store) are persisted.
The final stored memory includes:
- content — the fact text
- source — confirmed or inferred
- embedding — the vector representation (e.g., 1536 dimensions for text-embedding-3-small)
- userId — for retrieval scoping
- threadId — for traceability
- pinned — true if the memory matched an auto-pin rule (see above)
- tags — string array for categorization (e.g. the auto-pin rule name)
- lastRetrievedAt — timestamp of the last time this memory was returned by a retrieval query (used by active forgetting)
- retrievalCount — how many times this memory has been retrieved (helps distinguish frequently-accessed facts from stale ones)
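Put together, a stored record looks roughly like this TypeScript shape (the type name and exact optionality are assumptions based on the field list above):

```ts
// Illustrative shape of a stored memory record; Vitamem's actual type
// name and optionality may differ.
interface StoredMemory {
  content: string;
  source: "confirmed" | "inferred";
  embedding: number[];          // e.g. 1536 dims for text-embedding-3-small
  userId: string;               // retrieval scoping
  threadId: string;             // traceability
  pinned: boolean;              // true if an auto-pin rule matched
  tags: string[];               // e.g. the auto-pin rule name
  lastRetrievedAt: Date | null; // used by active forgetting
  retrievalCount: number;       // frequently-accessed vs stale
}

const example: StoredMemory = {
  content: "Prefers TypeScript over JavaScript (mentioned 2026-03-15)",
  source: "confirmed",
  embedding: [],
  userId: "user-1",
  threadId: "thread-42",
  pinned: false,
  tags: [],
  lastRetrievedAt: null,
  retrievalCount: 0,
};
```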