# OpenAI

Install the Vitamem package and the OpenAI peer dependency:

```sh
npm install vitamem openai
```

The fastest way to get started with OpenAI:

```ts
import { createVitamem } from "vitamem";

const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
});
```

This uses the default models: `gpt-5.4-mini` for chat and extraction, and `text-embedding-3-small` for embeddings.

For full control, use createOpenAIAdapter directly:

```ts
import { createOpenAIAdapter, createVitamem } from "vitamem";

const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  chatModel: "gpt-5.4-mini",
  embeddingModel: "text-embedding-3-small",
});

const mem = await createVitamem({
  llm,
  storage: "ephemeral",
});
```
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | `string` | required | Your OpenAI API key. |
| `chatModel` | `string` | `"gpt-5.4-mini"` | Model used for chat completions and memory extraction. |
| `embeddingModel` | `string` | `"text-embedding-3-small"` | Model used for text embeddings. |
| `baseUrl` | `string` | `undefined` | Override the API base URL (see below). |
| `apiMode` | `"completions" \| "responses"` | `"completions"` | Which OpenAI API shape to use (see API Mode). |
| `extraChatOptions` | `object` | `undefined` | Provider-specific options spread into every chat/extraction call (see Pass-through Options). |
| `extraEmbeddingOptions` | `object` | `undefined` | Provider-specific options spread into every embedding call. |
| `extractionPrompt` | `string` | Built-in prompt | Custom prompt for memory extraction. Must include a `{conversation}` placeholder. |

## API Mode

Vitamem supports two OpenAI API modes:

- `completions` (default) — Uses `/v1/chat/completions`. Compatible with OpenAI, DashScope, Azure OpenAI, Groq, Together AI, and other OpenAI-compatible providers.
- `responses` — Uses OpenAI’s newer Responses API (`/v1/responses`) with extended features.

```ts
const adapter = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  apiMode: "responses", // Use Responses API
});
```

Or via environment variable:

```sh
OPENAI_API_MODE=responses
```
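The two modes differ in which endpoint the adapter calls. As an illustrative sketch (the `endpointFor` helper below is hypothetical, not part of the Vitamem API; the real adapter's routing may differ):

```typescript
// Hypothetical mapping from apiMode to the endpoint path each mode targets,
// as described in the list above.
type ApiMode = "completions" | "responses";

function endpointFor(mode: ApiMode): string {
  return mode === "responses" ? "/v1/responses" : "/v1/chat/completions";
}
```

This is also why `completions` is the safer default for OpenAI-compatible third-party providers: most of them implement `/v1/chat/completions` but not `/v1/responses`.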

## Pass-through Options

Pass any provider-specific options directly to the underlying SDK call using `extraChatOptions` and `extraEmbeddingOptions`. These are spread into the SDK request, so you can use any option your provider supports without waiting for explicit Vitamem support.

```ts
const adapter = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  extraChatOptions: {
    temperature: 0.7,
    max_tokens: 1024,
  },
  extraEmbeddingOptions: {
    dimensions: 512,
  },
});
```

Or via environment variables (JSON format):

```sh
OPENAI_EXTRA_CHAT_OPTIONS={"temperature": 0.7, "max_tokens": 1024}
OPENAI_EXTRA_EMBEDDING_OPTIONS={"dimensions": 512}
```
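A minimal sketch of how such JSON-valued environment variables could be parsed into an options object (`parseJsonEnv` is a hypothetical helper for illustration; the real adapter's error handling may differ, e.g. it may throw on invalid JSON rather than ignore it):

```typescript
// Hypothetical sketch: parse a JSON-valued environment variable into an
// options object. Missing or invalid values fall back to undefined here.
function parseJsonEnv(value: string | undefined): Record<string, unknown> | undefined {
  if (!value) return undefined;
  try {
    return JSON.parse(value) as Record<string, unknown>;
  } catch {
    return undefined; // invalid JSON is ignored in this sketch
  }
}

// e.g. parseJsonEnv(process.env.OPENAI_EXTRA_CHAT_OPTIONS)
const extraChatOptions = parseJsonEnv('{"temperature": 0.7, "max_tokens": 1024}');
```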

Any model available through the OpenAI chat completions API can be used as the `chatModel`:

```ts
const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  chatModel: "gpt-5.4-mini",
});
```

The embedding model determines the vector dimensions stored in your database. If you change the embedding model after memories have been saved, existing embeddings will not be compatible with new ones.

| Model | Dimensions | Notes |
| --- | --- | --- |
| `text-embedding-3-small` | 1536 | Default. Good quality at low cost. |
| `text-embedding-3-large` | 3072 | Higher accuracy, double the storage. |
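The storage difference is easy to estimate, assuming each dimension is stored as a 4-byte float32 (actual overhead depends on your storage backend):

```typescript
// Back-of-the-envelope embedding storage estimate, assuming 4 bytes per
// dimension (float32) and ignoring index/metadata overhead.
function embeddingBytes(dimensions: number, memoryCount: number): number {
  return dimensions * 4 * memoryCount;
}

// For 10,000 stored memories:
const small = embeddingBytes(1536, 10_000); // 61,440,000 bytes ≈ 61 MB
const large = embeddingBytes(3072, 10_000); // 122,880,000 bytes ≈ 123 MB
```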

## Custom Base URLs

The `baseUrl` option lets you point the adapter at any API that implements the OpenAI chat completions and embeddings endpoints. This is useful for proxies, self-hosted models, or enterprise gateways.

DashScope provides an OpenAI-compatible endpoint that supports both chat completions (Qwen models) and embeddings.

```ts
const llm = createOpenAIAdapter({
  apiKey: process.env.DASHSCOPE_API_KEY!,
  chatModel: "qwen3.5-flash",
  embeddingModel: "text-embedding-v4",
  baseUrl: "https://dashscope.aliyuncs.com/compatible-mode/v1",
});
```

Or configure entirely via environment variables:

```sh
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-dashscope-key
OPENAI_CHAT_MODEL=qwen3.5-flash
OPENAI_EMBEDDING_MODEL=text-embedding-v4
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_API_MODE=completions
```

Recommended DashScope models:

| Purpose | Model | Notes |
| --- | --- | --- |
| Chat / Extraction | `qwen3.5-flash` | Fast, cost-effective. |
| Chat / Extraction | `qwen-plus` | Higher accuracy. |
| Embeddings | `text-embedding-v4` | Use via the compatible-mode URL. |
Azure OpenAI:

```ts
const llm = createOpenAIAdapter({
  apiKey: process.env.AZURE_OPENAI_API_KEY!,
  baseUrl: "https://your-resource.openai.azure.com/openai/deployments/your-deployment",
  chatModel: "gpt-4o",
});
```

Groq:

```ts
const llm = createOpenAIAdapter({
  apiKey: process.env.GROQ_API_KEY!,
  chatModel: "llama-3.3-70b-versatile",
  baseUrl: "https://api.groq.com/openai/v1",
});
```

LM Studio:

```ts
const llm = createOpenAIAdapter({
  apiKey: "lm-studio", // LM Studio does not require a real key
  baseUrl: "http://localhost:1234/v1",
  chatModel: "your-loaded-model",
  embeddingModel: "your-embedding-model",
});
```

vLLM:

```ts
const llm = createOpenAIAdapter({
  apiKey: "vllm",
  baseUrl: "http://localhost:8000/v1",
  chatModel: "meta-llama/Llama-3-8b-chat-hf",
});
```

The OpenAI adapter supports streaming in both the completions and responses API modes. Use `chatStream()` or `chatWithUserStream()` on the Vitamem instance to receive tokens as they are generated:

```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  storage: "ephemeral",
});

const { stream } = await mem.chatStream({
  threadId: thread.id,
  message: "What should I know about my medications?",
});

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```

See Streaming Output for the full guide, including SSE integration and fallback behavior.

The extraction prompt controls how the LLM identifies facts worth remembering from a conversation. The default prompt is tuned for health companions, focusing on conditions, medications, lifestyle, and goals.

To customize it, pass an `extractionPrompt` string. It must include the `{conversation}` placeholder, which Vitamem replaces with the formatted message history.

```ts
const llm = createOpenAIAdapter({
  apiKey: process.env.OPENAI_API_KEY!,
  extractionPrompt: `Extract health-related facts from this conversation.
Focus on: diagnoses, medications, vitals, and care preferences.
Conversation:
{conversation}
Return a JSON array only (no markdown, no explanation):
[{ "content": "brief factual statement", "source": "confirmed" | "inferred" }]`,
});
```
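Because a prompt without the placeholder would silently drop the conversation, it can be worth validating before constructing the adapter. A minimal sketch (`validateExtractionPrompt` and `renderPrompt` are illustrative helpers, not part of the Vitamem API; the substitution mirrors the described behavior of replacing `{conversation}` with the formatted history):

```typescript
// Hypothetical guard: fail fast if the required placeholder is missing.
function validateExtractionPrompt(prompt: string): string {
  if (!prompt.includes("{conversation}")) {
    throw new Error("extractionPrompt must contain a {conversation} placeholder");
  }
  return prompt;
}

// Substitution as described in the docs: the placeholder is replaced with
// the formatted message history.
function renderPrompt(prompt: string, conversation: string): string {
  return validateExtractionPrompt(prompt).replace("{conversation}", conversation);
}
```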

When using the string shortcut via `createVitamem`, you can override models with the `model` and `embeddingModel` fields:

```ts
const mem = await createVitamem({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
  model: "gpt-4o",
  embeddingModel: "text-embedding-3-large",
  storage: "ephemeral",
});
```

The OpenAI adapter lazy-loads the `openai` SDK at runtime. It is listed as an optional peer dependency of Vitamem:

```json
{
  "peerDependencies": {
    "openai": ">=4.0.0"
  }
}
```

If the `openai` package is not installed and you attempt to use the OpenAI adapter, you will get a module resolution error at runtime.
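The general lazy-load pattern can be sketched as follows. This is a hypothetical illustration (`loadOptionalPeer` is not a Vitamem export, and the real adapter's error message may differ): the SDK is only resolved on first use, and a missing module is turned into an actionable error.

```typescript
// Hypothetical sketch of the optional-peer-dependency pattern: defer loading
// until first use, and rewrite a resolution failure into a clear message.
type Loader = () => unknown;

function loadOptionalPeer(name: string, load: Loader): unknown {
  try {
    return load();
  } catch {
    throw new Error(
      `Optional peer dependency "${name}" is not installed. ` +
        `Run \`npm install ${name}\` to use this adapter.`,
    );
  }
}

// In a real adapter this might look like:
//   loadOptionalPeer("openai", () => require("openai"));
```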