This sample shows how to build an AI chatbot that remembers what a user
told it days, weeks, or months ago. It uses the same primitive that powers
the /dashboard/chat agent on mnueron.com itself.
The shape
                 ┌──────────────────────┐
user message ─►  │ 1. Recall context    │ ─► top-5 relevant memories + runbooks
                 ├──────────────────────┤
                 │ 2. Call LLM with     │
                 │    system prompt +   │
                 │    recall + history  │
                 ├──────────────────────┤
                 │ 3. Save the user     │
                 │    turn as a memory  │
                 ├──────────────────────┤
                 │ 4. Return reply      │
                 └──────────────────────┘
Three round-trips: one to mnueron for recall, one to the LLM, one back to mnueron to save. Optionally, add a fourth round-trip to mnueron that saves the assistant turn too, if you want the full conversation history to be searchable later.
Working code — TypeScript
import { Mnueron } from "@mnueron/sdk";
import OpenAI from "openai";

const mnueron = new Mnueron({
  apiUrl: "https://www.mnueron.com",
  apiToken: process.env.MNUERON_API_TOKEN!,
});
const openai = new OpenAI();

async function chat(userId: string, message: string): Promise<string> {
  // 1. Recall
  const hits = await mnueron.search({
    query: message,
    namespace: `user:${userId}`,
    k: 5,
  });
  const context = hits
    .map((h, i) => `[mem ${i + 1}] ${h.content}`)
    .join("\n");

  // 2. LLM
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You are a personal assistant with persistent memory. " +
          "Below are facts you previously learned about this user. " +
          "Use them when answering. If something contradicts a memory, " +
          "tell the user and ask which is correct.\n\n" +
          (context || "(no prior memories for this user yet)"),
      },
      { role: "user", content: message },
    ],
  });
  const reply = completion.choices[0]?.message?.content ?? "";

  // 3. Save the user's message as a memory (fire-and-forget)
  mnueron
    .save({
      content: `User said: ${message}`,
      namespace: `user:${userId}`,
      source: "chatbot",
    })
    .catch(() => {});

  return reply;
}
Working code — Python
import os  # needed for os.environ below

from mnueron import Mnueron
import openai

m = Mnueron(api_url="https://www.mnueron.com",
            api_token=os.environ["MNUERON_API_TOKEN"])
client = openai.OpenAI()

def chat(user_id: str, message: str) -> str:
    # 1. Recall
    hits = m.search(query=message,
                    namespace=f"user:{user_id}",
                    k=5)
    context = "\n".join(f"[mem {i+1}] {h['content']}"
                        for i, h in enumerate(hits))

    # 2. LLM
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a personal assistant with persistent memory.\n"
                    "Below are facts you previously learned about this user.\n\n"
                    + (context or "(no prior memories for this user yet)")
                ),
            },
            {"role": "user", "content": message},
        ],
    )
    reply = completion.choices[0].message.content or ""

    # 3. Save the user's message (synchronous; errors are swallowed)
    try:
        m.save(content=f"User said: {message}",
               namespace=f"user:{user_id}",
               source="chatbot")
    except Exception:
        pass

    return reply
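The diagram's step 2 lists history alongside the system prompt and recall, but both samples above send only the current message. A minimal sketch of how the message list could be assembled with recent turns included follows. The `build_messages` helper, its `max_history` parameter, and the assumption that each hit is a dict with a `content` key (as in the Python sample) are illustrative and not part of the mnueron SDK.

```python
from typing import Dict, List, Optional, Sequence

Message = Dict[str, str]

SYSTEM_PROMPT = (
    "You are a personal assistant with persistent memory.\n"
    "Below are facts you previously learned about this user.\n\n"
)


def build_messages(
    message: str,
    hits: Sequence[Dict[str, str]],
    history: Optional[Sequence[Message]] = None,
    max_history: int = 10,
) -> List[Message]:
    """Assemble system prompt + recalled memories + recent turns + new message.

    Hypothetical helper: `hits` is assumed to be a list of dicts with a
    "content" key, and `history` a list of {"role", "content"} messages.
    """
    context = "\n".join(
        f"[mem {i + 1}] {h['content']}" for i, h in enumerate(hits)
    )
    system = SYSTEM_PROMPT + (context or "(no prior memories for this user yet)")

    # Keep only the most recent turns so the prompt stays bounded.
    # (A plain [-0:] slice would return everything, so guard the zero case.)
    recent = list(history or [])[-max_history:] if max_history > 0 else []

    return [
        {"role": "system", "content": system},
        *recent,
        {"role": "user", "content": message},
    ]
```

The result can be passed as the `messages` argument of `client.chat.completions.create(...)` in place of the inline list. Capping the history is a design choice: long-term facts come from recall, so the prompt only needs enough recent turns to keep the conversation coherent.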
Why this works
- Per-user namespace (user:{userId}) keeps each user's memories isolated even though they share one mnueron tenant.
- Recall before LLM means every reply is grounded in what was said before, with no hallucinated "as I mentioned earlier."
- Save the user turn, not the assistant turn, by default. The assistant's words aren't usually facts you want to retrieve later; the user's are. (Save assistant turns when they contain decisions or actions the user agreed to.)
- Fire-and-forget save doesn't block the reply in the TypeScript version (the Python version saves synchronously but swallows errors). If saving fails, the chat still works; only that one memory is lost.
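The Python sample above calls `m.save` inline, so the reply waits for the save to finish. A minimal sketch of a non-blocking variant, using a thread pool from the standard library, is shown here. The `save_in_background` helper and the logger name are hypothetical; the helper wraps whatever save callable you pass in and assumes that callable is safe to run on a worker thread.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

logger = logging.getLogger("chatbot.memory")

# A small shared pool; saves are I/O-bound, so a few workers are plenty.
_save_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="mem-save")


def save_in_background(save_fn: Callable[..., Any], **kwargs: Any) -> Future:
    """Run save_fn(**kwargs) on a worker thread.

    Failures are logged instead of raised, so a failed save never breaks
    the chat. The returned Future resolves to True on success and False
    on failure; callers are free to ignore it.
    """

    def _run() -> bool:
        try:
            save_fn(**kwargs)
            return True
        except Exception:
            logger.warning("memory save failed", exc_info=True)
            return False

    return _save_pool.submit(_run)


# Usage inside chat(), replacing the try/except block (m is the client
# from the sample above):
#   save_in_background(m.save, content=f"User said: {message}",
#                      namespace=f"user:{user_id}", source="chatbot")
```

Logging the failure rather than passing silently keeps the chat resilient while still leaving a trace when memories are being dropped.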
Going further
- Auto-summarize long turns before saving — the auto-synopsis gate already does this on the server side when content > 800 chars.
- Surface runbooks by switching search to /api/recall/unified: when the user asks "how do I deploy?" they get a saved runbook back, not just facts.
- Track conversation sessions by appending each turn to a chat_sessions row, the same shape the dashboard chat agent uses.
- Detect entities by enabling entity extraction on save (paid tier); then you can answer "what do we know about Maya?" with a structured entity card.
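As a rough illustration of the session-tracking idea, the sketch below appends each turn to a single row per session in a local SQLite table. The column names (`session_id`, `user_id`, `turns`) and the JSON-array encoding are assumptions made for this example; the actual `chat_sessions` shape used by the dashboard chat agent is not documented here and may differ.

```python
import json
import sqlite3
from typing import Dict, List


def open_sessions(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per chat session.

    Hypothetical schema: `turns` holds a JSON array of {"role", "content"}.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_sessions ("
        " session_id TEXT PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " turns TEXT NOT NULL DEFAULT '[]')"
    )
    return conn


def append_turn(
    conn: sqlite3.Connection,
    session_id: str,
    user_id: str,
    role: str,
    content: str,
) -> List[Dict[str, str]]:
    """Append one turn to the session's row and return the full turn list."""
    row = conn.execute(
        "SELECT turns FROM chat_sessions WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    turns: List[Dict[str, str]] = json.loads(row[0]) if row else []
    turns.append({"role": role, "content": content})

    # Rewrite the whole row; fine for a sketch, costly for very long sessions.
    conn.execute(
        "INSERT OR REPLACE INTO chat_sessions (session_id, user_id, turns) "
        "VALUES (?, ?, ?)",
        (session_id, user_id, json.dumps(turns)),
    )
    conn.commit()
    return turns
```

Calling `append_turn` once for the user message and once for the reply yields a turn list that can be fed back into the next LLM call as history.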
What to read next
- SDK overview — full method reference.
- Integrations overview — wiring the same memory into IDEs so it's shared across surfaces.
- Complete reference — every capability in one page.