How ARIA builds a comprehensive, structured understanding of your life — extracting entities and facts from conversations, photos, emails, and proactive intelligence — and uses it to serve you better over time.
The Personal Knowledge Engine organizes understanding across three layers — from raw entities and facts, to semantic summaries, to a compact knowledge map. Each layer serves a different use case and query pattern.
Layer 1: Entities and facts. The foundation: every person, place, topic, event, and thing ARIA knows about is an entity, and every piece of knowledge about those entities is a fact with confidence scoring, temporal validity, evidence tracking, and vector embeddings for semantic search. Facts link to entities via many-to-many relationships with typed roles (subject, object, location, topic).
Layer 2: Semantic summaries. AI-generated narrative summaries synthesize raw facts into human-readable profiles. There are four types: entity profiles (person, place, topic), domain overviews (health, career, preferences), relationship summaries (Nic & Ryan), and life chapters (Career: 2020–2026). Summaries are marked stale when new facts arrive.
Layer 3: Knowledge map. A compact, pre-computed view of "what ARIA knows": entity names ranked by fact count. It provides instant lookup without scanning the full graph and is refreshed automatically after each summary generation cycle.
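The knowledge map could be derived with a simple count. This is a minimal sketch that assumes fact-entity links arrive as `(fact_id, entity_id)` pairs that are already filtered to current facts. The function `build_knowledge_map` and its inputs are hypothetical and are not ARIA's code.

```python
from collections import Counter
from typing import Iterable, Optional


def build_knowledge_map(links: Iterable[tuple[int, int]], names: dict[int, str],
                        top_n: Optional[int] = None) -> list[tuple[str, int]]:
    """Rank entity names by the number of distinct current facts linked to them.

    `links` holds hypothetical (fact_id, entity_id) pairs. A fact linked to the
    same entity more than once (e.g. under two roles) is counted once.
    """
    counts = Counter(entity_id for _fact_id, entity_id in set(links))
    return [(names[entity_id], n) for entity_id, n in counts.most_common(top_n)]


links = [(10, 1), (11, 1), (12, 1), (12, 2), (12, 1)]
names = {1: "Alex Thompson", 2: "Boston"}
print(build_knowledge_map(links, names))  # [('Alex Thompson', 3), ('Boston', 1)]
```

Precomputing this ranking means "what does ARIA know?" is answered by a single lookup rather than a scan of every fact.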
Every named concept in ARIA's understanding is an entity. Entities have a canonical name, optional aliases (phone numbers, emails, nicknames), and are linked to the unified contacts table when applicable.
ARIA tracks five entity types:
- Person: people in your life. Linked to contacts, with aliases for phone numbers and emails.
- Place: cities, landmarks, restaurants. GPS coordinates are stored in metadata.
- Topic: career, hobbies, interests; abstract concepts and domains.
- Event: birthdays, trips, milestones, with temporal bounds tracked.
- Thing: objects, devices, pets; anything physical or conceptual.
Facts are the atomic units of knowledge. Each fact is a structured statement about the world with rich metadata — confidence, evidence, temporal bounds, semantic embedding, and version history.
Eight knowledge domains organize facts by topic.
Facts link to entities via typed many-to-many relationships.
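The records described above can be modelled as follows. This is a sketch using Python dataclasses (Python 3.9+). Every class and field name here (`Entity`, `Fact`, `EntityLink`, `valid_from`, and so on) is hypothetical and is not ARIA's actual schema. The sketch shows how a fact carries its own metadata while linking to several entities under typed roles.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class EntityType(Enum):
    PERSON = "person"
    PLACE = "place"
    TOPIC = "topic"
    EVENT = "event"
    THING = "thing"


class Role(Enum):
    """Typed role an entity plays in a fact."""
    SUBJECT = "subject"
    OBJECT = "object"
    LOCATION = "location"
    TOPIC = "topic"


@dataclass
class Entity:
    id: int
    kind: EntityType
    canonical_name: str
    aliases: set[str] = field(default_factory=set)   # phone numbers, emails, nicknames
    contact_id: Optional[str] = None                 # link into the unified contacts table


@dataclass
class EntityLink:
    entity_id: int
    role: Role


@dataclass
class Fact:
    id: int
    domain: str
    category: str
    key: str
    value: str
    confidence: float                                # 0.0 to 1.0
    valid_from: Optional[date] = None                # temporal validity window
    valid_to: Optional[date] = None
    evidence: list[str] = field(default_factory=list)
    embedding: Optional[list[float]] = None
    superseded_by: Optional[int] = None              # id of the newer version, if any
    links: list[EntityLink] = field(default_factory=list)  # many-to-many, typed

    def is_current(self) -> bool:
        return self.superseded_by is None

    def entities_in_role(self, role: Role) -> list[int]:
        return [link.entity_id for link in self.links if link.role is role]


alex = Entity(1, EntityType.PERSON, "Alex Thompson", aliases={"AT"})
boston = Entity(2, EntityType.PLACE, "Boston")
fact = Fact(10, "career", "work", "city", "As of 2025, works in Boston", confidence=0.9,
            links=[EntityLink(alex.id, Role.SUBJECT), EntityLink(boston.id, Role.LOCATION)])
```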
Facts are versioned: a superseded_by pointer links each old version to its replacement, maintaining a complete audit trail. Queries filter with WHERE superseded_by IS NULL to return only current facts.

When new information arrives, ARIA must determine whether it refers to an existing entity or a new one. The resolution system uses a cascade of matching signals ordered by confidence, from definitive phone number matches to cautious fuzzy name comparisons.
| Signal | Threshold | Action | Example |
|---|---|---|---|
| Phone number | Definitive | Auto-merge, upgrade name | +1 (617) 555-1234 → 6175551234 |
| Contact ID | Definitive | Auto-merge (from iOS) | Linked via unified contacts table |
| Exact name | Definitive | Auto-merge (case-insensitive) | "alex thompson" = "Alex Thompson" |
| Alias match | Definitive | Auto-merge (nicknames, emails) | "AT" in aliases → Alex Thompson |
| Fuzzy name (high) | ≥ 0.85 | Auto-merge (bigram similarity) | "Jon Smith" ≈ "John Smith" |
| Fuzzy name (low) | 0.55 – 0.85 | Create new + flag for review | Near-match → merge queue |
| No match | — | Create new entity | Brand new person/place/thing |
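The cascade can be sketched as follows. This is an illustrative Python version, not ARIA's implementation. The `Candidate` record and the `resolve` function are hypothetical, and phone normalization assumes North American numbers. "Bigram similarity" is assumed to be a Dice coefficient over the character bigrams of the lowercased name, padded with a space on each side. Only the 0.85 and 0.55 thresholds come from the table above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

AUTO_MERGE = 0.85  # fuzzy score at or above this merges automatically
REVIEW = 0.55      # scores in [0.55, 0.85) create a new entity and queue a review


@dataclass
class Candidate:
    entity_id: int
    name: str
    aliases: set[str] = field(default_factory=set)
    phone: Optional[str] = None
    contact_id: Optional[str] = None


def normalize_phone(raw: str) -> str:
    """Digits only, dropping a leading US/Canada country code."""
    digits = re.sub(r"\D", "", raw)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def bigrams(name: str) -> set[str]:
    padded = f" {name.lower()} "  # padding lets first and last letters form bigrams too
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def resolve(name: str, candidates: list[Candidate], phone: Optional[str] = None,
            contact_id: Optional[str] = None) -> tuple[str, Optional[int]]:
    """Return (action, entity_id), checking the most definitive signals first."""
    if phone:
        target = normalize_phone(phone)
        for c in candidates:
            if c.phone and normalize_phone(c.phone) == target:
                return ("merge", c.entity_id)  # the table also upgrades the name here (omitted)
    if contact_id:
        for c in candidates:
            if c.contact_id == contact_id:
                return ("merge", c.entity_id)
    lowered = name.lower()
    for c in candidates:                       # exact, case-insensitive name
        if c.name.lower() == lowered:
            return ("merge", c.entity_id)
    for c in candidates:                       # nicknames, emails, other aliases
        if lowered in {a.lower() for a in c.aliases}:
            return ("merge", c.entity_id)
    scored = [(bigram_similarity(name, c.name), c) for c in candidates]
    if scored:
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= AUTO_MERGE:
            return ("merge", best.entity_id)
        if score >= REVIEW:
            return ("create_and_flag", best.entity_id)  # new entity plus merge-queue entry
    return ("create", None)


people = [Candidate(1, "Alex Thompson", {"AT"}, phone="617-555-1234"), Candidate(7, "John Smith")]
print(resolve("Jon Smith", people))  # ('merge', 7)
```

With this metric, "Jon Smith" scores 18/21 (about 0.857) against "John Smith" and auto-merges. "Jon Smyth" scores about 0.67, so it lands in the review band instead. The cheap, definitive signals are checked first, so fuzzy scoring only runs when nothing certain matched.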
Knowledge flows into the graph from six primary sources. Each source has a specialized handler that extracts structured facts, resolves entities, and generates embeddings.
Every source follows the same four stages:
1. Ingest: raw data from iMessage, photos, email, voice, chat, or imports.
2. Extract: Claude identifies entities, facts, relationships, and temporal context.
3. Resolve: match to existing entities or create new ones, flagging near-matches.
4. Store: deduplicate, version, link, and generate vector embeddings.
- iMessage: processes conversation history in watermarked batches. Extracts people, preferences, life events, and recurring patterns. Identity-aware: distinguishes your facts from others'.
- Photos: clusters photos by date and GPS proximity (~50 km). Extracts places visited, trips, social events, activities, food preferences, and relationship patterns.
- Email: parses Gmail Takeout .mbox archives, filters out spam, trash, and promotions, and groups messages by thread. Threads are ingested as conversation chunks with entity resolution.
- Voice memos: transcribes audio files via OpenAI Whisper or AWS Transcribe, then extracts facts from the transcripts with full entity resolution.
- Chat: ongoing conversations with ARIA are themselves a signal source. Memory updates extracted from Claude's responses feed into core memory and, eventually, the knowledge graph.
- Proactive intelligence: the Proactive Intelligence Engine (PIE) detects behavioral patterns. High-confidence patterns are promoted to core memory, then backfilled into the knowledge graph.
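Photo clustering can be pictured as a single pass over photos sorted by time. In this sketch, a photo joins the current cluster if it was taken on the same calendar day as the cluster's first photo and lies within 50 km of it, measured by haversine distance. The text specifies only date and roughly 50 km GPS proximity. The anchoring rule, the `Photo` type, and `cluster_photos` are therefore assumptions, not ARIA's code.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
CLUSTER_RADIUS_KM = 50.0  # "~50 km" proximity from the text


@dataclass
class Photo:
    taken_at: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cluster_photos(photos: list[Photo]) -> list[list[Photo]]:
    """Single pass in time order: join the current cluster if same day and within range."""
    clusters: list[list[Photo]] = []
    for photo in sorted(photos, key=lambda p: p.taken_at):
        if clusters:
            anchor = clusters[-1][0]  # the first photo fixes the cluster's day and location
            same_day = anchor.taken_at.date() == photo.taken_at.date()
            nearby = haversine_km(anchor.lat, anchor.lon, photo.lat, photo.lon) <= CLUSTER_RADIUS_KM
            if same_day and nearby:
                clusters[-1].append(photo)
                continue
        clusters.append([photo])
    return clusters


day = datetime(2025, 6, 1)
photos = [
    Photo(day.replace(hour=9), 42.3601, -71.0589),   # Boston
    Photo(day.replace(hour=11), 42.3736, -71.1097),  # Cambridge, MA (~5 km away)
    Photo(day.replace(hour=15), 40.7128, -74.0060),  # New York (~300 km away)
]
clusters = cluster_photos(photos)
```

Each resulting cluster (here, a Boston-area morning and a New York afternoon) is a natural unit to hand to the extraction step as one trip or outing.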
Not all knowledge requires the same embedding model. The system uses a content-aware router that selects the optimal model based on source type, balancing cost, dimensionality, and search quality.
| Content Class | Model | Provider | Dimensions | Used For |
|---|---|---|---|---|
| Facts | text-embedding-004 | Google | 768 | Knowledge facts (compact, cost-effective) |
| Conversations | text-embedding-3-large | OpenAI | 1536 | iMessage and email conversation chunks |
| Email | text-embedding-3-large | OpenAI | 1536 | Email thread ingestion |
| Transcripts | text-embedding-3-large | OpenAI | 1536 | Voice memo transcription |
| Journal | text-embedding-3-large | OpenAI | 1536 | ARIA's reflective journal entries |
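The router amounts to a lookup from content class to model. The sketch below mirrors the table. The `EmbeddingModel` type, the `ROUTES` keys, and `route_embedding` are hypothetical, and no embedding API is actually called. text-embedding-3-large natively returns 3072-dimensional vectors, so the 1536 figure presumably reflects OpenAI's `dimensions` request parameter being used to shorten them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingModel:
    name: str
    provider: str
    dimensions: int


# Routing table mirroring the content classes above (illustrative, not ARIA's config).
FACT_MODEL = EmbeddingModel("text-embedding-004", "Google", 768)
CHUNK_MODEL = EmbeddingModel("text-embedding-3-large", "OpenAI", 1536)

ROUTES: dict[str, EmbeddingModel] = {
    "facts": FACT_MODEL,
    "conversations": CHUNK_MODEL,
    "email": CHUNK_MODEL,
    "transcripts": CHUNK_MODEL,
    "journal": CHUNK_MODEL,
}


def route_embedding(content_class: str) -> EmbeddingModel:
    """Pick the embedding model for a content class; unknown classes are an error."""
    model = ROUTES.get(content_class.lower())
    if model is None:
        raise ValueError(f"no embedding route for content class {content_class!r}")
    return model
```

The two models produce vectors of different sizes. Fact vectors and chunk vectors therefore have to live in separate indexes, and a query must be embedded with the same model as the content it searches.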
When a new fact arrives — from any source — it passes through a six-step pipeline that handles deduplication, confidence updating, entity linking, and embedding generation.
Only one active fact per (domain, category, key) combination. Enforced by a unique index filtered on superseded_by IS NULL. Old versions are preserved for audit.
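The "one active fact" rule maps directly onto a partial unique index. The sketch below uses SQLite, which supports `CREATE UNIQUE INDEX ... WHERE`, purely for illustration. The table layout and the `upsert_fact` helper are hypothetical, and they cover only the supersession step, not the full six-step pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (
    id            INTEGER PRIMARY KEY,
    domain        TEXT NOT NULL,
    category      TEXT NOT NULL,
    key           TEXT NOT NULL,
    value         TEXT NOT NULL,
    superseded_by INTEGER  -- id of the replacing version; NULL while current
);
-- Only rows with superseded_by IS NULL are indexed, so history rows never collide.
CREATE UNIQUE INDEX one_active_fact ON facts (domain, category, key)
    WHERE superseded_by IS NULL;
""")


def upsert_fact(conn: sqlite3.Connection, domain: str, category: str,
                key: str, value: str) -> int:
    """Write a new current version and point the previous current version at it."""
    with conn:  # commit both statements together, or roll back on error
        new_id = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM facts").fetchone()[0]
        # Retire the old version first; inserting first would violate the unique index.
        conn.execute(
            "UPDATE facts SET superseded_by = ? "
            "WHERE domain = ? AND category = ? AND key = ? AND superseded_by IS NULL",
            (new_id, domain, category, key),
        )
        conn.execute(
            "INSERT INTO facts (id, domain, category, key, value) VALUES (?, ?, ?, ?, ?)",
            (new_id, domain, category, key, value),
        )
    return new_id


def current_facts(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT domain, category, key, value FROM facts WHERE superseded_by IS NULL"
    ).fetchall()


upsert_fact(conn, "career", "employment", "employer", "As of 2024, works at Acme")
upsert_fact(conn, "career", "employment", "employer", "As of 2026, works at Globex")
```

Both versions remain in the table, but only the newer one is current. Any attempt to insert a second active row for the same (domain, category, key) fails with an integrity error.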
Every 6 hours, the knowledge-summarize handler generates AI-written narrative summaries from raw facts. Four summary types serve different query patterns.
- Entity profiles: 100–300 word profiles for people, places, and topics, summarizing all linked facts grouped by domain with confidence signals.
- Domain overviews: 150–300 word snapshots of entire knowledge domains, generated once a domain has 10+ facts.
- Relationship summaries: 200–400 word relationship profiles for the top people by fact count, including conversation chunk statistics.
- Life chapters: temporal narrative summaries spanning significant periods, such as career transitions, relocations, and relationship milestones.
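Summaries are marked stale when new facts arrive, and stale ones are regenerated first. One summarize cycle's work list could be planned as sketched below. The text specifies the 10-fact threshold for domain overviews and the stale-first ordering. The following are all assumptions: the data shapes, the `mark_stale` and `plan_cycle` helpers, and the choice to queue not-yet-generated summaries after stale ones. The 6-hour scheduling is left out.

```python
from dataclasses import dataclass

DOMAIN_OVERVIEW_MIN_FACTS = 10  # domain overviews need 10+ facts


@dataclass
class Summary:
    kind: str            # "entity", "domain", "relationship", or "chapter"
    subject: str         # entity name, domain name, or chapter label
    stale: bool = False


def mark_stale(summaries: dict[tuple[str, str], Summary], subject: str) -> None:
    """Flag every summary about `subject` when a new fact arrives for it."""
    for summary in summaries.values():
        if summary.subject == subject:
            summary.stale = True


def plan_cycle(summaries: dict[tuple[str, str], Summary],
               wanted: list[tuple[str, str]],
               domain_fact_counts: dict[str, int]) -> list[tuple[str, str]]:
    """List summaries to (re)generate this cycle: stale ones first, then missing ones."""
    eligible = [
        (kind, subject) for kind, subject in wanted
        if kind != "domain" or domain_fact_counts.get(subject, 0) >= DOMAIN_OVERVIEW_MIN_FACTS
    ]
    stale = [k for k in eligible if k in summaries and summaries[k].stale]
    missing = [k for k in eligible if k not in summaries]
    return stale + missing   # fresh summaries are skipped, avoiding redundant work


summaries = {
    ("entity", "Alex Thompson"): Summary("entity", "Alex Thompson"),
    ("domain", "health"): Summary("domain", "health"),
}
mark_stale(summaries, "Alex Thompson")
plan = plan_cycle(
    summaries,
    wanted=[("entity", "Alex Thompson"), ("domain", "health"), ("domain", "career")],
    domain_fact_counts={"health": 12, "career": 4},
)
```

Here only the stale profile is regenerated. The fresh health overview is skipped, and the career overview is not created because that domain has fewer than 10 facts. A fuller version would also stale related summaries, such as the domain overview a new fact falls under.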
When new facts arrive, the affected summaries are marked stale = true. The next summarize cycle regenerates stale summaries first, keeping knowledge current without redundant work.

ARIA's knowledge system has evolved through three generations. Each builds on the last: the Knowledge Graph doesn't replace older systems, it unifies them.
- Generation 1: freeform (category, key, value) tuples, with no confidence scores and no entities. Injected into every Claude request as system context; still active and read by PIE Gate 1.
- Generation 2: structured facts with confidence, evidence, sources, and temporal validity, organized by domain, category, and key. Written by the iMessage and photo analysis handlers.
- Generation 3 (Knowledge Graph): a full entity-fact graph with semantic embeddings, version history, many-to-many entity links, AI summaries, and a merge queue. Unifies all prior generations.
The Proactive Intelligence Engine and the Knowledge Graph form a bidirectional feedback loop. PIE monitors for changes and detects patterns; the Knowledge Graph stores and organizes what's learned. Each makes the other more effective over time.
[Diagram: PIE and Knowledge Graph feedback loop. Labels: core_memory, knowledge_facts, "PIE monitors core_memory for changes".]

ARIA can query the knowledge graph during conversations using three specialized tools, which enable semantic search, entity lookup, and graph exploration.
- Semantic search: natural-language search across facts and conversation chunks, filterable by entity type and domain, with results ranked by vector similarity.
- Entity lookup: a detailed profile for a specific entity, including all linked facts, summaries, and conversation statistics.
- Graph exploration: browse all entities with filtering and sorting. Useful for questions like "who does ARIA know about?" or "what places are tracked?"
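The semantic search tool's core behavior is filter first, then rank by vector similarity. This is a minimal sketch that uses brute-force cosine similarity as a stand-in for whatever vector index ARIA actually uses. `IndexedItem` and `semantic_search` are hypothetical names, since the text does not name the tools.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedItem:
    text: str
    embedding: list[float]
    entity_type: Optional[str] = None   # e.g. "person"; may be None for conversation chunks
    domain: Optional[str] = None        # e.g. "health"


def cosine(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embedding sizes differ; was the query embedded with the same model?")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query: list[float], items: list[IndexedItem],
                    entity_type: Optional[str] = None, domain: Optional[str] = None,
                    limit: int = 5) -> list[tuple[float, IndexedItem]]:
    """Apply the optional filters, then rank by cosine similarity, best first."""
    scored = [
        (cosine(query, item.embedding), item)
        for item in items
        if (entity_type is None or item.entity_type == entity_type)
        and (domain is None or item.domain == domain)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


items = [
    IndexedItem("fact A", [1.0, 0.0], "person", "health"),
    IndexedItem("fact B", [0.0, 1.0], "person", "career"),
    IndexedItem("fact C", [0.9, 0.1], "place", "health"),
]
top = semantic_search([1.0, 0.0], items)
```

The dimension check matters here because facts (768-dim) and conversation chunks (1536-dim) are embedded by different models and cannot be compared directly.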
- Surname safeguard: a shared last name with a different first name scores 0.45, below the auto-merge threshold. This prevents accidentally merging family members who share a surname.
- Temporal anchoring: every extracted fact must use the "As of DATE..." format, because a fact without temporal context goes stale without anyone noticing.
- Tiered embeddings: facts use compact 768-dimensional vectors (cheaper), while conversations use 1536-dimensional vectors for higher semantic precision where nuance matters most.
- Resumable watermarks: every analysis handler self-chains through history, processing a batch, saving progress, and enqueuing the next batch.
- Identity awareness: LLM extraction prompts distinguish whose fact it is. "Mom had surgery" becomes a fact about Nic's mother, not about Nic.
- Immutable history: facts are superseded, never deleted. Old versions remain as an audit trail, and current state is filtered with superseded_by IS NULL.
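A resumable, self-chaining handler can be sketched as follows. Several assumptions apply, and none of these names are ARIA's actual code. The watermark is the last processed message id, `run_batch` is a hypothetical helper, the handler name is made up, and an in-memory dict and deque stand in for durable storage and the job queue. Fact extraction itself is omitted.

```python
from collections import deque


def run_batch(handler: str, message_ids: list[int], marks: dict[str, int],
              queue: deque, batch_size: int = 100) -> list[int]:
    """Process one batch past the saved watermark, persist progress, and self-chain."""
    last = marks.get(handler, 0)
    batch = [m for m in sorted(message_ids) if m > last][:batch_size]
    # Fact extraction for `batch` would happen here (omitted in this sketch).
    if batch:
        marks[handler] = batch[-1]          # save the watermark before enqueuing more work
    if any(m > marks.get(handler, 0) for m in message_ids):
        queue.append(handler)               # history remains: enqueue the next batch
    return batch


marks: dict[str, int] = {}                  # stands in for durable watermark storage
queue: deque = deque(["imessage-analysis"])
history = list(range(1, 251))               # 250 hypothetical message ids
batch_sizes = []
while queue:
    handler = queue.popleft()
    batch_sizes.append(len(run_batch(handler, history, marks, queue)))
```

The 250 messages are processed as batches of 100, 100, and 50, and the chain stops on its own. Because progress is saved before the next batch is enqueued, a crash or restart resumes from the last watermark instead of reprocessing history.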