Data Observatory
ARIA - Technical

The Data Behind ARIA

How 567,090 iMessages, 139,000 photos, 1,002 Google Voice records, and 130 million tokens become one intelligent assistant. A visualization of the raw scale of data that ARIA ingests, analyzes, and distills into personal intelligence.

The Numbers

Eight headline statistics that capture the scale of what ARIA processes. Every number represents real data from the production database — no projections, no estimates.

567K
iMessages Analyzed
across 10,707 analysis jobs
202,751
Memory Records
extracted & stored
139K
Photos
full-library sync target
131,016
Knowledge Facts
121,516 from iMessage alone
130M+
Tokens Processed
by LLM providers
90,224
LLM Calls Made
across 3 providers
1,002
Google Voice Records
from 353 text conversations + 59 voicemails
$14.81
Total LLM Cost
for all of the above

The Data Funnel — From Raw Data to Intelligence

The journey from raw sensor readings and photo metadata to personally relevant intelligence. Each layer refines, filters, and distills — turning hundreds of thousands of data points into a handful of genuinely useful insights.

1
Raw Ingestion ~737,000 records

Everything flows in. iMessage history spanning years of conversations, photos from the camera roll, Google Voice archives, sensor data from iOS, calendar events, contacts, health readings, music libraries, HomeKit devices — a continuous stream of raw life data captured by the platform.

iMessages 567,090
Photos 139,000
Health 17,767
Playlist Tracks 8,788
ARIA Messages 1,567
Songs 1,091
Google Voice 1,002
Location 328
Activity 105
Contacts 88
Calendar 43
HomeKit 20
SMS 18
Email 10
▽ ▽ ▽
2
AI Analysis & Extraction ~27,000+ operations

Raw data is processed by AI models — 567,090 iMessages are analyzed in batches of 50 across 10,707 jobs, extracting facts about people and relationships. 412 Google Voice files (353 text conversations + 59 voicemails) are transcribed and analyzed. Photos are described by vision models, knowledge entities are extracted and linked, and context changes are detected across 15 sources every 5 minutes. A sketch of the batching step follows the breakdown below.

iMessages Analyzed 567,090
iMessage Analysis Jobs 10,707
Knowledge Backfill 5,989
Context Accumulation 4,379
Photos Described 3,505
Google Voice Processed 1,002
GV Voicemails Transcribed 59
Significance Checks 762
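To make the batching concrete, here is a minimal Python sketch. The 50-message batch size comes from the description above; the `llm.extract_facts` client call and everything else is a hypothetical stand-in for the real pipeline.

```python
from itertools import islice

BATCH_SIZE = 50  # nominal batch size stated above: one LLM call per 50 messages

def batches(iterable, size):
    """Yield successive fixed-size chunks from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def analyze_thread(messages, llm):
    """One analysis job: batch the messages, collect extracted facts.

    `llm.extract_facts` is a hypothetical client method standing in for the
    provider request the production pipeline actually makes.
    """
    facts = []
    for batch in batches(messages, BATCH_SIZE):
        facts.extend(llm.extract_facts(batch))
    return facts
```

At 567,090 messages across 10,707 jobs, a job averages about 53 messages, so most jobs amount to one or two such batches.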
▽ ▽ ▽
3
Knowledge Synthesis ~360,000+ artifacts

Analysis output is refined into structured knowledge. Memory records are deduplicated (35% superseded), owner profile facts are organized across 9 domains, and knowledge graph entities are linked with typed relationships. The noise is stripped away; only verified intelligence remains.

Core Memories 202,751 → 131,897 active
Profile Facts 131,016
KG Facts 26,940
KG Entities 5,518
KG Summaries 27
▽ ▽ ▽
4
Intelligence Output the distillation

The final layer. Hundreds of thousands of data points reduced to carefully selected, deeply analyzed, personally relevant intelligence. What actually reaches the user is the tip of a massive data iceberg — each insight backed by the full weight of everything below it.

Proactive Insights 252
Context Snapshots 1,077
Journal Entries 90
KG Summaries 27
PRISM Personas 7
~737,000 raw records → 27,000+ analysis ops → 360,000+ knowledge artifacts → 252 delivered insights

Memory Architecture

202,751 total memory records distilled from every conversation, analysis run, and background job. After deduplication and supersession, 131,897 remain active — a 35% compression rate that keeps only the freshest, most relevant knowledge.

context
63,026 (48%)
people
34,452 (26%)
preferences
31,491 (24%)
recurring
2,385 (2%)
communication
383 (<1%)
Deduplication in action. Of the 202,751 total memory records created, 70,854 have been superseded by newer, more accurate versions — a 35% compression rate. This continuous refinement ensures ARIA's working memory stays current and contradiction-free.

What Gets Memorized

Every conversation with ARIA can produce memory updates. Claude analyzes the dialogue and extracts durable facts — preferences ("prefers morning workouts"), people ("brother lives in Austin"), context ("working on product roadmap"), and recurring patterns ("reads every evening"). Background jobs also generate memories from iMessage analysis, photo descriptions, and health data.
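One way such an extraction step could be shaped, as a hedged Python sketch: the category names match the memory breakdown above, while the schema fields and the `validate` helper are illustrative assumptions, not the production format.

```python
from dataclasses import dataclass

# Categories mirror the memory breakdown shown earlier on this page.
CATEGORIES = {"context", "people", "preferences", "recurring", "communication"}

@dataclass
class MemoryCandidate:
    """One durable fact proposed by the extraction model (illustrative schema)."""
    category: str  # must be one of CATEGORIES
    fact: str      # e.g. "prefers morning workouts"
    source: str    # e.g. "conversation", "imessage_analysis", "health"

def validate(candidate: MemoryCandidate) -> bool:
    """Drop anything with an unknown category or an empty fact before storage."""
    return candidate.category in CATEGORIES and bool(candidate.fact.strip())
```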

Supersession Mechanics

When ARIA learns something that contradicts or updates an existing memory, it doesn't delete the old one — it supersedes it. The old record stays for audit trail purposes, marked with a pointer to its replacement. This versioning means ARIA can explain why she changed her understanding of a fact.
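In code, supersession can be as simple as a nullable pointer. A minimal sketch, assuming a flat record type; the real table surely carries more fields.

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    id: int
    fact: str
    superseded_by: int | None = None  # pointer to the replacement; never deleted

def supersede(old: MemoryRecord, new: MemoryRecord) -> None:
    """Retire `old` in favor of `new`, keeping it around for the audit trail."""
    old.superseded_by = new.id

def active(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Working memory is whatever has not been superseded."""
    return [r for r in records if r.superseded_by is None]
```

Filtering on `superseded_by` is exactly what separates the 131,897 active records from the 202,751 total.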

Knowledge Graph Domains

131,016 structured facts organized across 9 domains in the owner profile. Each fact has a confidence score, evidence chain, source attribution, and temporal validity. This is ARIA's deep understanding of the user's life. A sketch of this fact shape follows the domain breakdown below.

events
62,161
people
34,204
preferences
31,258
lifestyle
2,417
communication
513
career
94
health
88
interests
82
places
41
Events dominate. 47% of all knowledge facts relate to events — things that happened, when they happened, who was involved. This temporal awareness is what allows ARIA to answer "when did I last..." and "what was happening around..." questions with confidence.
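Translating that description into a shape: a hedged sketch of what one owner-profile fact might carry. The field names are assumptions; only the four attributes themselves (confidence score, evidence chain, source attribution, temporal validity) come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class KnowledgeFact:
    """Illustrative shape of one owner-profile fact, per the attributes above."""
    domain: str                          # one of the 9 domains: events, people, ...
    statement: str                       # e.g. "brother lives in Austin"
    confidence: float                    # confidence score (0.0-1.0 scale assumed)
    evidence: list[str]                  # evidence chain: IDs of supporting records
    source: str                          # source attribution, e.g. "imessage_analysis"
    valid_from: datetime | None = None   # temporal validity window,
    valid_until: datetime | None = None  # open-ended when unknown
```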

The Photo Pipeline

139,000 photos in the iOS library. Each one passes through a multi-stage pipeline: metadata sync, AI description, entity extraction, and knowledge graph integration. The result is a searchable, queryable visual memory. A sketch of these stages follows the throughput figures below.

Stage 1

Photo Library

139,000

Total photos in
iOS library

Stage 2

Metadata Synced

63,296

Synced to ARIA
(45.5% complete)

Stage 3

AI Described

3,505

Vision model
analysis complete

Stage 4

Knowledge Fed

Facts extracted
into KG

Photo Pipeline Throughput
Total in library 139,000
Metadata synced to ARIA 63,296 (45.5%)
Described by AI vision 3,505 (2.5%)
Awaiting sync ~75,704 remaining
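Here is a compact sketch of the four stages as a single function. `store`, `vision`, and `kg` are hypothetical service handles; only the stage order comes from the pipeline described above.

```python
def process_photo(photo_id: str, store, vision, kg) -> str:
    """Walk one photo through the pipeline (services are hypothetical handles)."""
    meta = store.sync_metadata(photo_id)         # Stage 2: metadata sync
    description = vision.describe(meta)          # Stage 3: AI description
    entities = kg.extract_entities(description)  # Stage 4: entity extraction...
    kg.link(photo_id, entities)                  # ...and knowledge graph linking
    return description
```

The throughput table shows the funnel narrowing hard between stages 2 and 3 (63,296 synced, 3,505 described), which fits the economics: metadata sync is cheap, vision calls are not.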

LLM Intelligence Engine

90,224 calls across three AI providers. The majority runs on Google's free tier for classification and vision, with Anthropic Claude handling deep reasoning and OpenAI powering embeddings. A sketch of this routing follows the breakdown below.

90,224
total calls
Google Gemini
Free tier classification & vision
78,806 (87.3%)
Anthropic Claude
Reasoning, analysis, conversation — $13.43
7,944 (8.8%)
OpenAI
Embeddings & generation — $1.38
3,474 (3.9%)
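That split can be written down as a small routing table. A sketch assuming tasks are tagged by kind; the names are illustrative, not the platform's actual API.

```python
# Task-kind → provider routing, mirroring the split described above.
ROUTES = {
    "classify": "gemini",  # free tier: classification & significance checks
    "vision":   "gemini",  # free tier: photo descriptions
    "reason":   "claude",  # paid: deep reasoning, analysis, conversation
    "embed":    "openai",  # paid: embeddings
}

def pick_provider(task_kind: str) -> str:
    """Default to the free tier; escalate only when the task demands it."""
    return ROUTES.get(task_kind, "gemini")
```

Defaulting to the free tier is what pushes 87.3% of call volume to Gemini while Claude carries nearly all of the cost.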

Token Volume: 130 Million+

To put that in perspective: 130 million tokens is approximately 100 million words — roughly 1,000 novels' worth of text (at ~100,000 words per novel) processed by AI models.

Each block ≈ 15 novels
66 blocks ≈ 1,000 novels = ~100M words = ~130M tokens

How Raw Data Becomes Intelligence

Four composite examples showing the end-to-end data journey. Each demonstrates how disparate raw signals are combined, analyzed, and distilled into something genuinely useful.

Example 1 — Visual Memory

From a photo to a searchable memory

A photo taken at a restaurant with a friend is synced to ARIA. The AI vision model describes the scene — a candlelit dinner, two people, downtown restaurant visible through the window. Entity extraction identifies the location (dining district), the person (close friend M.), and the activity (dinner celebration). Knowledge facts are created and linked. When the user later asks "where did I eat last week?" — ARIA knows, complete with who was there and what the occasion was.

Photo synced → AI describes scene → Extract entities → Create KG facts → Update social graph → Queryable memory
Example 2 — Proactive Health

From sensor data to a wellness nudge

Health data shows a declining step count over two weeks. Calendar data simultaneously shows back-to-back meetings filling every afternoon. The context accumulator detects both changes. The significance gate confirms this is a meaningful pattern, not noise. The anticipation engine generates an insight: "Your daily activity has dropped 30% while your meeting load doubled. Consider blocking 30 minutes between meetings for a walk." Delivered via push notification at an optimal time.

Health data drops → Calendar overloaded → Context accumulator → Significance gate → Insight delivered
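The same flow, sketched as code. `accumulator`, `gate`, `engine`, and `notifier` are hypothetical stand-ins for the components named above.

```python
def maybe_notify(accumulator, gate, engine, notifier):
    """Context change -> significance gate -> insight, per the example above."""
    changes = accumulator.detect_changes()      # cheap: diff recent context
    if not gate.is_significant(changes):        # cheap check; most changes stop here
        return None
    insight = engine.generate_insight(changes)  # expensive reasoning call
    notifier.push(insight, when="optimal")      # surfaced at a good moment
    return insight
```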
Example 3 — Social Intelligence

From iMessage threads to trip planning context

iMessage history analysis extracts that the user discussed vacation plans with three friends over the past month. The knowledge graph links each person to the trip entity, capturing discussed dates, proposed destinations, and logistical details. When the user asks ARIA about trip planning, she already knows who is going, what dates were discussed, which destinations came up, and even which friend suggested each option — all without the user needing to re-explain anything.

iMessage analysis → Extract trip facts → Link people & dates → KG trip entity → Rich context ready
Example 4 — Taste Profiling

From listening habits to personality understanding

Music data analysis reveals patterns: ambient and focus music during work hours, upbeat indie rock on weekends, jazz in the evenings. The music taste profile is auto-generated from 1,091 songs across 50 playlists. ARIA weaves this understanding into conversations naturally — referencing listening moods, suggesting music-related context, and understanding when the user mentions wanting something "for a chill evening" versus "something energizing."

1,091 songs synced → Pattern detection → Taste profile built → Context enrichment → Natural conversation

The Background Engine

26,836 background jobs processed by aria-tempo. The silent workhorse that runs analysis pipelines, syncs data, generates insights, and maintains the knowledge graph — all autonomously, 24/7.

25,117
Completed
93.6% success
935
Failed
3.5% failure rate
784
Other States
pending / processing / cancelled
Top Job Types by Volume
iMessage analysis
10,707
knowledge backfill
5,989
context accumulate
4,379
looki realtime poll
1,703
photo describe
1,336
iMessage analysis dominates. 567,090 individual messages analyzed across 10,707 jobs — nearly 40% of all background work — producing 121,516 knowledge facts (93% of the entire knowledge graph). Google Voice adds another 1,002 records from 412 files (353 text conversations + 59 transcribed voicemails), yielding 931 facts. Conversational data is the richest source: each thread yields facts about people, preferences, plans, and relationships.
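As a rough illustration, a worker loop for a queue like aria-tempo's might look like this. The job-type names in the comment match the volume table above; the queue API itself is an assumption.

```python
def run_worker(queue, handlers):
    """Claim jobs, dispatch by type, record the outcome (illustrative API)."""
    while (job := queue.claim_next()) is not None:
        try:
            handlers[job.type](job)      # e.g. "iMessage analysis", "photo describe"
            queue.mark_completed(job)    # the 93.6% happy path
        except Exception as exc:
            queue.mark_failed(job, exc)  # 3.5% land here, with the error recorded
```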

The Cost of Intelligence

All of this — 130 million tokens, 90,224 LLM calls, 26,836 background jobs, 202,751 memories, a 131,016-fact knowledge graph — costs a total of $14.81. Here's the breakdown.

$0.06
Cost Per Proactive Insight
$14.81 ÷ 252 insights
$0.0001
Cost Per Memory
$14.81 ÷ 131,897 active memories
$0.00016
Cost Per LLM Call
$14.81 ÷ 90,224 calls

Provider Cost Breakdown

Anthropic Claude $13.43 (90.7%)
7,944 calls · Deep reasoning, analysis, conversation, journal generation
OpenAI $1.38 (9.3%)
3,474 calls · Embeddings and text generation
Google Gemini $0.00 (free tier)
78,806 calls · Classification, significance checks, photo descriptions — all on free tier
The free-tier multiplier. Google Gemini's free tier handles 87.3% of all LLM calls at zero cost. The significance gate alone filters 762 checks down to 252 actionable insights — a 67% noise reduction rate — preventing expensive Claude calls on data that doesn't matter. This architectural decision keeps the entire platform under $15 total.

Complete Data Inventory

Every table in the system, organized by function. This is the full scope of what ARIA stores, processes, and reasons about.

Conversations & Messages

iMessages analyzed 567,090
Google Voice records 1,002
GV voicemails transcribed 59
conversations 242
ARIA messages 1,567
event_log 43,190
actions_log 1,623

Knowledge & Memory

core_memory 202,751
owner_profile 131,016
knowledge_facts 26,940
knowledge_entities 5,518
knowledge_summaries 27

Device Sync

photo_metadata 63,296
health_data 17,767
device_music_playlist_tracks 8,788
device_music 1,091
device_location 328
device_activity 105
contacts 88
device_music_playlists 50
device_calendar_events 43
homekit_devices 20

Intelligence & Operations

llm_usage 90,224
tempo_jobs 26,836
context_snapshots 1,077
proactive_insights 252
aria_journal 90
social_connections 27
twilio_messages 18
aria_inbound_emails 10