Data Observatory
ARIA - Technical

The Data Behind ARIA

How 567,090 iMessages, 139,000 photos, 1,002 Google Voice records, and 130 million tokens become one intelligent assistant. A visualization of the raw scale of data that ARIA ingests, analyzes, and distills into personal intelligence.

The Numbers

Eight headline statistics that capture the scale of what ARIA processes. Every number represents real data from the production database — no projections, no estimates.

567K
iMessages Analyzed
across 10,707 analysis jobs
202,751
Memory Records
extracted & stored
139K
Photos
full-library sync target
131,016
Knowledge Facts
121,516 from iMessage alone
130M+
Tokens Processed
by LLM providers
90,224
LLM Calls Made
across 3 providers
1,002
Google Voice Records
from 353 text conversations + 59 voicemails
$14.81
Total LLM Cost
for all of the above

The Data Funnel — From Raw Data to Intelligence

The journey from raw sensor readings and photo metadata to personally relevant intelligence. Each layer refines, filters, and distills — turning hundreds of thousands of data points into a handful of genuinely useful insights.

1
Raw Ingestion ~737,000 records

Everything flows in. iMessage history spanning years of conversations, photos from the camera roll, Google Voice archives, sensor data from iOS, calendar events, contacts, health readings, music libraries, HomeKit devices — a continuous stream of raw life data captured by the platform.

iMessages 567,090
Photos 139,000
Health 17,767
Playlist Tracks 8,788
ARIA Messages 1,567
Songs 1,091
Google Voice 1,002
Location 328
Activity 105
Contacts 88
Calendar 43
HomeKit 20
SMS 18
Email 10
▽ ▽ ▽
2
AI Analysis & Extraction ~27,000+ operations

Raw data is processed by AI models — 567,090 iMessages are analyzed in batches of 50 across 10,707 jobs, extracting facts about people and relationships. 412 Google Voice files (353 text conversations + 59 voicemails) are transcribed and analyzed. Photos are described by vision models, knowledge entities are extracted and linked, and context changes are detected across 15 sources every 5 minutes. A sketch of the batching step follows the breakdown below.

iMessages Analyzed 567,090
iMessage Analysis Jobs 10,707
Knowledge Backfill 5,989
Context Accumulation 4,379
Photos Described 3,505
Google Voice Processed 1,002
GV Voicemails Transcribed 59
Significance Checks 762
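To make the batching concrete, here is a minimal Python sketch. The 50-message batch size comes from the description above; the `llm.extract_facts` client call and everything else is a hypothetical stand-in for the real pipeline.

```python
from itertools import islice

BATCH_SIZE = 50  # nominal batch size stated above: one LLM call per 50 messages

def batches(iterable, size):
    """Yield successive fixed-size chunks from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def analyze_thread(messages, llm):
    """One analysis job: batch the messages, collect extracted facts.

    `llm.extract_facts` is a hypothetical client method standing in for the
    provider request the production pipeline actually makes.
    """
    facts = []
    for batch in batches(messages, BATCH_SIZE):
        facts.extend(llm.extract_facts(batch))
    return facts
```

At 567,090 messages across 10,707 jobs, a job averages about 53 messages, so most jobs amount to one or two such batches.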
▽ ▽ ▽
3
Knowledge Synthesis ~360,000+ artifacts

Analysis output is refined into structured knowledge. Memory records are deduplicated (35% superseded), owner profile facts are organized across 9 domains, and knowledge graph entities are linked with typed relationships. The noise is stripped away; only verified intelligence remains.

Core Memories 202,751 → 131,897 active
Profile Facts 131,016
KG Facts 26,940
KG Entities 5,518
KG Summaries 27
▽ ▽ ▽
4
Intelligence Output the distillation

The final layer. Hundreds of thousands of data points reduced to carefully selected, deeply analyzed, personally relevant intelligence. What actually reaches the user is the tip of a massive data iceberg — each insight backed by the full weight of everything below it.

Proactive Insights 252
Context Snapshots 1,077
Journal Entries 90
KG Summaries 27
PRISM Personas 7
~737,000 raw records → 27,000+ analysis ops → 360,000+ knowledge artifacts → 252 delivered insights

Memory Architecture

202,751 total memory records distilled from every conversation, analysis run, and background job. After deduplication and supersession, 131,897 remain active — a 35% compression rate that keeps only the freshest, most relevant knowledge.

context
63,026 (48%)
people
34,452 (26%)
preferences
31,491 (24%)
recurring
2,385 (2%)
communication
383 (<1%)
Deduplication in action. Of the 202,751 total memory records created, 70,854 have been superseded by newer, more accurate versions — a 35% compression rate. This continuous refinement ensures ARIA's working memory stays current and contradiction-free.

What Gets Memorized

Every conversation with ARIA can produce memory updates. Claude analyzes the dialogue and extracts durable facts — preferences ("prefers morning workouts"), people ("brother lives in Austin"), context ("working on product roadmap"), and recurring patterns ("reads every evening"). Background jobs also generate memories from iMessage analysis, photo descriptions, and health data.
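One way such an extraction step could be shaped, as a hedged Python sketch: the category names match the memory breakdown above, while the schema fields and the `validate` helper are illustrative assumptions, not the production format.

```python
from dataclasses import dataclass

# Categories mirror the memory breakdown shown earlier on this page.
CATEGORIES = {"context", "people", "preferences", "recurring", "communication"}

@dataclass
class MemoryCandidate:
    """One durable fact proposed by the extraction model (illustrative schema)."""
    category: str  # must be one of CATEGORIES
    fact: str      # e.g. "prefers morning workouts"
    source: str    # e.g. "conversation", "imessage_analysis", "health"

def validate(candidate: MemoryCandidate) -> bool:
    """Drop anything with an unknown category or an empty fact before storage."""
    return candidate.category in CATEGORIES and bool(candidate.fact.strip())
```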

Supersession Mechanics

When ARIA learns something that contradicts or updates an existing memory, it doesn't delete the old one — it supersedes it. The old record stays for audit trail purposes, marked with a pointer to its replacement. This versioning means ARIA can explain why she changed her understanding of a fact.
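In code, supersession can be as simple as a nullable pointer. A minimal sketch, assuming a flat record type; the real table surely carries more fields.

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    id: int
    fact: str
    superseded_by: int | None = None  # pointer to the replacement; never deleted

def supersede(old: MemoryRecord, new: MemoryRecord) -> None:
    """Retire `old` in favor of `new`, keeping it around for the audit trail."""
    old.superseded_by = new.id

def active(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Working memory is whatever has not been superseded."""
    return [r for r in records if r.superseded_by is None]
```

Filtering on `superseded_by` is exactly what separates the 131,897 active records from the 202,751 total.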

Knowledge Graph Domains

131,016 structured facts organized across 9 domains in the owner profile. Each fact has a confidence score, evidence chain, source attribution, and temporal validity. This is ARIA's deep understanding of the user's life. A sketch of this fact shape follows the domain breakdown below.

events
62,161
people
34,204
preferences
31,258
lifestyle
2,417
communication
513
career
94
health
88
interests
82
places
41
Events dominate. 47% of all knowledge facts relate to events — things that happened, when they happened, who was involved. This temporal awareness is what allows ARIA to answer "when did I last..." and "what was happening around..." questions with confidence.
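Translating that description into a shape: a hedged sketch of what one owner-profile fact might carry. The field names are assumptions; only the four attributes themselves (confidence score, evidence chain, source attribution, temporal validity) come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class KnowledgeFact:
    """Illustrative shape of one owner-profile fact, per the attributes above."""
    domain: str                          # one of the 9 domains: events, people, ...
    statement: str                       # e.g. "brother lives in Austin"
    confidence: float                    # confidence score (0.0-1.0 scale assumed)
    evidence: list[str]                  # evidence chain: IDs of supporting records
    source: str                          # source attribution, e.g. "imessage_analysis"
    valid_from: datetime | None = None   # temporal validity window,
    valid_until: datetime | None = None  # open-ended when unknown
```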

The Photo Pipeline

139,000 photos in the iOS library. Each one passes through a multi-stage pipeline: metadata sync, AI description, entity extraction, and knowledge graph integration. The result is a searchable, queryable visual memory. A sketch of these stages follows the throughput figures below.

Stage 1

Photo Library

139,000

Total photos in
iOS library

Stage 2

Metadata Synced

63,296

Synced to ARIA
(45.5% complete)

Stage 3

AI Described

3,505

Vision model
analysis complete

Stage 4

Knowledge Fed

Facts extracted
into KG

Photo Pipeline Throughput
Total in library 139,000
Metadata synced to ARIA 63,296 (45.5%)
Described by AI vision 3,505 (2.5%)
Awaiting sync ~75,704 remaining
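Here is a compact sketch of the four stages as a single function. `store`, `vision`, and `kg` are hypothetical service handles; only the stage order comes from the pipeline described above.

```python
def process_photo(photo_id: str, store, vision, kg) -> str:
    """Walk one photo through the pipeline (services are hypothetical handles)."""
    meta = store.sync_metadata(photo_id)         # Stage 2: metadata sync
    description = vision.describe(meta)          # Stage 3: AI description
    entities = kg.extract_entities(description)  # Stage 4: entity extraction...
    kg.link(photo_id, entities)                  # ...and knowledge graph linking
    return description
```

The throughput table shows the funnel narrowing hard between stages 2 and 3 (63,296 synced, 3,505 described), which fits the economics: metadata sync is cheap, vision calls are not.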

LLM Intelligence Engine

90,224 calls across three AI providers. The majority runs on Google's free tier for classification and vision, with Anthropic Claude handling deep reasoning and OpenAI powering embeddings. A sketch of this routing follows the breakdown below.

90,224
total calls
Google Gemini
Free tier classification & vision
78,806 (87.3%)
Anthropic Claude
Reasoning, analysis, conversation — $13.43
7,944 (8.8%)
OpenAI
Embeddings & generation — $1.38
3,474 (3.9%)
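That split can be written down as a small routing table. A sketch assuming tasks are tagged by kind; the names are illustrative, not the platform's actual API.

```python
# Task-kind → provider routing, mirroring the split described above.
ROUTES = {
    "classify": "gemini",  # free tier: classification & significance checks
    "vision":   "gemini",  # free tier: photo descriptions
    "reason":   "claude",  # paid: deep reasoning, analysis, conversation
    "embed":    "openai",  # paid: embeddings
}

def pick_provider(task_kind: str) -> str:
    """Default to the free tier; escalate only when the task demands it."""
    return ROUTES.get(task_kind, "gemini")
```

Defaulting to the free tier is what pushes 87.3% of call volume to Gemini while Claude carries nearly all of the cost.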

Token Volume: 130 Million+

To put that in perspective: 130 million tokens is approximately 100 million words — roughly 1,000 novels' worth of text (at ~100,000 words per novel) processed by AI models.

Each block ≈ 15 novels
66 blocks ≈ 1,000 novels = ~100M words = ~130M tokens

How Raw Data Becomes Intelligence

Four composite examples showing the end-to-end data journey. Each demonstrates how disparate raw signals are combined, analyzed, and distilled into something genuinely useful.

Example 1 — Visual Memory

From a photo to a searchable memory

A photo taken at a restaurant with a friend is synced to ARIA. The AI vision model describes the scene — a candlelit dinner, two people, downtown restaurant visible through the window. Entity extraction identifies the location (dining district), the person (close friend M.), and the activity (dinner celebration). Knowledge facts are created and linked. When the user later asks "where did I eat last week?" — ARIA knows, complete with who was there and what the occasion was.

Photo synced → AI describes scene → Extract entities → Create KG facts → Update social graph → Queryable memory
Example 2 — Proactive Health

From sensor data to a wellness nudge

Health data shows a declining step count over two weeks. Calendar data simultaneously shows back-to-back meetings filling every afternoon. The context accumulator detects both changes. The significance gate confirms this is a meaningful pattern, not noise. The anticipation engine generates an insight: "Your daily activity has dropped 30% while your meeting load doubled. Consider blocking 30 minutes between meetings for a walk." Delivered via push notification at an optimal time.

Health data drops → Calendar overloaded → Context accumulator → Significance gate → Insight delivered
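The same flow, sketched as code. `accumulator`, `gate`, `engine`, and `notifier` are hypothetical stand-ins for the components named above.

```python
def maybe_notify(accumulator, gate, engine, notifier):
    """Context change -> significance gate -> insight, per the example above."""
    changes = accumulator.detect_changes()      # cheap: diff recent context
    if not gate.is_significant(changes):        # cheap check; most changes stop here
        return None
    insight = engine.generate_insight(changes)  # expensive reasoning call
    notifier.push(insight, when="optimal")      # surfaced at a good moment
    return insight
```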
Example 3 — Social Intelligence

From iMessage threads to trip planning context

iMessage history analysis extracts that the user discussed vacation plans with three friends over the past month. The knowledge graph links each person to the trip entity, capturing discussed dates, proposed destinations, and logistical details. When the user asks ARIA about trip planning, she already knows who is going, what dates were discussed, which destinations came up, and even which friend suggested each option — all without the user needing to re-explain anything.

iMessage analysis → Extract trip facts → Link people & dates → KG trip entity → Rich context ready
Example 4 — Taste Profiling

From listening habits to personality understanding

Music data analysis reveals patterns: ambient and focus music during work hours, upbeat indie rock on weekends, jazz in the evenings. The music taste profile is auto-generated from 1,091 songs across 50 playlists. ARIA weaves this understanding into conversations naturally — referencing listening moods, suggesting music-related context, and understanding when the user mentions wanting something "for a chill evening" versus "something energizing."

1,091 songs synced → Pattern detection → Taste profile built → Context enrichment → Natural conversation

The Background Engine

26,836 background jobs processed by aria-tempo. The silent workhorse that runs analysis pipelines, syncs data, generates insights, and maintains the knowledge graph — all autonomously, 24/7.

25,117
Completed
93.6% success
935
Failed
3.5% failure rate
784
Other States
pending / processing / cancelled
Top Job Types by Volume
iMessage analysis
10,707
knowledge backfill
5,989
context accumulate
4,379
looki realtime poll
1,703
photo describe
1,336
iMessage analysis dominates. 567,090 individual messages analyzed across 10,707 jobs — nearly 40% of all background work — producing 121,516 knowledge facts (93% of the entire knowledge graph). Google Voice adds another 1,002 records from 412 files (353 text conversations + 59 transcribed voicemails), yielding 931 facts. Conversational data is the richest source: each thread yields facts about people, preferences, plans, and relationships.
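As a rough illustration, a worker loop for a queue like aria-tempo's might look like this. The job-type names in the comment match the volume table above; the queue API itself is an assumption.

```python
def run_worker(queue, handlers):
    """Claim jobs, dispatch by type, record the outcome (illustrative API)."""
    while (job := queue.claim_next()) is not None:
        try:
            handlers[job.type](job)      # e.g. "iMessage analysis", "photo describe"
            queue.mark_completed(job)    # the 93.6% happy path
        except Exception as exc:
            queue.mark_failed(job, exc)  # 3.5% land here, with the error recorded
```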

The Cost of Intelligence

All of this — 130 million tokens, 90,224 LLM calls, 26,836 background jobs, 202,751 memories, a 131,016-fact knowledge graph — costs a total of $14.81. Here's the breakdown.

$0.06
Cost Per Proactive Insight
$14.81 ÷ 252 insights
$0.0001
Cost Per Memory
$14.81 ÷ 131,897 active memories
$0.00016
Cost Per LLM Call
$14.81 ÷ 90,224 calls

Provider Cost Breakdown

Anthropic Claude $13.43 (90.7%)
7,944 calls · Deep reasoning, analysis, conversation, journal generation
OpenAI $1.38 (9.3%)
3,474 calls · Embeddings and text generation
Google Gemini $0.00 (free tier)
78,806 calls · Classification, significance checks, photo descriptions — all on free tier
The free-tier multiplier. Google Gemini's free tier handles 87.3% of all LLM calls at zero cost. The significance gate alone filters 762 checks down to 252 actionable insights — a 67% noise reduction rate — preventing expensive Claude calls on data that doesn't matter. This architectural decision keeps the entire platform under $15 total.

Complete Data Inventory

Every table in the system, organized by function. This is the full scope of what ARIA stores, processes, and reasons about.

Conversations & Messages

iMessages analyzed 567,090
Google Voice records 1,002
GV voicemails transcribed 59
conversations 242
ARIA messages 1,567
event_log 43,190
actions_log 1,623

Knowledge & Memory

core_memory 202,751
owner_profile 131,016
knowledge_facts 26,940
knowledge_entities 5,518
knowledge_summaries 27

Device Sync

photo_metadata 63,296
health_data 17,767
device_music_playlist_tracks 8,788
device_music 1,091
device_location 328
device_activity 105
contacts 88
device_music_playlists 50
device_calendar_events 43
homekit_devices 20

Intelligence & Operations

llm_usage 90,224
tempo_jobs 26,836
context_snapshots 1,077
proactive_insights 252
aria_journal 90
social_connections 27
twilio_messages 18
aria_inbound_emails 10