ARIA

Adaptive Responsive Intelligence Assistant
Technical Deep Dive

HomePod Voice System

ARIA's "Speak to HomePod" feature enables Claude to announce messages through Apple HomePod speakers using a queue-based relay architecture. The system bridges ARIA's cloud services with local Apple hardware through a Mac relay daemon, leveraging pyatv's RAOP (Remote Audio Output Protocol) implementation to stream synthesized speech via AirPlay.

The relay pattern is shared with ARIA's iMessage and Apple TV integrations: tool call → Tempo job → database queue → Mac relay polls → local execution → completion callback.

Architecture Overview

End-to-End Message Flow

1. User: "Announce dinner is ready on the kitchen HomePod"
2. Claude calls the homepod_announce tool (homepod-tools.ts)
3. Enqueues SPEAK_HOMEPOD job via Tempo (priority 10)
4. speak-homepod.ts handler INSERTs a row into homepod_queue
5. Mac relay daemon polls GET /api/homepod/pending every ~5s
6. Row claimed (status: pending → claimed)
7. macOS say command generates AIFF audio file
8. pyatv atvremote stream_file sends audio to HomePod via RAOP
9. Relay reports POST /api/homepod/[id]/complete
10. Row finalized (status: claimed → sent)

Priority 10 ensures announcements bypass lower-priority background jobs. This was a specific fix after announcements were getting stuck behind long-running analytics jobs.

Component Inventory

File Map Across Repositories

| Repository | File | Role |
| --- | --- | --- |
| aria | src/lib/homepod-tools.ts | Tool definition + executor; enqueues SPEAK_HOMEPOD job |
| aria | src/app/api/homepod/pending/route.ts | GET endpoint; relay fetches pending announcements |
| aria | src/app/api/homepod/[id]/complete/route.ts | POST endpoint; relay reports delivery success/failure |
| aria | src/lib/tool-dispatch.ts | Routes homepod_* tool calls to executor |
| aria | sql/041_homepod_queue.sql | Migration creating homepod_queue table |
| aria-tempo | src/handlers/speak-homepod.ts | Inserts into queue, logs event |
| aria-tempo-client | src/constants.ts | JOB_TYPE.SPEAK_HOMEPOD constant |
| aria-tempo-client | src/types.ts | SpeakHomepodPayload interface |

Claude Tool Interface

homepod_announce Tool Definition

Claude receives this tool as part of its available toolkit in every conversation. The tool description guides Claude on when and how to use HomePod announcements:

// Tool: homepod_announce
// Description:
"Announce a message through HomePod speakers.
 The message is spoken aloud via the Intercom feature.
 Use for time-sensitive notifications, reminders, or
 when the user is likely away from their phone/computer.
 Keep messages concise and conversational."

// Input Schema:
{
  message: string,   // Required: text to speak aloud
  target?: string    // Optional: HomePod name (e.g., "Living Room")
}                    // Omit target to announce on all HomePods

The tool executor in homepod-tools.ts enqueues the job at priority 10 and returns immediately with a confirmation. The user sees "Announcing on HomePod..." in the chat interface while delivery happens asynchronously.
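The executor's enqueue-and-return behavior can be sketched roughly as follows. This is not the code in homepod-tools.ts: the enqueueJob callback stands in for the Tempo client (whose API is not shown here), and the EnqueuedJob and result shapes are assumptions made for illustration. Only message, target, the SPEAK_HOMEPOD job type, and priority 10 come from the text above.

```typescript
// Input shape mirrors the tool's input schema.
interface HomepodAnnounceInput {
  message: string;
  target?: string;
}

// Hypothetical job shape; the real Tempo job format may differ.
interface EnqueuedJob {
  type: string;
  priority: number;
  payload: { message: string; target: string | null };
}

// Stand-in for the Tempo client's enqueue call (assumed signature).
type EnqueueJob = (job: EnqueuedJob) => void;

const SPEAK_HOMEPOD = "SPEAK_HOMEPOD";
const ANNOUNCE_PRIORITY = 10;

function executeHomepodAnnounce(
  input: HomepodAnnounceInput,
  enqueueJob: EnqueueJob
): { ok: boolean; status: string } {
  const message = input.message.trim();
  if (message.length === 0) {
    // Assumed validation: nothing useful to speak.
    return { ok: false, status: "Message must not be empty" };
  }
  enqueueJob({
    type: SPEAK_HOMEPOD,
    priority: ANNOUNCE_PRIORITY,
    payload: { message, target: input.target ?? null },
  });
  // Return right away; delivery happens asynchronously via the relay.
  return { ok: true, status: "Announcing on HomePod..." };
}
```

Passing the enqueue function in as a parameter keeps the sketch testable without a running Tempo instance.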

Database Schema

homepod_queue Table (Migration 041)

CREATE TABLE homepod_queue (
  id         SERIAL PRIMARY KEY,
  message    TEXT NOT NULL,
  target     TEXT,             -- HomePod name; NULL = all
  status     TEXT NOT NULL DEFAULT 'pending'
             CHECK (status IN (
               'pending', 'claimed',
               'sent', 'failed'
             )),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  claimed_at TIMESTAMPTZ,
  sent_at    TIMESTAMPTZ,
  error      TEXT
);

Status Lifecycle

| Status | Set By | Meaning |
| --- | --- | --- |
| pending | speak-homepod handler | Waiting for relay to claim |
| claimed | GET /api/homepod/pending | Relay has picked up the row; claimed_at timestamp set |
| sent | POST /api/homepod/[id]/complete | Audio successfully streamed to HomePod |
| failed | POST /api/homepod/[id]/complete | Delivery failed; error column populated |
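
The lifecycle can also be read as a small state machine. The sketch below is illustrative only: QueueStatus, ALLOWED_TRANSITIONS, and canTransition are hypothetical names, and treating sent and failed as terminal is an assumption drawn from the table rather than from the source code.

```typescript
type QueueStatus = "pending" | "claimed" | "sent" | "failed";

// Allowed transitions per the lifecycle table. claimed -> pending is the
// stale-claim reset; sent and failed are assumed to be terminal.
const ALLOWED_TRANSITIONS: Record<QueueStatus, QueueStatus[]> = {
  pending: ["claimed"],
  claimed: ["sent", "failed", "pending"],
  sent: [],
  failed: [],
};

function canTransition(from: QueueStatus, to: QueueStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```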

Stale Claim Recovery

The pending endpoint automatically resets claimed rows older than 2 minutes back to pending on each poll. This handles relay crashes, network interruptions, or pyatv timeouts without manual intervention.
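
As a rough in-memory illustration of the reset rule (the real endpoint presumably does this with a single SQL UPDATE, which is not shown in this document), the sketch below uses hypothetical names ClaimableRow and resetStaleClaims. Clearing claimedAt on reset is an assumption.

```typescript
interface ClaimableRow {
  id: number;
  status: "pending" | "claimed" | "sent" | "failed";
  claimedAt: Date | null;
}

const STALE_CLAIM_MS = 2 * 60 * 1000; // 2-minute timeout from the text

// Resets claimed rows older than the timeout back to pending and returns
// their ids. Mutates rows in place for brevity.
// Assumed SQL equivalent:
//   UPDATE homepod_queue SET status = 'pending', claimed_at = NULL
//   WHERE status = 'claimed' AND claimed_at < NOW() - INTERVAL '2 minutes';
function resetStaleClaims(rows: ClaimableRow[], now: Date): number[] {
  const resetIds: number[] = [];
  for (const row of rows) {
    const isStale =
      row.status === "claimed" &&
      row.claimedAt !== null &&
      now.getTime() - row.claimedAt.getTime() > STALE_CLAIM_MS;
    if (isStale) {
      row.status = "pending";
      row.claimedAt = null; // assumption: the claim timestamp is cleared
      resetIds.push(row.id);
    }
  }
  return resetIds;
}
```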

API Endpoints

GET /api/homepod/pending

Authenticated via RELAY_AUTH_TOKEN Bearer header (shared with iMessage relay).

Resets stale claims, then returns up to 10 pending rows using FOR UPDATE SKIP LOCKED for safe concurrent access. Sets status to claimed and records claimed_at timestamp.
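
One way to claim rows and mark them in a single statement is sketched below. The SQL text and the buildClaimQuery helper are assumptions, not the contents of pending/route.ts; ordering by created_at is also assumed. The { text, values } shape follows node-postgres query config objects.

```typescript
// Assumed claim query: lock up to $1 pending rows, skipping rows another
// transaction has already locked, and mark them claimed in one statement.
const CLAIM_PENDING_SQL = `
  UPDATE homepod_queue
     SET status = 'claimed', claimed_at = NOW()
   WHERE id IN (
     SELECT id FROM homepod_queue
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT $1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id, message, target, created_at`;

interface ClaimQuery {
  text: string;
  values: number[];
}

// Hypothetical helper; the default limit of 10 matches the text above.
function buildClaimQuery(limit: number = 10): ClaimQuery {
  return { text: CLAIM_PENDING_SQL, values: [limit] };
}
```

With SKIP LOCKED, a second relay polling at the same moment would skip the locked rows instead of blocking on them, so no announcement is handed out twice.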

// Response
{
  "announcements": [{
    "id": 42,
    "message": "Dinner is ready",
    "target": "Kitchen",
    "created_at": "2026-03-22T..."
  }]
}

POST /api/homepod/[id]/complete

Called by relay after each announcement attempt. Accepts success boolean and optional error string.

Returns 404 if the row does not exist or is not in claimed state, preventing duplicate completion reports.
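
The completion guard can be illustrated with an in-memory sketch. The names QueueRowState and applyCompletion are hypothetical, as is the "unknown error" fallback; the real route works against the database rather than an object.

```typescript
interface CompletionBody {
  success: boolean;
  error?: string;
}

interface QueueRowState {
  status: "pending" | "claimed" | "sent" | "failed";
  sentAt: Date | null;
  error: string | null;
}

// Returns the HTTP status the endpoint would respond with.
function applyCompletion(
  row: QueueRowState | undefined,
  body: CompletionBody,
  now: Date
): number {
  // Missing row, or any state other than claimed: reject with 404 so a
  // repeated report cannot overwrite the first outcome.
  if (row === undefined || row.status !== "claimed") {
    return 404;
  }
  row.status = body.success ? "sent" : "failed";
  row.sentAt = now;
  row.error = body.success ? null : (body.error ?? "unknown error");
  return 200;
}
```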

// Request body ("error" is optional; include it when success is false)
{
  "success": false,
  "error": "pyatv timeout"
}

// Sets status = 'sent' | 'failed'
// Sets sent_at = NOW()
// Stores error text if failed

Tempo Worker Handler

speak-homepod.ts

The Tempo handler is deliberately minimal. Its only job is to bridge the Tempo job system to the relay queue, keeping the hot path fast:

async function handleSpeakHomepod(pool, payload) {
  const { message, target } = payload;

  // Insert into relay queue
  const result = await pool.query(
    `INSERT INTO homepod_queue (message, target)
     VALUES ($1, $2) RETURNING id`,
    [message, target || null]
  );

  // Log to event_log for observability
  await logEvent(pool, {
    component: 'homepod',
    category: 'interaction',
    action: `HomePod announce queued`
      + (target ? ` to ${target}` : '')
      + `: "${message.slice(0, 80)}"`
  });

  return { queued: true, queue_id: result.rows[0].id };
}

Message text is truncated to 80 characters in the event log to keep observability data manageable. The full message is preserved in the homepod_queue row.

Mac Relay Implementation

Local Execution: macOS say + pyatv

The Mac relay daemon runs on a Mac that shares the local network with HomePod speakers. It polls the ARIA API for pending announcements and executes them locally using two system-level tools:

Step 1: Text-to-Speech Generation

# macOS built-in TTS (Siri voice)
say -o /tmp/aria-speak-<uuid>.aiff "Dinner is ready"

The say command uses the system's default Siri voice to generate an AIFF audio file. The UUID-based filename prevents collisions when multiple announcements are queued.

Step 2: AirPlay Streaming via pyatv

# Stream to specific HomePod
atvremote -n "Kitchen" stream_file=/tmp/aria-speak-<uuid>.aiff

# Stream to first discovered HomePod (no target)
atvremote stream_file=/tmp/aria-speak-<uuid>.aiff

pyatv discovers HomePods via mDNS/Bonjour and streams audio using RAOP (AirPlay's audio protocol). No pairing is required for RAOP streaming to HomePods.
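
The two commands can be assembled as argument arrays, as in the sketch below. The source does not state what language the relay daemon is written in, so TypeScript is an assumption here, and tempAiffPath, buildSayArgs, and buildStreamArgs are hypothetical helper names. Only flags that appear in the commands above are used.

```typescript
// Temp file path following the naming pattern shown above.
function tempAiffPath(uuid: string): string {
  return `/tmp/aria-speak-${uuid}.aiff`;
}

// Arguments for: say -o <path> "<message>"
function buildSayArgs(message: string, aiffPath: string): string[] {
  return ["-o", aiffPath, message];
}

// Arguments for: atvremote [-n "<target>"] stream_file=<path>
function buildStreamArgs(aiffPath: string, target: string | null): string[] {
  const args: string[] = [];
  if (target !== null) {
    args.push("-n", target);
  }
  args.push(`stream_file=${aiffPath}`);
  return args;
}
```

Passing these arrays to a process API that takes an argument list (for example Node's child_process.execFile) avoids shell quoting problems when a message contains quotes or other special characters.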

Prerequisites

pip3 install pyatv • atvremote scan • macOS 13+ • Same network as HomePods

Design Decisions

Why Queue-Based Relay?

HomePods are not addressable from the internet. They only accept connections from devices on the same local network via mDNS discovery. ARIA's cloud services cannot reach them directly.

The relay pattern solves this by inverting the connection: the Mac relay pulls work from the cloud API rather than the cloud pushing to local hardware. This avoids NAT traversal, dynamic DNS, or VPN tunneling.
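
A single iteration of the pull loop might look like the sketch below. It is not the relay daemon's code: RelayDeps and its three functions are hypothetical stand-ins for the GET /api/homepod/pending call, the say + atvremote step, and the POST /api/homepod/[id]/complete call, injected so the loop body can be exercised without a network or a HomePod.

```typescript
interface Announcement {
  id: number;
  message: string;
  target: string | null;
}

// Hypothetical dependencies, injected for testability.
interface RelayDeps {
  fetchPending(): Promise<Announcement[]>;
  speak(a: Announcement): Promise<void>;
  reportComplete(id: number, success: boolean, error?: string): Promise<void>;
}

// Pulls pending work from the cloud, executes it locally, and reports
// each outcome. Returns the number of announcements processed.
async function pollOnce(deps: RelayDeps): Promise<number> {
  const announcements = await deps.fetchPending();
  for (const a of announcements) {
    let failure: string | undefined;
    try {
      await deps.speak(a);
    } catch (err) {
      failure = err instanceof Error ? err.message : String(err);
    }
    await deps.reportComplete(a.id, failure === undefined, failure);
  }
  return announcements.length;
}
```

The daemon would call pollOnce roughly every 5 seconds. All connections originate from the Mac, so nothing on the local network has to accept inbound traffic.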

Why pyatv over Shortcuts?

The original implementation used Apple Shortcuts' Intercom/Announce action. This was abandoned because the Intercom action requires iOS 16.4+ and is not available in macOS Shortcuts.

pyatv provides a stable, open-source AirPlay implementation that works on macOS without any Apple framework dependencies. RAOP streaming is marked as "early stage" in pyatv but has proven reliable for speech-length audio.

Security & Authentication

Authentication Model

The HomePod API endpoints share the RELAY_AUTH_TOKEN Bearer token with the iMessage relay system. This is a static token stored in the container environment and configured on the Mac relay daemon.

| Layer | Mechanism | Scope |
| --- | --- | --- |
| API Authentication | Bearer token (RELAY_AUTH_TOKEN) | Relay ↔ ARIA API |
| Database Locking | FOR UPDATE SKIP LOCKED | Prevents duplicate claims |
| Stale Recovery | 2-minute claim timeout | Auto-recovery from relay failures |
| Completion Guard | 404 on non-claimed rows | Prevents duplicate completion reports |
| Network | RAOP (no pairing needed) | Mac relay ↔ HomePod (LAN only) |

The relay token is shared with iMessage because both relay systems run on the same Mac daemon and poll the same ARIA API. A dedicated HomePod token could be added if the relay is split to a separate host.
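
A minimal sketch of the Bearer check, assuming the endpoint compares the Authorization header against the configured token. The function name isAuthorizedRelay is hypothetical and the actual route code is not shown in this document.

```typescript
// Returns true only for an exact "Bearer <token>" match.
function isAuthorizedRelay(
  authorizationHeader: string | null,
  expectedToken: string
): boolean {
  // Refuse everything if the server-side token is not configured.
  if (authorizationHeader === null || expectedToken.length === 0) {
    return false;
  }
  const prefix = "Bearer ";
  if (authorizationHeader.slice(0, prefix.length) !== prefix) {
    return false;
  }
  return authorizationHeader.slice(prefix.length) === expectedToken;
}
```

A production check would likely prefer a constant-time comparison, such as Node's crypto.timingSafeEqual, over plain string equality.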

Observability

Monitoring & Debugging

The HomePod system is observable at three levels:

1. Tempo Job Log: Every SPEAK_HOMEPOD job is tracked in tempo_jobs with status, duration, and error fields. Failed jobs appear in the ARIA Tempo dashboard.
2. Event Log: The handler writes to event_log with component 'homepod' and category 'interaction', enabling filtering in the analytics dashboard.
3. Queue Table: Direct queries against homepod_queue show delivery status, latency (created_at to sent_at), failure rates, and error messages.

-- Delivery latency for last 24 hours (successful deliveries only)
SELECT
  AVG(EXTRACT(EPOCH FROM sent_at - created_at))
    FILTER (WHERE status = 'sent') AS avg_seconds,
  COUNT(*) FILTER (WHERE status = 'sent') AS delivered,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed
FROM homepod_queue
WHERE created_at > NOW() - INTERVAL '24 hours';

Current Limitations

Phase 1 Constraints

| Limitation | Impact | Planned Resolution |
| --- | --- | --- |
| Siri voice only | No custom ARIA voice identity | Phase 2: ElevenLabs TTS integration |
| ~7-10s total latency | 5s poll interval + TTS + AirPlay buffer | WebSocket relay or shorter poll interval |
| Text only, no SSML | No prosody control (emphasis, pauses) | Phase 2: SSML via ElevenLabs |
| One-way only | HomePod cannot respond back to ARIA | Phase 4: Conversational mode |
| Requires Mac relay | Always-on Mac needed on same LAN | Under investigation: HomePod direct API |
| pyatv RAOP "early stage" | Occasional streaming failures | Stale claim recovery handles retries |

Roadmap

Phase 1: pyatv + macOS say

Status: Active

Current implementation. Queue-based relay with macOS native TTS and pyatv RAOP streaming. Functional for basic announcements with Siri voice.

Phase 2: ElevenLabs Voice + AirPlay Routing

Status: Planned

Replace macOS say with ElevenLabs TTS API to give ARIA a custom voice identity. Generated MP3 is routed to HomePod via one of three approaches:

| Approach | Complexity | Notes |
| --- | --- | --- |
| SwitchAudioSource CLI | Low | Simplest; switches Mac audio output to HomePod AirPlay, plays with afplay |
| CoreAudio API | Medium | AudioObjectSetPropertyData for programmatic output device selection |
| AVFoundation AirPlay | High | Direct AirPlay routing without switching system audio |

Estimated cost: ~$0.03/announcement at 100 characters. Migration adds a voice TEXT DEFAULT 'siri' column to homepod_queue for backward compatibility.
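
For rough budgeting, the estimate can be scaled by message length. The sketch below assumes cost grows linearly with character count, which the source does not state; the only figure taken from the text is ~$0.03 per 100 characters, and estimateCostUsd is a hypothetical helper.

```typescript
const COST_PER_100_CHARS_USD = 0.03; // figure quoted in the text

// Assumes linear per-character pricing (an assumption, not a quoted rate).
function estimateCostUsd(messageLength: number): number {
  return (messageLength / 100) * COST_PER_100_CHARS_USD;
}

// Estimated total for a batch of announcements of the given lengths.
function estimateBatchCostUsd(messageLengths: number[]): number {
  let total = 0;
  for (const len of messageLengths) {
    total += estimateCostUsd(len);
  }
  return total;
}
```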

Phase 3: Apple TV Video Announcements

Status: Planned

Extend to visual announcements on Apple TV using pyatv's play_url mechanism. AI-generated avatar video synced with ElevenLabs speech output.

HeyGen API • SadTalker + ElevenLabs • D-ID • Kling 3.0 • ElevenLabs Image & Video

Target output: 1080p MP4 H.264. Same relay pattern, different media type and delivery endpoint.

Phase 4: Conversational & Multi-Room

Status: Future

Full two-way conversational mode, multi-room targeting with room-aware context, priority levels for announcements, music integration (pause/resume around announcements), and a wake word trigger ("Hey ARIA") via always-on microphone.

ARIA HomePod Voice System • Technical Documentation • March 2026
Generated from source code analysis across aria, aria-tempo, aria-tempo-client, and aria-ios repositories
