ARIA - HomePod Voice System

HomePod Voice System

ARIA's "Speak to HomePod" feature enables Claude to announce messages through Apple HomePod speakers using a queue-based relay architecture. The system bridges ARIA's cloud services with local Apple hardware through a Mac relay daemon, leveraging pyatv's RAOP (Remote Audio Output Protocol) implementation to stream synthesized speech via AirPlay.

The relay pattern is shared with ARIA's iMessage and Apple TV integrations: tool call → Tempo job → database queue → Mac relay polls → local execution → completion callback.

Architecture Overview

End-to-End Message Flow

1. User: "Announce dinner is ready on the kitchen HomePod"

2. Claude calls homepod_announce tool → homepod-tools.ts

3. Enqueues SPEAK_HOMEPOD job via Tempo (priority 10)

4. speak-homepod.ts handler → INSERT into homepod_queue

5. Mac relay daemon polls GET /api/homepod/pending every ~5s

6. Row claimed (status: pending → claimed)

7. macOS say command generates AIFF audio file

8. pyatv atvremote stream_file sends audio to HomePod via RAOP

9. Relay reports POST /api/homepod/[id]/complete

10. Row finalized (status: claimed → sent)

Priority 10 lets announcements jump ahead of lower-priority background jobs. It was introduced after announcements got stuck behind long-running analytics jobs.

Component Inventory

File Map Across Repositories

Repository	File	Role
`aria`	`src/lib/homepod-tools.ts`	Tool definition + executor; enqueues SPEAK_HOMEPOD job
`aria`	`src/app/api/homepod/pending/route.ts`	GET endpoint; relay fetches pending announcements
`aria`	`src/app/api/homepod/[id]/complete/route.ts`	POST endpoint; relay reports delivery success/failure
`aria`	`src/lib/tool-dispatch.ts`	Routes homepod_* tool calls to executor
`aria`	`sql/041_homepod_queue.sql`	Migration creating homepod_queue table
`aria-tempo`	`src/handlers/speak-homepod.ts`	Inserts into queue, logs event
`aria-tempo-client`	`src/constants.ts`	JOB_TYPE.SPEAK_HOMEPOD constant
`aria-tempo-client`	`src/types.ts`	SpeakHomepodPayload interface

Claude Tool Interface

homepod_announce Tool Definition

Claude receives this tool as part of its available toolkit in every conversation. The tool description guides Claude on when and how to use HomePod announcements:

// Tool: homepod_announce
// Description:
"Announce a message through HomePod speakers.
 The message is spoken aloud via the Intercom feature.
 Use for time-sensitive notifications, reminders, or
 when the user is likely away from their phone/computer.
 Keep messages concise and conversational."

// Input Schema:
{
  message: string,   // Required: text to speak aloud
  target?: string    // Optional: HomePod name (e.g., "Living Room")
}                    // Omit target to announce on all HomePods

The tool executor in homepod-tools.ts enqueues the job at priority 10 and returns immediately with a confirmation. The user sees "Announcing on HomePod..." in the chat interface while delivery happens asynchronously.
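As a rough illustration of the enqueue step, the sketch below builds the job request the executor might hand to Tempo. The `EnqueueRequest` shape, the `buildAnnounceJob` helper, and the empty-message check are assumptions made for this example; only the job type, the priority of 10, and the `message`/`target` fields come from this document.

```typescript
// Hypothetical payload shape; fields inferred from the tool's input schema.
interface SpeakHomepodPayload {
  message: string;
  target?: string;
}

// Hypothetical request shape; the real Tempo client API is not shown here.
interface EnqueueRequest {
  type: "SPEAK_HOMEPOD";
  priority: number;
  payload: SpeakHomepodPayload;
}

const ANNOUNCE_PRIORITY = 10;

// Builds the job request without sending it, so the logic stays easy to test.
function buildAnnounceJob(input: { message: string; target?: string }): EnqueueRequest {
  const message = input.message.trim();
  if (message.length === 0) {
    // Assumption: empty announcements are rejected before they reach the queue.
    throw new Error("message must not be empty");
  }
  const target = input.target?.trim();
  return {
    type: "SPEAK_HOMEPOD",
    priority: ANNOUNCE_PRIORITY,
    // Leave target out entirely when no speaker was named.
    payload: target ? { message, target } : { message },
  };
}
```

Keeping request construction separate from the network call means the executor can return its confirmation as soon as the enqueue call resolves, without waiting for delivery.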

Database Schema

homepod_queue Table (Migration 041)

CREATE TABLE homepod_queue (
  id         SERIAL PRIMARY KEY,
  message    TEXT NOT NULL,
  target     TEXT,             -- HomePod name; NULL = all
  status     TEXT NOT NULL DEFAULT 'pending'
             CHECK (status IN (
               'pending', 'claimed',
               'sent', 'failed'
             )),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  claimed_at TIMESTAMPTZ,
  sent_at    TIMESTAMPTZ,
  error      TEXT
);

Status Lifecycle

Status	Set By	Meaning
`pending`	speak-homepod handler	Waiting for relay to claim
`claimed`	GET /api/homepod/pending	Relay has picked up the row; claimed_at timestamp set
`sent`	POST /api/homepod/[id]/complete	Audio successfully streamed to HomePod
`failed`	POST /api/homepod/[id]/complete	Delivery failed; error column populated
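The lifecycle above, together with the stale-claim reset described in the next section, can be read as a small state machine. The sketch below is one way to encode it; `ALLOWED_TRANSITIONS` and `canTransition` are illustrative names, not identifiers from the ARIA codebase.

```typescript
type QueueStatus = "pending" | "claimed" | "sent" | "failed";

// Allowed moves per the lifecycle table, plus the stale-claim reset.
const ALLOWED_TRANSITIONS: Record<QueueStatus, readonly QueueStatus[]> = {
  pending: ["claimed"],
  claimed: ["sent", "failed", "pending"], // "pending" = stale-claim reset
  sent: [], // terminal
  failed: [], // terminal
};

function canTransition(from: QueueStatus, to: QueueStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Both `sent` and `failed` are terminal in this model, which matches the completion endpoint rejecting rows that are no longer claimed.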

Stale Claim Recovery

The pending endpoint automatically resets claimed rows older than 2 minutes back to pending on each poll. This handles relay crashes, network interruptions, or pyatv timeouts without manual intervention.
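The recovery rule can be modelled in memory as follows. This is a sketch of the logic only: the real endpoint does this in SQL, and clearing `claimedAt` on reset is an assumption made here. `StaleCandidate` and `resetStaleClaims` are hypothetical names.

```typescript
interface StaleCandidate {
  id: number;
  status: "pending" | "claimed" | "sent" | "failed";
  claimedAt: Date | null;
}

const STALE_CLAIM_MS = 2 * 60 * 1000; // 2-minute claim timeout from the text

// Moves claimed rows older than the timeout back to pending; returns their ids.
function resetStaleClaims(rows: StaleCandidate[], now: Date): number[] {
  const resetIds: number[] = [];
  for (const row of rows) {
    const isStale =
      row.status === "claimed" &&
      row.claimedAt !== null &&
      now.getTime() - row.claimedAt.getTime() > STALE_CLAIM_MS;
    if (isStale) {
      row.status = "pending";
      row.claimedAt = null; // assumption: the claim timestamp is cleared on reset
      resetIds.push(row.id);
    }
  }
  return resetIds;
}
```

Because the reset runs on every poll, a row abandoned by a crashed relay becomes claimable again on the first poll after the timeout expires.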

API Endpoints

GET /api/homepod/pending

Authenticated via RELAY_AUTH_TOKEN Bearer header (shared with iMessage relay).

Resets stale claims, then returns up to 10 pending rows using FOR UPDATE SKIP LOCKED for safe concurrent access. Sets status to claimed and records claimed_at timestamp.
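To make the batch-claim behaviour concrete, here is an in-memory sketch of the claim step. It does not reproduce the row-level locking that FOR UPDATE SKIP LOCKED provides in PostgreSQL, and the oldest-first ordering is an assumption. `ClaimableRow` and `claimPending` are illustrative names.

```typescript
interface ClaimableRow {
  id: number;
  message: string;
  target: string | null;
  status: "pending" | "claimed" | "sent" | "failed";
  createdAt: Date;
  claimedAt: Date | null;
}

const CLAIM_BATCH_SIZE = 10; // "up to 10 pending rows" from the text

// Marks up to `limit` pending rows as claimed and returns them.
function claimPending(
  rows: ClaimableRow[],
  now: Date,
  limit: number = CLAIM_BATCH_SIZE
): ClaimableRow[] {
  const batch = rows
    .filter((row) => row.status === "pending")
    // Assumption: oldest announcements are claimed first.
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime())
    .slice(0, limit);
  for (const row of batch) {
    row.status = "claimed";
    row.claimedAt = now;
  }
  return batch;
}
```

In the database version, SKIP LOCKED lets a second concurrent poll skip rows that another transaction is already claiming instead of blocking on them.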

// Response
{
  "announcements": [{
    "id": 42,
    "message": "Dinner is ready",
    "target": "Kitchen",
    "created_at": "2026-03-22T..."
  }]
}

POST /api/homepod/[id]/complete

Called by the relay after each announcement attempt. Accepts a success boolean and an optional error string.

Returns 404 if the row does not exist or is not in claimed state, preventing duplicate completion reports.
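The completion guard can be sketched as a pure function over a single row. This models the behaviour described above rather than the actual route handler; `CompletableRow`, `CompletionResult`, and `completeAnnouncement` are hypothetical names.

```typescript
interface CompletableRow {
  id: number;
  status: "pending" | "claimed" | "sent" | "failed";
  sentAt: Date | null;
  error: string | null;
}

type CompletionResult =
  | { httpStatus: 200; status: "sent" | "failed" }
  | { httpStatus: 404 };

// Applies a completion report; rows that are missing or not claimed yield 404.
function completeAnnouncement(
  row: CompletableRow | undefined,
  body: { success: boolean; error?: string },
  now: Date
): CompletionResult {
  if (row === undefined || row.status !== "claimed") {
    return { httpStatus: 404 };
  }
  const next: "sent" | "failed" = body.success ? "sent" : "failed";
  row.status = next;
  row.sentAt = now; // set for both outcomes, as the endpoint notes describe
  row.error = body.success ? null : body.error ?? null;
  return { httpStatus: 200, status: next };
}
```

A second report for the same row finds it in `sent` or `failed` rather than `claimed`, so it falls through to the 404 branch.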

// Request body ("error" is optional; include it only when success is false)
{
  "success": false,
  "error": "pyatv timeout"
}

// Sets status = 'sent' | 'failed'
// Sets sent_at = NOW()
// Stores error text if failed

Tempo Worker Handler

speak-homepod.ts

The Tempo handler is deliberately minimal. Its only job is to bridge the Tempo job system to the relay queue, keeping the hot path fast:

async function handleSpeakHomepod(pool, payload) {
  const { message, target } = payload;

  // Insert into relay queue
  const result = await pool.query(
    `INSERT INTO homepod_queue (message, target)
     VALUES ($1, $2) RETURNING id`,
    [message, target || null]
  );

  // Log to event_log for observability
  await logEvent(pool, {
    component: 'homepod',
    category: 'interaction',
    action: `HomePod announce queued`
      + (target ? ` to ${target}` : '')
      + `: "${message.slice(0, 80)}"`
  });

  return { queued: true, queue_id: result.rows[0].id };
}

Message text is truncated to 80 characters in the event log to keep observability data manageable. The full message is preserved in the homepod_queue row.

Mac Relay Implementation

Local Execution: macOS say + pyatv

The Mac relay daemon runs on a Mac that shares the local network with HomePod speakers. It polls the ARIA API for pending announcements and executes them locally using two system-level tools:

Step 1: Text-to-Speech Generation

# macOS built-in TTS (Siri voice)
say -o /tmp/aria-speak-<uuid>.aiff "Dinner is ready"

The say command uses the system's default Siri voice to generate an AIFF audio file. The UUID-based filename prevents collisions when multiple announcements are queued.

Step 2: AirPlay Streaming via pyatv

# Stream to specific HomePod
atvremote -n "Kitchen" stream_file=/tmp/aria-speak-<uuid>.aiff

# Stream to first discovered HomePod (no target)
atvremote stream_file=/tmp/aria-speak-<uuid>.aiff

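pyatv discovers HomePods via mDNS/Bonjour and streams audio using RAOP (AirPlay's audio protocol). No pairing is required for RAOP streaming to HomePods. Note that the no-target command streams to the first HomePod discovered, so a NULL target currently reaches a single speaker even though the tool schema and the homepod_queue column comment describe it as "all".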
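The two steps can be expressed as argument lists built from a queue row. The relay's implementation language is not stated in this document; TypeScript is used here only for illustration, and `RelayCommand`, `buildSpeakCommands`, and the `fileId` parameter are hypothetical. The sketch builds the commands without running them.

```typescript
interface RelayCommand {
  program: string;
  args: string[];
}

// Builds the say and atvremote invocations for one announcement.
// fileId stands in for the UUID used in the temporary file name.
function buildSpeakCommands(
  message: string,
  target: string | null,
  fileId: string
): RelayCommand[] {
  const audioPath = `/tmp/aria-speak-${fileId}.aiff`;
  const say: RelayCommand = {
    program: "say",
    args: ["-o", audioPath, message],
  };
  const stream: RelayCommand = {
    program: "atvremote",
    // With no target, -n is omitted and atvremote picks a discovered device.
    args: [...(target ? ["-n", target] : []), `stream_file=${audioPath}`],
  };
  return [say, stream];
}
```

Passing the message as a separate argument, rather than interpolating it into a shell string, avoids quoting problems when the announcement text contains quotes or other shell metacharacters.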

Prerequisites

pip3 install pyatv (provides the atvremote CLI)

atvremote scan (confirms the HomePods are discoverable)

macOS 13+

Same network as HomePods

Design Decisions

Why Queue-Based Relay?

HomePods are not addressable from the internet. They only accept connections from devices on the same local network via mDNS discovery. ARIA's cloud services cannot reach them directly.

The relay pattern solves this by inverting the connection: the Mac relay pulls work from the cloud API rather than the cloud pushing to local hardware. This avoids NAT traversal, dynamic DNS, or VPN tunneling.
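One pull cycle of such a relay might look like the sketch below. The three injected functions are hypothetical wrappers (around GET /api/homepod/pending, the local say + atvremote step, and POST /api/homepod/[id]/complete); the real daemon's structure is not shown in this document.

```typescript
interface Announcement {
  id: number;
  message: string;
  target: string | null;
}

interface RelayDeps {
  fetchPending: () => Promise<Announcement[]>;
  deliver: (announcement: Announcement) => Promise<void>; // rejects on failure
  reportComplete: (id: number, success: boolean, error?: string) => Promise<void>;
}

// Runs one poll cycle: claim work, deliver each item, report each outcome.
async function pollOnce(deps: RelayDeps): Promise<number> {
  const announcements = await deps.fetchPending();
  for (const announcement of announcements) {
    let error: string | undefined;
    try {
      await deps.deliver(announcement);
    } catch (err) {
      error = err instanceof Error ? err.message : String(err);
    }
    // Report after the try block so a reporting failure is not
    // mistaken for a delivery failure.
    await deps.reportComplete(announcement.id, error === undefined, error);
  }
  return announcements.length;
}
```

A daemon would call `pollOnce` on a timer (roughly every 5 seconds per the message flow above). Every connection in this loop is outbound from the Mac, which is what removes the need for inbound access to the local network.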

Why pyatv over Shortcuts?

The original implementation used Apple Shortcuts' Intercom/Announce action. This was abandoned because the Intercom action requires iOS 16.4+ and is not available in macOS Shortcuts.

pyatv provides a stable, open-source AirPlay implementation that works on macOS without any Apple framework dependencies. RAOP streaming is marked as "early stage" in pyatv but has proven reliable for speech-length audio.

Security & Authentication

Authentication Model

The HomePod API endpoints share the RELAY_AUTH_TOKEN Bearer token with the iMessage relay system. This is a static token stored in the container environment and configured on the Mac relay daemon.

Layer	Mechanism	Scope
API Authentication	Bearer token (`RELAY_AUTH_TOKEN`)	Relay ↔ ARIA API
Database Locking	`FOR UPDATE SKIP LOCKED`	Prevents duplicate claims
Stale Recovery	2-minute claim timeout	Auto-recovery from relay failures
Completion Guard	404 on non-claimed rows	Prevents duplicate completion reports
Network	RAOP (no pairing needed)	Mac relay ↔ HomePod (LAN only)

The relay token is shared with iMessage because both relay systems run on the same Mac daemon and poll the same ARIA API. A dedicated HomePod token could be added if the relay is split to a separate host.
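A minimal sketch of the Bearer check, assuming the endpoints compare the Authorization header against the configured token. `isAuthorized` is a hypothetical helper, and the hand-rolled comparison is only illustrative.

```typescript
// Returns true only when the header is "Bearer <token>" with the exact token.
function isAuthorized(authorizationHeader: string | null, expectedToken: string): boolean {
  // Refuse everything if the token is not configured.
  if (authorizationHeader === null || expectedToken.length === 0) return false;

  const prefix = "Bearer ";
  if (!authorizationHeader.startsWith(prefix)) return false;

  const presented = authorizationHeader.slice(prefix.length);
  if (presented.length !== expectedToken.length) return false;

  // Compare every character instead of returning at the first mismatch,
  // so timing does not reveal how much of the token matched.
  let diff = 0;
  for (let i = 0; i < presented.length; i++) {
    diff |= presented.charCodeAt(i) ^ expectedToken.charCodeAt(i);
  }
  return diff === 0;
}
```

In a Node.js runtime, `crypto.timingSafeEqual` over equal-length buffers is the usual choice for the comparison step.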

Observability

Monitoring & Debugging

The HomePod system is observable at three levels:

Tempo Job Log — Every SPEAK_HOMEPOD job is tracked in tempo_jobs with status, duration, and error fields. Failed jobs appear in the ARIA Tempo dashboard.

Event Log — The handler writes to event_log with component 'homepod' and category 'interaction', enabling filtering in the analytics dashboard.

Queue Table — Direct queries against homepod_queue show delivery status, latency (created_at to sent_at), failure rates, and error messages.

-- Delivery latency and outcome counts for the last 24 hours
-- (sent_at is also set on failures, so the average is limited to sent rows)
SELECT
  AVG(EXTRACT(EPOCH FROM (sent_at - created_at)))
    FILTER (WHERE status = 'sent') AS avg_seconds,
  COUNT(*) FILTER (WHERE status = 'sent') AS delivered,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed
FROM homepod_queue
WHERE created_at > NOW() - INTERVAL '24 hours';

Current Limitations

Phase 1 Constraints

Limitation	Impact	Planned Resolution
Siri voice only	No custom ARIA voice identity	Phase 2: ElevenLabs TTS integration
~7-10s total latency	5s poll interval + TTS + AirPlay buffer	WebSocket relay or shorter poll interval
Text only, no SSML	No prosody control (emphasis, pauses)	Phase 2: SSML via ElevenLabs
One-way only	HomePod cannot respond back to ARIA	Phase 4: Conversational mode
Requires Mac relay	Always-on Mac needed on same LAN	Under investigation: HomePod direct API
pyatv RAOP "early stage"	Occasional streaming failures	Stale claim recovery re-queues attempts that never report back; reported failures stay `failed`

Roadmap

Phase 1: pyatv + macOS say

Active

Current implementation. Queue-based relay with macOS native TTS and pyatv RAOP streaming. Functional for basic announcements with Siri voice.

Phase 2: ElevenLabs Voice + AirPlay Routing

Planned

Replace macOS say with ElevenLabs TTS API to give ARIA a custom voice identity. Generated MP3 is routed to HomePod via one of three approaches:

Approach	Complexity	Notes
`SwitchAudioSource` CLI	Low	Simplest; switches Mac audio output to HomePod AirPlay, plays with `afplay`
CoreAudio API	Medium	`AudioObjectSetPropertyData` for programmatic output device selection
AVFoundation AirPlay	High	Direct AirPlay routing without switching system audio

Estimated cost: ~$0.03/announcement at 100 characters. Migration adds a voice TEXT DEFAULT 'siri' column to homepod_queue for backward compatibility.
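For budgeting, the estimate can be turned into a small calculation. This sketch assumes cost scales linearly with character count at the rate implied by the figure above (~$0.03 per 100 characters); actual ElevenLabs pricing depends on the plan and is not taken from this document. `estimateCostUsd` is a hypothetical helper.

```typescript
// Rate implied by "~$0.03/announcement at 100 characters" (assumed linear).
const COST_PER_CHAR_USD = 0.03 / 100;

// Estimates the TTS cost in USD for a batch of announcement texts.
function estimateCostUsd(messages: string[]): number {
  const totalChars = messages.reduce((sum, message) => sum + message.length, 0);
  return totalChars * COST_PER_CHAR_USD;
}
```

Under that assumption, 20 announcements a day at 100 characters each would come to about $0.60 per day.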

Phase 3: Apple TV Video Announcements

Planned

Extend to visual announcements on Apple TV using pyatv's play_url mechanism. AI-generated avatar video synced with ElevenLabs speech output.

Candidate tools: HeyGen API; SadTalker + ElevenLabs; D-ID; Kling 3.0; ElevenLabs Image & Video

Target output: 1080p MP4 H.264. Same relay pattern, different media type and delivery endpoint.

Phase 4: Conversational & Multi-Room

Future

Full two-way conversational mode, multi-room targeting with room-aware context, priority levels for announcements, music integration (pause/resume around announcements), and a wake word trigger ("Hey ARIA") via always-on microphone.

ARIA HomePod Voice System • Technical Documentation • March 2026
Generated from source code analysis across aria, aria-tempo, aria-tempo-client, and aria-ios repositories