Speak2Me

Voice-first AI journal that actually LISTENS, remembers everything you've ever told it, and talks back like someone who knows you.

What It Does

Open the app, tap the mic, and just... talk. About your day, your stress, your wins, whatever. The AI listens to your voice, picks up on how you're ACTUALLY feeling from your tone, and responds like a close friend who's been there for every conversation you've ever had.

Here's the thing that makes this different from every other AI chat out there: most of them start fresh every session. They don't know you. Speak2Me has a three-tier memory system that pre-loads your entire history before you even click start. It knows your partner's name, your goals, your recurring stress patterns, and that interview you mentioned last Tuesday. Zero recall delay. First message, full context.

Features

Voice Conversation

Real-time voice chat powered by Hume EVI (Empathic Voice Interface). It doesn't just read your words; it listens to HOW you say them. If your voice sounds sad but you say "I'm fine," it notices. There's also a text input fallback for when you can't talk out loud, plus mute/unmute and echo cancellation so it actually works in real environments.

Persistent Memory

This is the core of the whole thing. Three tiers working together:

Tier 1: Profile Summary. A structured summary of everything the AI knows about you. Auto-generated after each conversation by synthesizing all your facts into labeled sections.

IDENTITY: Alex, 34, Austin TX
FAMILY: Partner Jamie, Son Lucas (born Mar 2025)
WORK: Software engineer, side projects in AI
FINANCES: Maxing out Roth IRA, saving for house down payment
HEALTH: Back pain from desk setup, started physical therapy
CURRENT: Interviewing at two companies this month
PATTERNS: Anxiety spikes before interviews, cooking is a stress reliever
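
As a rough illustration of the output format above, here is a minimal sketch that groups categorized facts into labeled lines. The `Fact` shape, the `SECTION_LABELS` mapping, and `renderProfile` are all hypothetical; the real synthesis step runs as a background job and is not shown in this README.

```typescript
// Hypothetical fact shape; the real synthesis code is not shown here.
type Fact = { category: string; text: string };

// Assumed mapping from fact category to section label.
const SECTION_LABELS: { [category: string]: string } = {
  identity: "IDENTITY",
  family: "FAMILY",
  work: "WORK",
  finance: "FINANCES",
  health: "HEALTH",
};

// Emit one "LABEL: a, b, c" line per section that has facts,
// in the order the labels are declared above.
function renderProfile(facts: Fact[]): string {
  const lines: string[] = [];
  Object.keys(SECTION_LABELS).forEach((category) => {
    const texts = facts
      .filter((f) => f.category === category)
      .map((f) => f.text);
    if (texts.length > 0) {
      lines.push(SECTION_LABELS[category] + ": " + texts.join(", "));
    }
  });
  return lines.join("\n");
}
```

Sections without facts are skipped, so a brand-new user would get a short profile rather than a page of empty labels.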

Tier 2: Quick Facts. Individual facts extracted from every single conversation. Stored in a dedicated table with categories (identity, family, work, finance, health, interest, project, social). Identity facts like names and birthdays are pinned and NEVER drop off.

CREATE TABLE s2m_user_facts (
  id VARCHAR(36) PRIMARY KEY,
  user_id VARCHAR(255) NOT NULL,
  fact_text TEXT NOT NULL,
  category VARCHAR(20) NOT NULL,
  source_entry_id VARCHAR(36),
  is_active BOOLEAN DEFAULT TRUE,
  superseded_by VARCHAR(36) NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
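
To show how `is_active` and `superseded_by` might be used together, here is a minimal in-memory sketch. The `UserFact` interface mirrors a few columns of the table above; `supersede` and `selectFactsForPrompt`, and the exact selection rule (pinned identity facts first, then the newest facts up to a limit), are assumptions for illustration only.

```typescript
// Mirrors the columns of s2m_user_facts that matter for this sketch.
interface UserFact {
  id: string;
  fact_text: string;
  category: string;
  is_active: boolean;
  superseded_by: string | null;
  created_at: number; // epoch millis, simplified from TIMESTAMP
}

// Mark an old fact as replaced by a newer one (returns a copy, no mutation).
function supersede(oldFact: UserFact, newFact: UserFact): UserFact {
  return { ...oldFact, is_active: false, superseded_by: newFact.id };
}

// Assumed rule: active identity facts are always kept ("pinned"),
// then the newest remaining active facts fill the rest of the budget.
function selectFactsForPrompt(facts: UserFact[], limit: number): UserFact[] {
  const active = facts.filter((f) => f.is_active);
  const pinned = active.filter((f) => f.category === "identity");
  const rest = active
    .filter((f) => f.category !== "identity")
    .sort((a, b) => b.created_at - a.created_at)
    .slice(0, Math.max(0, limit - pinned.length));
  return pinned.concat(rest);
}
```

Keeping superseded rows around (instead of deleting them) means the history of a fact stays traceable through `superseded_by`.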

Tier 3: Vector Search. Every conversation is chunked, embedded with OpenAI text-embedding-3-large, and stored in TiDB's vector column. Ask "what was I stressed about in January?" and it does a cosine similarity search across all your past conversations. Real answers from your own words.
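
In the app the similarity search runs inside TiDB. The ranking idea itself can be sketched in memory like this, using toy vectors instead of real embeddings. `Chunk`, `cosineSimilarity`, and `topK` are illustrative names, not the project's actual code.

```typescript
// A stored transcript chunk with its embedding (toy dimensions here).
type Chunk = { text: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|); 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A question like "what was I stressed about in January?" would be embedded with the same model as the chunks, then ranked this way against them.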

Emotion Tracking

Hume's prosody model detects emotions from your voice in real time. Not sentiment analysis on text. Actual vocal patterns. Emotions are stored per conversation, aggregated into weekly trends on the dashboard. The AI responds to emotional signals naturally without narrating them like a robot.
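
The weekly aggregation could look something like the sketch below. It assumes each stored sample has a date, an emotion name, and a score, and that weeks start on Monday (UTC); `EmotionSample`, `weekStart`, and `weeklyTrends` are hypothetical names, and the dashboard's real bucketing may differ.

```typescript
// One stored emotion reading; date is "YYYY-MM-DD".
type EmotionSample = { date: string; emotion: string; score: number };
type WeeklyTrends = { [week: string]: { [emotion: string]: number } };

// Monday (UTC) of the week containing the given date, as "YYYY-MM-DD".
function weekStart(isoDate: string): string {
  const d = new Date(isoDate + "T00:00:00Z");
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Average score per (week, emotion).
function weeklyTrends(samples: EmotionSample[]): WeeklyTrends {
  const sums: { [key: string]: { total: number; count: number } } = {};
  samples.forEach((s) => {
    const key = weekStart(s.date) + "|" + s.emotion;
    const prev = sums[key] || { total: 0, count: 0 };
    sums[key] = { total: prev.total + s.score, count: prev.count + 1 };
  });
  const trends: WeeklyTrends = {};
  Object.keys(sums).forEach((key) => {
    const parts = key.split("|");
    const week = parts[0];
    const emotion = parts[1];
    trends[week] = trends[week] || {};
    trends[week][emotion] = sums[key].total / sums[key].count;
  });
  return trends;
}
```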

Conversation History

Full transcripts saved with auto-save every 5 seconds. Browse by date with lazy loading. Ask natural language questions about your journal history through conversation insights. "On This Day" throwbacks from past entries. Copy and download any transcript.

Dashboard

Streak tracking, weekly emotion trends, smart memory carousel with upcoming date reminders, and milestone progress cards. Everything you need to see your patterns over time.

Dynamic Timezone

The browser detects your timezone automatically on page load and stores it in your profile. Fly to Seoul, and the AI says "good morning" at Seoul time. No config needed.
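
A minimal sketch of the idea, using the standard `Intl` API. The function names and the morning/afternoon/evening cutoffs are assumptions; only the detection call itself is the standard way to read the browser's IANA timezone.

```typescript
// Browser side: the IANA zone name, e.g. "Asia/Seoul".
function detectTimeZone(): string {
  return Intl.DateTimeFormat().resolvedOptions().timeZone;
}

// Hour of day (0-23) at the given instant in the given zone.
function localHour(instant: Date, timeZone: string): number {
  const text = instant.toLocaleString("en-US", {
    timeZone: timeZone,
    hour: "numeric",
    hour12: false,
  });
  // Some engines format midnight as "24"; normalize it to 0.
  return parseInt(text, 10) % 24;
}

// Assumed cutoffs: before 12 is morning, before 18 is afternoon.
function greeting(instant: Date, timeZone: string): string {
  const hour = localHour(instant, timeZone);
  if (hour < 12) return "good morning";
  if (hour < 18) return "good afternoon";
  return "good evening";
}
```

The same instant gives different greetings depending on the stored zone, which is why the zone is synced to the profile rather than assumed on the server.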

Stack

Component        Technology
Voice            Hume EVI (WebSocket streaming, emotion detection)
LLM              Claude Sonnet 4.6 (via Custom Language Model endpoint)
Database         TiDB Serverless (vector search + relational data)
Embeddings       OpenAI text-embedding-3-large
Background Jobs  Inngest (profile synthesis, cache rebuilding, transcript processing)
Auth             NextAuth.js (Google OAuth)
Frontend         Next.js 16, React, Tailwind CSS
Hosting          Vercel

Architecture

Voice Session

  • Mic capture
  • Hume SDK
  • Live transcript
  • Auto-save
  • Text fallback

Dashboard

  • Streak tracking
  • Emotion trends
  • Memory carousel
  • Milestones
  • On This Day

History

  • Browse by date
  • Transcript detail
  • Conversation insights

Hume EVI

  • WebSocket voice streaming
  • Prosody emotion detection
  • Turn detection
  • Calls CLM endpoint for every message

/api/hume/clm

  • Build system prompt w/ full context
  • Stream Claude response

/api/journal

  • Save session
  • Extract facts
  • Auto-save
  • Embed chunks

/api/memory

  • Cached profile
  • Quick facts
  • Recent context

Claude API

  • Sonnet 4.6 (conversation)
  • Haiku (fact extraction)
  • Opus (insights)

Inngest

  • Profile synthesis
  • Cache rebuild
  • Transcript processing

TiDB

  • User profiles
  • Journal entries
  • Facts table
  • Transcript chunks (vectors)
  • Voice profiles

How Memory Works

Most AI memory works like this: user says something, AI calls a memory API, waits 5-10 seconds, then responds. It's slow and it FEELS slow.

Speak2Me does it differently. When you open the app, your profile, facts, and recent context are all pre-fetched in the background before you click start. By the time the session begins, everything is already loaded. The AI has your full context on the very first message. Vector search only fires when you ask about something deep in your history.

User opens app    → Profile pre-fetched (background)
User clicks Start → Data already cached (zero wait)
User speaks       → CLM has full context on first turn
User asks about last month → Vector search (only when needed)
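
The pre-fetch flow above boils down to "start the request early, await it late." A minimal sketch, assuming a hypothetical `MemoryPrefetcher` class and `MemoryBundle` shape (the real client code is not shown in this README):

```typescript
// Hypothetical shape of what the pre-fetch returns.
type MemoryBundle = { profile: string; facts: string[]; recent: string[] };

// Starts loading as soon as it is constructed and hands back
// the same promise on every call, so nothing is fetched twice.
class MemoryPrefetcher {
  private pending: Promise<MemoryBundle>;

  constructor(load: () => Promise<MemoryBundle>) {
    this.pending = load(); // fire immediately, do not await
  }

  get(): Promise<MemoryBundle> {
    return this.pending;
  }
}

// Usage idea (not executed here):
//   on app open:  const prefetch = new MemoryPrefetcher(
//                   () => fetch("/api/memory/profile").then((r) => r.json()));
//   on Start:     const memory = await prefetch.get(); // usually already resolved
```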

The CLM (Custom Language Model) Pattern

Hume EVI handles voice streaming and emotion detection, but it doesn't know your life story. The CLM endpoint bridges that gap.

When you speak, Hume transcribes your audio and detects emotions from your vocal prosody. Then it calls our CLM endpoint with the transcript and emotion scores. The endpoint runs 6 parallel queries (profile, vector search, recent chunks, active facts, last conversation timestamp, timezone), builds a system prompt with all that context plus the emotion data, and streams Claude's response back to Hume. Hume converts Claude's text to speech and plays it back.

Every single turn goes through this loop. The AI rebuilds its full context on every message so it never goes stale mid-conversation.
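
Here is a rough sketch of the per-turn loop described above: run the six lookups in parallel, then assemble a system prompt. The `Loaders` interface, the function names, and the prompt layout are all hypothetical stand-ins; the actual endpoint and prompt builder are not shown here.

```typescript
type TurnContext = {
  profile: string;
  matches: string[];
  recentChunks: string[];
  facts: string[];
  lastConversationAt: string;
  timeZone: string;
};

// Hypothetical loaders, one per query named in the text above.
type Loaders = {
  profile: () => Promise<string>;
  vectorSearch: (query: string) => Promise<string[]>;
  recentChunks: () => Promise<string[]>;
  activeFacts: () => Promise<string[]>;
  lastConversationAt: () => Promise<string>;
  timeZone: () => Promise<string>;
};

// Run all six queries concurrently and wait for the slowest one.
async function loadTurnContext(userText: string, loaders: Loaders): Promise<TurnContext> {
  const [profile, matches, recentChunks, facts, lastConversationAt, timeZone] =
    await Promise.all([
      loaders.profile(),
      loaders.vectorSearch(userText),
      loaders.recentChunks(),
      loaders.activeFacts(),
      loaders.lastConversationAt(),
      loaders.timeZone(),
    ]);
  return { profile, matches, recentChunks, facts, lastConversationAt, timeZone };
}

// Assumed prompt layout: labeled blocks separated by blank lines.
function buildSystemPrompt(ctx: TurnContext, emotionLine: string): string {
  return [
    "PROFILE:\n" + ctx.profile,
    "FACTS:\n" + ctx.facts.join("\n"),
    "RELEVANT HISTORY:\n" + ctx.matches.join("\n"),
    "RECENT:\n" + ctx.recentChunks.join("\n"),
    "LAST CONVERSATION: " + ctx.lastConversationAt,
    "TIMEZONE: " + ctx.timeZone,
    emotionLine,
  ].join("\n\n");
}
```

Because the queries run with `Promise.all`, the added latency per turn is roughly that of the slowest lookup, not the sum of all six.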

Fact Extraction

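After each conversation ends, facts are extracted synchronously in about 500ms using Claude Haiku. It reads the transcript and pulls out individual facts about you, one per category (identity, family, work, finance, health, interest, project, social).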

Facts are categorized, deduplicated against existing facts (70% word overlap threshold), and older versions get superseded by newer ones. A garbage filter rejects meta-observations like "user tested whether AI remembers" or "AI responded with." Because that's not a fact about YOU. That's noise.
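
The dedup and garbage filter could be sketched like this. The README only gives the 70% threshold, so the overlap definition used here (shared unique words divided by the word count of the shorter fact) and the `META_PATTERNS` list are assumptions. Superseding older facts is not covered by this sketch.

```typescript
// Lowercased unique words of a fact.
function words(text: string): string[] {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Assumed definition: shared words / word count of the shorter fact.
function wordOverlap(a: string, b: string): number {
  const wa = words(a);
  const wb = words(b);
  if (wa.length === 0 || wb.length === 0) return 0;
  const shared = wa.filter((w) => wb.indexOf(w) !== -1).length;
  return shared / Math.min(wa.length, wb.length);
}

// Illustrative patterns for meta-observations about the AI or the session.
const META_PATTERNS: RegExp[] = [/\buser tested\b/i, /\bai responded\b/i];

function isMetaObservation(fact: string): boolean {
  return META_PATTERNS.some((p) => p.test(fact));
}

// Keep candidates that are neither noise nor near-duplicates of known facts.
function filterNewFacts(candidates: string[], existing: string[], threshold = 0.7): string[] {
  const kept: string[] = [];
  candidates.forEach((candidate) => {
    if (isMetaObservation(candidate)) return;
    const known = existing.concat(kept);
    if (known.some((k) => wordOverlap(candidate, k) >= threshold)) return;
    kept.push(candidate);
  });
  return kept;
}
```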

Emotion Detection

Hume's prosody model analyzes vocal patterns and returns confidence scores for 48 emotions on every utterance. The top 3 are passed to Claude as context:

The user's voice shows: Sadness: 45%, Anxiety: 32%, Determination: 28%

The system prompt tells Claude to respond to the emotion without narrating it. If someone sounds sad but says they're fine, the AI might say "You say you're fine but I can hear it in your voice. What's really going on?" instead of "I detect that you're feeling sad." Because nobody talks like that.
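
Picking the top 3 and formatting that context line can be sketched as follows. It assumes the scores arrive as a name-to-confidence map with values between 0 and 1; `EmotionScores` and `topEmotionsLine` are illustrative names.

```typescript
// Assumed input: emotion name -> confidence score in the range 0..1.
type EmotionScores = { [emotion: string]: number };

// Sort by score, keep the strongest few, and render them as percentages.
function topEmotionsLine(scores: EmotionScores, count = 3): string {
  const top = Object.keys(scores)
    .map((name) => ({ name, score: scores[name] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, count)
    .map((e) => e.name + ": " + Math.round(e.score * 100) + "%");
  return "The user's voice shows: " + top.join(", ");
}
```

Passing only the top few keeps the prompt short and avoids drowning Claude in dozens of near-zero scores.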

API Endpoints

Endpoint                        Method    Description
/api/hume/clm/chat/completions  POST      CLM endpoint (called by Hume, not browser)
/api/hume/token                 GET       Get Hume access token for WebSocket
/api/journal                    GET/POST  List entries / Save completed session
/api/journal/autosave           POST      Live transcript save (every 5s)
/api/journal/insights           POST      Ask questions about your history
/api/journal/emotions           GET       Emotion trend data
/api/memory/profile             GET       Cached profile + facts + recent context
/api/user/timezone              POST      Sync browser timezone to DB

File Structure

speak2me/
├── app/
│   ├── layout.tsx                    # Root layout with auth, timezone sync
│   ├── page.tsx                      # Dashboard (streaks, emotions, memory)
│   ├── journal/page.tsx              # Voice session page
│   ├── history/page.tsx              # Conversation history by date
│   ├── entry/[id]/page.tsx           # Individual entry detail
│   ├── settings/page.tsx             # User settings
│   └── api/
│       ├── hume/
│       │   ├── clm/chat/completions/ # Custom Language Model endpoint
│       │   └── token/                # Hume access token
│       ├── journal/
│       │   ├── route.ts              # Save/list entries + fact extraction
│       │   ├── autosave/             # Live transcript auto-save
│       │   ├── insights/             # Natural language history queries
│       │   ├── emotions/             # Emotion trend data
│       │   └── stats/                # Usage statistics
│       ├── memory/
│       │   └── profile/              # Cached profile + facts + context
│       └── user/
│           └── timezone/             # Browser timezone sync
├── components/
│   ├── voice/
│   │   └── voice-session.tsx         # Main voice UI
│   ├── dashboard/
│   │   ├── memory-callback.tsx       # Smart memory carousel
│   │   ├── emotion-trends.tsx        # Weekly mood chart
│   │   └── streak-display.tsx        # Streak counter
│   ├── timezone-sync.tsx             # Auto-syncs browser TZ to DB
│   └── ...
├── lib/
│   ├── prompts.ts                    # System prompt builder + fact selection
│   ├── memory.ts                     # Profile synthesis
│   ├── facts.ts                      # Fact storage, retrieval, dedup
│   ├── embeddings.ts                 # OpenAI embedding generation
│   ├── db.ts                         # TiDB connection
│   └── inngest/
│       └── functions.ts              # Background jobs (profile, cache, chunks)
└── scripts/
    ├── schema.sql                    # Full database schema
    ├── migrate-facts.ts              # Fact migration tool
    └── rebuild-cache.ts              # Cache rebuilder

Privacy

All data is per-user and authenticated via Google OAuth. Transcripts and AI responses are stored in TiDB (encryption planned). Emotion data is stored alongside transcripts and never shared. There are no third-party memory APIs involved. Voice audio is processed by Hume in real time and not stored.

Your data stays yours. That was the whole point of building this.

Known Limitations

Code Access

The repo is private. If you're a recruiter or hiring manager and want to walk through the code or architecture, I'm happy to do a live session. Just reach out.

Try it: speak2me.io

Chris Dabatos
Developer Advocate building AI-powered apps