Speak It - Voice-to-text with privacy-first style learning

Speak It

Voice-to-text Chrome extension that learns your writing style without storing your words.

What It Does

Speak naturally into any text field on the web. The extension transcribes your speech in real time and formats it based on where you're typing. Emails get proper greetings. Slack stays casual. Twitter keeps it short.

The key difference: instead of storing your messages to learn your style, it stores statistics such as sentence length, formality scores, and phrase frequencies. A style fingerprint, not a transcript.

Features

Voice-to-Text Anywhere

  • Works on any website with text inputs
  • Keyboard shortcut: Alt+Shift+S to toggle recording
  • Real-time transcription with live preview
  • Platform-aware formatting (Gmail, Slack, Twitter, Notion, Google Docs, etc.)

Style Learning

  • Learns how you write without storing what you write
  • Extracts statistics: average sentence length, formality score, contraction usage
  • Stores common greetings, sign-offs, and phrases
  • Style profile improves over time (diminishing influence prevents drift)
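
As a rough illustration of what "statistics, not text" can look like, here is a minimal sketch. The function name, the regexes, and the sentence splitting are assumptions made for this example, not the extension's actual code. The formality score is left out because this README does not say how it is computed.

```javascript
// Hypothetical helper: reduce approved text to numbers and flags only.
// Nothing in the returned object contains the original words.
function extractStyleStats(text) {
  // Naive sentence split on ., ! and ? (good enough for a sketch).
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const wordCounts = sentences.map((s) => s.split(/\s+/).length);
  const totalWords = wordCounts.reduce((sum, n) => sum + n, 0);

  // Matches "don't", "I'm", "we'll" and similar; possessives like "John's" also match.
  const contractions = text.match(/\b\w+['’](?:t|s|m|re|ve|ll|d)\b/gi) || [];

  // Emoji via the Unicode Extended_Pictographic property.
  // Multi-part emoji (ZWJ sequences) count once per component.
  const emoji = text.match(/\p{Extended_Pictographic}/gu) || [];

  return {
    avg_sentence_length: sentences.length ? totalWords / sentences.length : 0,
    uses_contractions: contractions.length > 0,
    emoji_frequency: totalWords ? emoji.length / totalWords : 0
  };
}
```

Calling `extractStyleStats("Hey there. I'm running late, don't wait for me!")` gives an average sentence length of 4.5 words and `uses_contractions: true`.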

Meeting Mode

  • Captures tab audio from Google Meet, Zoom, or any browser-based meeting
  • Mixes in your microphone for full conversation capture
  • Speaker diarization (labels who said what)
  • Live transcript in side panel
  • Auto-saves meeting notes to web app
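
A sketch of how speaker labels might be turned into transcript lines. With diarization enabled, Deepgram attaches a numeric `speaker` to each word in its results; the grouping logic and the "Speaker N" label format below are assumptions for illustration, not the extension's actual code.

```javascript
// Merge consecutive words from the same speaker into one turn.
// `words` follows the shape of Deepgram's per-word results:
// { word, punctuated_word?, speaker }.
function groupBySpeaker(words) {
  const turns = [];
  for (const w of words) {
    const text = w.punctuated_word || w.word;
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += " " + text;
    } else {
      turns.push({ speaker: w.speaker, text });
    }
  }
  return turns;
}

// Hypothetical label format for the side panel.
function formatTurns(turns) {
  return turns.map((t) => `Speaker ${t.speaker + 1}: ${t.text}`).join("\n");
}
```

Deepgram numbers speakers from 0, so the sketch adds 1 for display.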

Context Detection

  • Detects platform automatically (no manual switching)
  • Adjusts formality, capitalization, and structure per platform
  • Preserves your words while cleaning up grammar and filler words
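
Platform detection can be as simple as a hostname lookup. The sketch below is one way to do it; the rule table, the platform keys, and the "generic" fallback are assumptions for this example.

```javascript
// Hypothetical hostname rules; extend with more platforms as needed.
const PLATFORM_RULES = [
  { platform: "gmail", test: (host) => host === "mail.google.com" },
  { platform: "google-docs", test: (host) => host === "docs.google.com" },
  { platform: "slack", test: (host) => host === "app.slack.com" },
  { platform: "twitter", test: (host) => /(^|\.)(twitter|x)\.com$/.test(host) },
  { platform: "notion", test: (host) => /(^|\.)notion\.so$/.test(host) }
];

// Returns a platform key for a page URL, or "generic" when nothing matches.
function detectPlatform(url) {
  const host = new URL(url).hostname.toLowerCase();
  const rule = PLATFORM_RULES.find((r) => r.test(host));
  return rule ? rule.platform : "generic";
}
```

In a content script this would be called as `detectPlatform(window.location.href)`.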

Stack

Component         Technology
Extension         Chrome Manifest V3
Speech-to-Text    Deepgram Nova-2 (WebSocket streaming)
AI Formatting     Claude API (Anthropic)
Database          TiDB (style profiles, meeting storage)
Audio Processing  Web Audio API, ScriptProcessor
Text-to-Speech    ElevenLabs (practice mode)
Frontend          Next.js, Tailwind CSS
Hosting           Vercel

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Chrome Extension                       │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Content Script  │ Background SW   │ Offscreen Document      │
│ - UI overlay    │ - Message relay │ - Tab audio capture     │
│ - Text insert   │ - API calls     │ - Mic mixing            │
│ - Mic capture   │ - State mgmt    │ - Deepgram streaming    │
│ - Platform      │ - Keyboard      │ - PCM conversion        │
│   detection     │   shortcuts     │                         │
└────────┬────────┴────────┬────────┴────────────┬────────────┘
         │                 │                     │
         ▼                 ▼                     ▼
┌─────────────────────────────────────────────────────────────┐
│                         Backend API                         │
├─────────────────┬─────────────────┬─────────────────────────┤
│ /api/voice/     │ /api/style/     │ /api/meetings           │
│ - token         │ - format        │ - GET (list)            │
│   (Deepgram key)│ - learn         │ - POST (save)           │
│                 │ - profile       │                         │
└────────┬────────┴────────┬────────┴────────────┬────────────┘
         │                 │                     │
         ▼                 ▼                     ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│    Deepgram     │ │   Claude API    │ │        TiDB         │
│ (transcription) │ │  (formatting)   │ │ (profiles, meetings)│
└─────────────────┘ └─────────────────┘ └─────────────────────┘

How Style Learning Works

What Gets Stored

CREATE TABLE user_style_profiles (
  user_id VARCHAR(255) PRIMARY KEY,
  avg_sentence_length FLOAT,
  formality_score FLOAT,        -- 0 (casual) to 1 (formal)
  uses_contractions BOOLEAN,
  emoji_frequency FLOAT,
  top_phrases JSON,             -- ["sounds good", "let me know", ...]
  greetings JSON,               -- ["Hey", "Hi", ...]
  signoffs JSON,                -- ["Thanks", "Cheers", ...]
  sample_count INT,
  updated_at TIMESTAMP
);

What Never Gets Stored

  • Message content
  • Transcripts
  • Audio recordings
  • Conversation history

How It Updates

Each time you accept a formatted message:

  1. Statistics are extracted from the approved text
  2. New stats are blended with existing profile
  3. Influence decreases as sample count increases (1st message = 100%, 100th message = ~1%)
  4. Profile stabilizes over time
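
The percentages above (100% for the first message, about 1% for the hundredth) are what a running average with weight 1/n produces. A minimal sketch under that assumption follows; the function name is hypothetical, the field names come from the table definition above, and how the boolean and JSON columns are updated is not covered.

```javascript
// Blend one new measurement into the stored profile with weight 1/n,
// where n is the sample count after this message.
function blendProfile(profile, stats) {
  const n = (profile.sample_count || 0) + 1;
  const weight = 1 / n; // 1st sample: 1.0, 100th sample: 0.01

  // Move the old value toward the new one by `weight`.
  const mix = (oldValue, newValue) => {
    const base = oldValue || 0;
    return base + (newValue - base) * weight;
  };

  return {
    ...profile,
    avg_sentence_length: mix(profile.avg_sentence_length, stats.avg_sentence_length),
    formality_score: mix(profile.formality_score, stats.formality_score),
    emoji_frequency: mix(profile.emoji_frequency, stats.emoji_frequency),
    sample_count: n
  };
}
```

With 99 samples averaging 10 words per sentence, a new 110-word sentence only moves the average to 11, which is why one unusual message cannot pull the profile far.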

The privacy angle: instead of storing your messages, it stores aggregate statistics plus short lists of recurring phrases, greetings, and sign-offs. That is enough to personalize formatting without keeping what you actually said. A style fingerprint, not a transcript.

Audio Processing

Voice-to-Text Mode

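// Request the mic. The sampleRate constraint is only a hint the browser may ignore.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, echoCancellation: true, noiseSuppression: true }
});

// A 16 kHz AudioContext lets Chrome resample the stream to the rate Deepgram expects
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
// ScriptProcessorNode is deprecated but still works in Chrome; buffer size 4096 is an assumption
const processor = audioContext.createScriptProcessor(4096, 1, 1);
source.connect(processor);
processor.connect(audioContext.destination); // the node only fires while connected

// Convert Float32 to Int16 PCM for Deepgram (socket is the open Deepgram WebSocket)
processor.onaudioprocess = (event) => {
  const input = event.inputBuffer.getChannelData(0);
  const pcm = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const sample = Math.max(-1, Math.min(1, input[i]));
    pcm[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
  }
  if (socket.readyState === WebSocket.OPEN) socket.send(pcm.buffer);
};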

Meeting Mode Audio Mixing

// Tab audio (others' voices) + mic audio (your voice), both as Int16 PCM
const mixLength = Math.min(tabPcm.length, micChunk.length);
const finalPcm = new Int16Array(mixLength);
for (let i = 0; i < mixLength; i++) {
  // Mic slightly louder since it's your voice
  const mixed = (tabPcm[i] * 0.6) + (micChunk[i] * 0.8);
  // Gains sum to 1.4, so clamp to the Int16 range to avoid wraparound
  finalPcm[i] = Math.max(-32768, Math.min(32767, Math.round(mixed)));
}

Why Mic Capture Is in the Content Script

Chrome offscreen documents are invisible, so they can't show permission prompts, and mic capture started there fails silently. The fix: capture the mic in the content script, which runs on the visible page, and stream chunks to the offscreen document for mixing with tab audio.
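
One detail worth showing: Chrome extension messages are JSON-serialized, so a typed array cannot be sent as-is. A sketch of one way to move PCM chunks between contexts, using base64, follows. The helper names and the "mic-chunk" message type are assumptions for this example, not the extension's actual protocol.

```javascript
// Int16Array -> base64 string that survives JSON serialization.
function encodeChunk(pcm) {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// base64 string -> Int16Array (same byte order as the sender).
function decodeChunk(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Int16Array(bytes.buffer);
}

// Content script side: hand a chunk to the rest of the extension.
// The message shape is hypothetical.
function sendMicChunk(pcm) {
  chrome.runtime.sendMessage({ type: "mic-chunk", data: encodeChunk(pcm) });
}
```

The receiving side would call `decodeChunk(message.data)` in its `chrome.runtime.onMessage` listener before mixing.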

Platform-Specific Text Insertion

Every platform implements text input differently:

Platform     Method                        Notes
Gmail        execCommand('insertText')     Works directly
Twitter/X    execCommand('insertText')     Must clear field first (Lexical editor)
Slack        execCommand('insertText')     Quill editor, mostly works
Notion       Clipboard + paste prompt      execCommand causes vertical text
Google Docs  Hidden iframe + input events  Canvas-based, not contentEditable
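
// Notion fallback: copy to the clipboard, then ask the user to paste
await navigator.clipboard.writeText(text); // writeText returns a Promise
showMessage('Press Cmd+V (Ctrl+V on Windows/Linux) to paste');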
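
The table above can be read as a strategy lookup. Here is a sketch of that idea; the strategy names, the platform keys, and the default for unknown sites are assumptions for this example, and the Google Docs path is deliberately left out.

```javascript
// Hypothetical mapping from platform key to insertion strategy.
const INSERT_STRATEGIES = new Map([
  ["gmail", "execCommand"],
  ["twitter", "clearThenExecCommand"],
  ["slack", "execCommand"],
  ["notion", "clipboard"],
  ["google-docs", "iframeInputEvents"]
]);

function pickStrategy(platform) {
  // Assumption: unknown sites get the plain execCommand path.
  return INSERT_STRATEGIES.get(platform) || "execCommand";
}

// Browser-only: inserts text into the focused editable element.
async function insertText(platform, element, text) {
  const strategy = pickStrategy(platform);
  if (strategy === "clipboard") {
    await navigator.clipboard.writeText(text);
    return { inserted: false, needsPaste: true };
  }
  if (strategy === "iframeInputEvents") {
    // Not covered by this sketch.
    return { inserted: false, needsPaste: false };
  }
  element.focus();
  if (strategy === "clearThenExecCommand") {
    document.execCommand("selectAll");
    document.execCommand("delete");
  }
  const ok = document.execCommand("insertText", false, text);
  return { inserted: ok, needsPaste: false };
}
```

`document.execCommand` is deprecated but still the method the table relies on, so the sketch uses it as well.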

Deepgram Configuration

const url = new URL('wss://api.deepgram.com/v1/listen');
url.searchParams.set('model', 'nova-2');
url.searchParams.set('language', 'en');
url.searchParams.set('smart_format', 'true');
url.searchParams.set('punctuate', 'true');
url.searchParams.set('diarize', 'true');        // Speaker labels
url.searchParams.set('utterances', 'true');
url.searchParams.set('interim_results', 'true'); // Live preview
url.searchParams.set('encoding', 'linear16');
url.searchParams.set('sample_rate', '16000');
url.searchParams.set('channels', '1');
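
With `interim_results` on, Deepgram sends partial transcripts that later get replaced by a final one. A sketch of how the live preview could track that follows. The state shape and function names are assumptions; the `type`, `is_final`, and `channel.alternatives[0].transcript` fields are the ones Deepgram's streaming results carry.

```javascript
function createTranscriptState() {
  return { finalText: "", interimText: "" };
}

// Fold one raw WebSocket message into the transcript state.
function applyDeepgramMessage(state, rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.type !== "Results") return state; // ignore metadata and other events

  const alt = msg.channel && msg.channel.alternatives && msg.channel.alternatives[0];
  const transcript = alt ? alt.transcript : "";
  if (!transcript) return state;

  if (msg.is_final) {
    // Final text is appended; the interim preview for this segment is dropped.
    return { finalText: (state.finalText + " " + transcript).trim(), interimText: "" };
  }
  // Interim text replaces the previous interim guess.
  return { ...state, interimText: transcript };
}

// What the overlay would show at any moment.
function previewText(state) {
  return (state.finalText + " " + state.interimText).trim();
}
```

In the socket handler this would look like `state = applyDeepgramMessage(state, event.data)` followed by rendering `previewText(state)`.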

Formatting Prompt

function buildStylePrompt(profile, context) {
  const formality = profile.formality_score > 0.7 ? "formal" :
                    profile.formality_score < 0.3 ? "casual" : "balanced";

  return `Format this transcript for ${context}.

User's writing style:
- Tone: ${formality}
- Average sentence length: ~${Math.round(profile.avg_sentence_length)} words
- ${profile.uses_contractions ? "Use contractions naturally." : "Avoid contractions."}
- Preferred greetings: ${profile.greetings.slice(0, 3).join(", ")}
- Preferred sign-offs: ${profile.signoffs.slice(0, 3).join(", ")}

Rules:
1. ONLY add punctuation and paragraph breaks
2. Remove filler words: um, uh, like, basically, you know
3. Keep EVERY other word exactly as spoken
4. Do NOT rewrite, rephrase, or improve their language`;
}

File Structure

speak-it/
├── manifest.json           # Extension config (Manifest V3)
├── background.js           # Service worker, message routing
├── content.js              # UI overlay, text insertion, mic capture
├── content.css             # Overlay styles
├── offscreen.html          # Audio processing document
├── offscreen.js            # Tab capture, Deepgram streaming, mixing
├── popup.html              # Extension popup UI
├── popup.js                # Popup logic
├── meeting-panel.html      # Side panel for Meeting Mode
├── meeting-panel.js        # Meeting UI logic
├── meeting-panel.css       # Meeting panel styles
└── icons/                  # Extension icons

API Endpoints

Endpoint            Method  Description
/api/voice/token    POST    Get Deepgram API key for client
/api/style/format   POST    Format transcript with style + context
/api/style/learn    POST    Update style profile from accepted text
/api/style/profile  GET     Fetch user's style profile
/api/meetings       GET     List all saved meetings
/api/meetings       POST    Save meeting transcripts

Privacy

  • No message storage: Your words are processed and discarded
  • Statistics only: Style profiles contain patterns, not content
  • Local-first: Extension works offline for basic transcription
  • You control your data: Delete your profile anytime

Known Limitations

  • Notion requires manual paste (Cmd+V)
  • Google Docs text insertion is slow (character-by-character)
  • Meeting Mode requires mic permission on each new domain
  • Deepgram free tier has usage limits

Code Access

The repo is private right now.

If you're a recruiter or hiring manager and want to walk through the code or architecture in more detail, I'm happy to do a live session.