Speak It

Voice-to-text Chrome extension that learns your writing style without storing your words.

What It Does

Speak naturally into any text field on the web. The extension transcribes your speech in real-time and formats it based on where you're typing. Emails get proper greetings. Slack stays casual. Twitter keeps it short.

The key difference: instead of storing your messages to learn your style, it stores statistics such as sentence length, formality scores, and phrase frequencies. A style fingerprint, not a transcript.

Features

Voice-to-Text Anywhere

Works on any website with text inputs
Keyboard shortcut: Alt+Shift+S to toggle recording
Real-time transcription with live preview
Platform-aware formatting (Gmail, Slack, Twitter, etc.)
Notion and Google Docs support in alpha
Currently Chrome-only (other browsers untested)

Style Learning

Learns how you write without storing what you write
Extracts statistics: average sentence length, formality score, contraction usage
Stores common greetings, sign-offs, and phrases
Style profile improves over time (each new sample has diminishing influence, which prevents drift)
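
A minimal sketch of what the extraction step could look like for two of the stored statistics. The function name `extractStyleStats` and the contraction regex are assumptions for illustration, not the extension's actual code; the field names mirror the `user_style_profiles` columns. Formality scoring is left out because this README does not describe how it is computed.

```javascript
// Hypothetical helper: derive style statistics from one approved message.
// Only numbers and flags are returned; the text itself is not retained.
function extractStyleStats(text) {
  // Split on sentence-ending punctuation and drop empty fragments.
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Count whitespace-separated words in each sentence.
  const totalWords = sentences
    .map((s) => s.split(/\s+/).length)
    .reduce((sum, n) => sum + n, 0);

  return {
    avg_sentence_length: sentences.length ? totalWords / sentences.length : 0,
    // Matches common contraction endings such as "don't", "I'm", "we'll".
    uses_contractions: /\b\w+['’](t|s|re|ve|ll|d|m)\b/i.test(text),
  };
}

const stats = extractStyleStats("Sounds good. I'll send it over tomorrow!");
console.log(stats); // two sentences, 7 words total, one contraction
```

The caller would discard `text` after this returns and pass only `stats` on to the profile update.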

Meeting Mode

Captures tab audio from Google Meet, Zoom, or any browser-based meeting
Mixes in your microphone for full conversation capture
Speaker diarization (labels who said what)
Live transcript in side panel
Auto-saves meeting notes to web app

Context Detection

Detects platform automatically (no manual switching)
Adjusts formality, capitalization, and structure per platform
Preserves your words while cleaning up grammar and filler words
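
As a rough illustration of automatic detection, the platform can be derived from the page hostname. The table below is hypothetical: the platform names, tone labels, and the `detectPlatform` function are assumptions for this sketch, and the real extension's list and settings may differ.

```javascript
// Hypothetical hostname-to-platform table (not the extension's actual list).
const PLATFORMS = [
  { name: "gmail", match: /(^|\.)mail\.google\.com$/, tone: "formal" },
  { name: "slack", match: /(^|\.)slack\.com$/, tone: "casual" },
  { name: "twitter", match: /(^|\.)(twitter|x)\.com$/, tone: "casual", maxLength: 280 },
];

// Return the first matching platform, or a generic fallback.
function detectPlatform(hostname) {
  const hit = PLATFORMS.find((p) => p.match.test(hostname));
  return hit ?? { name: "generic", tone: "balanced" };
}

console.log(detectPlatform("app.slack.com").name);
console.log(detectPlatform("example.org").name);
```

In a content script this would be called as `detectPlatform(window.location.hostname)`. Anchoring each pattern on a leading dot or the start of the string keeps hosts like `netflix.com` from matching `x.com`.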

Stack

Component	Technology
Extension	Chrome Manifest V3
Speech-to-Text	Deepgram Nova-2 (WebSocket streaming)
AI Formatting	Claude API (Anthropic)
Database	TiDB (style profiles, meeting storage)
Audio Processing	Web Audio API, ScriptProcessor
Text-to-Speech	ElevenLabs (practice mode)
Frontend	Next.js, Tailwind CSS
Hosting	Vercel

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Chrome Extension                       │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Content Script  │ Background SW   │ Offscreen Document      │
│ - UI overlay    │ - Message relay │ - Tab audio capture     │
│ - Text insert   │ - API calls     │ - Mic mixing            │
│ - Mic capture   │ - State mgmt    │ - Deepgram streaming    │
│ - Platform      │ - Keyboard      │ - PCM conversion        │
│   detection     │   shortcuts     │                         │
└────────┬────────┴────────┬────────┴─────────────┬───────────┘
         │                 │                      │
         ▼                 ▼                      ▼
┌─────────────────────────────────────────────────────────────┐
│                         Backend API                         │
├─────────────────┬─────────────────┬─────────────────────────┤
│ /api/voice/     │ /api/style/     │ /api/meetings           │
│ - token         │ - format        │ - GET (list)            │
│   (Deepgram key)│ - learn         │ - POST (save)           │
│                 │ - profile       │                         │
└────────┬────────┴────────┬────────┴─────────────┬───────────┘
         │                 │                      │
         ▼                 ▼                      ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐
│ Deepgram        │ │ Claude API      │ │ TiDB                │
│ (transcription) │ │ (formatting)    │ │ (profiles, meetings)│
└─────────────────┘ └─────────────────┘ └─────────────────────┘

How Style Learning Works

What Gets Stored

CREATE TABLE user_style_profiles (
  user_id VARCHAR(255) PRIMARY KEY,
  avg_sentence_length FLOAT,
  formality_score FLOAT,        -- 0 (casual) to 1 (formal)
  uses_contractions BOOLEAN,
  emoji_frequency FLOAT,
  top_phrases JSON,             -- ["sounds good", "let me know", ...]
  greetings JSON,               -- ["Hey", "Hi", ...]
  signoffs JSON,                -- ["Thanks", "Cheers", ...]
  sample_count INT,
  updated_at TIMESTAMP
);

What Never Gets Stored

Message content
Transcripts
Audio recordings
Conversation history

How It Updates

Each time you accept a formatted message:

Statistics are extracted from the approved text
New stats are blended with existing profile
Influence decreases as sample count increases (1st message = 100%, 100th message = ~1%)
Profile stabilizes over time
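
One scheme consistent with those numbers is a cumulative running average, where the n-th sample gets weight 1/n. This is an assumption: the exact formula is not given here, and `blendStat`, `updateProfile`, and `NUMERIC_FIELDS` are hypothetical names. The field names mirror the `user_style_profiles` columns.

```javascript
// Running-average blend: the n-th sample gets weight 1/n.
// `sampleCount` is the number of samples already in the profile.
function blendStat(current, incoming, sampleCount) {
  const weight = 1 / (sampleCount + 1);
  return current + (incoming - current) * weight;
}

// Hypothetical list of numeric columns that are blended.
const NUMERIC_FIELDS = ["avg_sentence_length", "formality_score", "emoji_frequency"];

// Return a new profile with each provided statistic blended in.
function updateProfile(profile, stats) {
  const next = { ...profile, sample_count: profile.sample_count + 1 };
  for (const field of NUMERIC_FIELDS) {
    if (typeof stats[field] === "number") {
      next[field] = blendStat(profile[field] ?? 0, stats[field], profile.sample_count);
    }
  }
  return next;
}

// First message: weight 1/1, so the profile takes the new value outright.
console.log(updateProfile({ avg_sentence_length: 0, sample_count: 0 }, { avg_sentence_length: 12 }));
// 100th message: weight 1/100, so the value moves 1% of the way toward 20.
console.log(updateProfile({ avg_sentence_length: 10, sample_count: 99 }, { avg_sentence_length: 20 }));
```

Because the weight shrinks as `sample_count` grows, a single unusual message late in the profile's life barely moves the averages.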

The privacy angle: the profile holds enough to personalize formatting without keeping what you actually said.

Audio Processing

Voice-to-Text Mode

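// Request the mic; the 16 kHz sampleRate is a hint the browser may not honor
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { sampleRate: 16000, echoCancellation: true, noiseSuppression: true }
});

// Run the audio graph at 16 kHz so the PCM matches Deepgram's sample_rate
// (setup sketch; buffer size 4096 is an assumed value)
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);
source.connect(processor);
processor.connect(audioContext.destination);

// Convert Float32 samples in [-1, 1] to Int16 PCM for Deepgram
processor.onaudioprocess = (event) => {
  const input = event.inputBuffer.getChannelData(0);
  const pcm = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const sample = Math.max(-1, Math.min(1, input[i]));
    pcm[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
  }
  // `socket` is the open Deepgram WebSocket
  if (socket.readyState === WebSocket.OPEN) socket.send(pcm.buffer);
};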

Meeting Mode Audio Mixing

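// Tab audio (others' voices) + Mic audio (your voice)
// tabPcm and micChunk are Int16 PCM chunks at the same sample rate
const mixLength = Math.min(tabPcm.length, micChunk.length);
const finalPcm = new Int16Array(mixLength);
for (let i = 0; i < mixLength; i++) {
  // Mic slightly louder since it's your voice
  const mixed = (tabPcm[i] * 0.6) + (micChunk[i] * 0.8);
  // Gains sum to 1.4, so clamp to the Int16 range to avoid wraparound
  finalPcm[i] = Math.max(-32768, Math.min(32767, Math.round(mixed)));
}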

Why Mic Capture is in Content Script

Chrome offscreen documents are invisible, so they can't show permission prompts, and a mic request made there silently fails. The fix: capture the mic from the content script (which runs on the visible page) and stream chunks to the offscreen document for mixing with tab audio.
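
Chrome extension messages must be JSON-serializable, so an `Int16Array` will not arrive intact on the other side. One way to stream chunks is to pack each one into a base64 string before sending. The helpers below are a sketch under that assumption; `encodeChunk`, `decodeChunk`, and the message shape in the comments are hypothetical, not the extension's actual protocol.

```javascript
// Pack an Int16 PCM chunk into a base64 string for message passing.
function encodeChunk(pcm) {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Unpack a base64 string back into an Int16 PCM chunk.
function decodeChunk(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Int16Array(bytes.buffer);
}

// Content script side (message shape is hypothetical):
//   chrome.runtime.sendMessage({ type: "mic-chunk", data: encodeChunk(pcm) });
// Offscreen document side:
//   const micChunk = decodeChunk(message.data);

const roundTrip = decodeChunk(encodeChunk(Int16Array.from([0, 1, -1, 32767, -32768])));
console.log(Array.from(roundTrip));
```

Base64 adds roughly a third to the payload size, which is a reasonable trade for 16 kHz mono audio.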

Platform-Specific Text Insertion

Every platform implements text input differently:

Platform	Method	Notes
Gmail	`execCommand('insertText')`	Works directly
Twitter/X	`execCommand('insertText')`	Must clear field first (Lexical editor)
Slack	`execCommand('insertText')`	Quill editor, mostly works
Notion (alpha)	Clipboard + paste prompt	execCommand causes vertical text
Google Docs (alpha)	Hidden iframe + input events	Canvas-based, not contentEditable

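// Notion fallback: copy to the clipboard and ask the user to paste
// (writeText returns a Promise; showMessage is the overlay's toast helper)
await navigator.clipboard.writeText(text);
showMessage('Press Cmd+V to paste');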

Deepgram Configuration

const url = new URL('wss://api.deepgram.com/v1/listen');
url.searchParams.set('model', 'nova-2');
url.searchParams.set('language', 'en');
url.searchParams.set('smart_format', 'true');
url.searchParams.set('punctuate', 'true');
url.searchParams.set('diarize', 'true');        // Speaker labels
url.searchParams.set('utterances', 'true');
url.searchParams.set('interim_results', 'true'); // Live preview
url.searchParams.set('encoding', 'linear16');
url.searchParams.set('sample_rate', '16000');
url.searchParams.set('channels', '1');
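
With `interim_results` and `diarize` enabled, each WebSocket message has to be split into live-preview text versus final text, and words have to be grouped by speaker. The sketch below assumes the streaming response shape as I understand it from Deepgram's docs (`channel.alternatives[0].transcript`, `is_final`, and per-word `speaker` and `punctuated_word`). The function names are hypothetical, and the field names should be checked against the current API reference.

```javascript
// Extract transcript text, finality, and words from one streaming message.
// Returns null for messages that carry no transcript (e.g. metadata).
function parseResult(raw) {
  const msg = JSON.parse(raw);
  const alt = msg.channel?.alternatives?.[0];
  if (!alt) return null;
  return {
    text: alt.transcript,
    isFinal: Boolean(msg.is_final),
    words: alt.words ?? [],
  };
}

// Group consecutive words with the same speaker index into labeled lines.
function groupBySpeaker(words) {
  const lines = [];
  for (const w of words) {
    const token = w.punctuated_word ?? w.word;
    const last = lines[lines.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += " " + token;
    } else {
      lines.push({ speaker: w.speaker, text: token });
    }
  }
  return lines;
}

// Example with a hand-written message in the assumed shape.
const sample = JSON.stringify({
  is_final: true,
  channel: { alternatives: [{ transcript: "hi there hello", words: [
    { word: "hi", punctuated_word: "Hi", speaker: 0 },
    { word: "there", punctuated_word: "there.", speaker: 0 },
    { word: "hello", punctuated_word: "Hello.", speaker: 1 },
  ] }] },
});
console.log(groupBySpeaker(parseResult(sample).words));
```

Interim results (`isFinal` false) would replace the live preview in place, while final results get appended to the transcript.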

Formatting Prompt

function buildStylePrompt(profile, context) {
  const formality = profile.formality_score > 0.7 ? "formal" :
                    profile.formality_score < 0.3 ? "casual" : "balanced";

  return `Format this transcript for ${context}.

User's writing style:
- Tone: ${formality}
- Average sentence length: ~${Math.round(profile.avg_sentence_length)} words
- ${profile.uses_contractions ? "Use contractions naturally." : "Avoid contractions."}
- Preferred greetings: ${profile.greetings.slice(0, 3).join(", ")}
- Preferred sign-offs: ${profile.signoffs.slice(0, 3).join(", ")}

Rules:
1. ONLY add punctuation and paragraph breaks
2. Remove filler words: um, uh, like, basically, you know
3. Keep EVERY other word exactly as spoken
4. Do NOT rewrite, rephrase, or improve their language`;
}

File Structure

speak-it/
├── manifest.json           # Extension config (Manifest V3)
├── background.js           # Service worker, message routing
├── content.js              # UI overlay, text insertion, mic capture
├── content.css             # Overlay styles
├── offscreen.html          # Audio processing document
├── offscreen.js            # Tab capture, Deepgram streaming, mixing
├── popup.html              # Extension popup UI
├── popup.js                # Popup logic
├── meeting-panel.html      # Side panel for Meeting Mode
├── meeting-panel.js        # Meeting UI logic
├── meeting-panel.css       # Meeting panel styles
└── icons/                  # Extension icons

API Endpoints

Endpoint	Method	Description
`/api/voice/token`	POST	Get Deepgram API key for client
`/api/style/format`	POST	Format transcript with style + context
`/api/style/learn`	POST	Update style profile from accepted text
`/api/style/profile`	GET	Fetch user's style profile
`/api/meetings`	GET	List all saved meetings
`/api/meetings`	POST	Save meeting transcripts

Privacy

No message storage: Your words are processed and discarded
Statistics only: Style profiles contain patterns, not content
Local-first: Extension works offline for basic transcription
You control your data: Delete your profile anytime

Known Limitations

Notion requires manual paste (Cmd+V)
Google Docs text insertion is slow (character-by-character)
Meeting Mode requires mic permission on each new domain
Deepgram free tier has usage limits

Code Access

The repo is private right now.

If you're a recruiter or hiring manager and want to walk through the code or architecture in more detail, I'm happy to do a live session.

Let's build something awesome.