AI Transcription
Transcribe any recording in minutes
Drop in MP3, MP4, WebM, or M4A files up to 100MB. Get timestamped transcripts with speaker labels, automatic punctuation, and up to 99% accuracy on clean audio.
Built for accuracy and speed
UtterNote uses state-of-the-art speech models tuned for noisy, real-world audio — Zoom calls, on-the-go voice memos, conference recordings. Most uploads return in under five minutes.
Speaker labels
Automatic speaker diarization across two or more voices.
Multi-language
Detects and transcribes 50+ languages out of the box.
Timestamps
Every paragraph timestamped so you can jump back to source audio.
Fast turnaround
Most files complete in under 5 minutes; long-form recordings are processed asynchronously in the background.
What you can transcribe
- Audio: MP3, WAV, M4A, OGG, audio-only MP4
- Video: MP4, WebM, MOV (audio track extracted)
- Recordings: Zoom, Google Meet, Microsoft Teams, Loom, voice memos
- Up to 100MB per file; larger files via chunked upload
Frequently asked questions
What audio formats do you support?
MP3, MP4, WAV, M4A, WebM, MOV, OGG. Anything ffmpeg can decode.
How accurate is the transcription?
~99% on clean audio (single speaker, no music). Drops to ~95% on noisy multi-speaker calls. Speaker labels add another layer of precision.
Can I edit the transcript?
Yes — the transcript opens in an editable view where you can fix names, terms, or any model mistakes before generating a guide.
Is my audio private?
Yes. Files are stored in private Vercel Blob storage, processed once, and never used to train models. You can delete them anytime.