AI Transcription

Transcribe any recording in minutes

Drop in an MP3, MP4, WebM, or M4A file of up to 100MB. Get timestamped transcripts with speaker labels, automatic punctuation, and about 99% accuracy on clean audio.

Built for accuracy and speed

UtterNote uses state-of-the-art speech models tuned for noisy, real-world audio — Zoom calls, on-the-go voice memos, conference recordings. Most uploads return in under five minutes.

Speaker labels

Automatic speaker diarization across two or more voices.

Multi-language

Detects and transcribes 50+ languages out of the box.

Timestamps

Every paragraph timestamped so you can jump back to source audio.

Fast turnaround

Most files complete in under 5 minutes; long-form recordings are processed asynchronously.

What you can transcribe

  • Audio: MP3, WAV, M4A, OGG, audio-only MP4
  • Video: MP4, WebM, MOV (audio track extracted)
  • Recordings: Zoom, Google Meet, Microsoft Teams, Loom, voice memos
  • Up to 100MB per file; larger files via chunked upload
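
For readers scripting their uploads, the limits above can be checked before sending a file. This is a minimal sketch, not UtterNote's actual client: the function name plan_upload, the 8MB chunk size, and reading "100MB" as decimal megabytes are all assumptions made for illustration. Only the format list and the 100MB threshold come from this page.

```python
from pathlib import Path

# Formats and the single-upload limit are taken from the page.
# "100MB" is read here as decimal megabytes (an assumption).
MAX_SINGLE_UPLOAD_BYTES = 100 * 1000 * 1000
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".m4a", ".webm", ".mov", ".ogg"}

# Hypothetical chunk size; the page does not state one.
CHUNK_BYTES = 8 * 1000 * 1000


def plan_upload(filename: str, size_bytes: int) -> dict:
    """Decide how a file would be sent: rejected, single upload, or chunked."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return {"mode": "rejected", "reason": f"unsupported extension {ext!r}"}
    if size_bytes <= MAX_SINGLE_UPLOAD_BYTES:
        return {"mode": "single", "chunks": 1}
    # Ceiling division: the last chunk may be smaller than CHUNK_BYTES.
    chunks = -(-size_bytes // CHUNK_BYTES)
    return {"mode": "chunked", "chunks": chunks}
```

Under these assumptions, a 5MB MP3 goes up in one request, while a 250MB MOV would be split into 32 chunks.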

Frequently asked questions

What audio formats do you support?
MP3, MP4, WAV, M4A, WebM, MOV, and OGG. In short, anything ffmpeg can decode.

How accurate is the transcription?
About 99% on clean audio (single speaker, no music), dropping to about 95% on noisy multi-speaker calls. Speaker labels add precision about who said what.

Can I edit the transcript?
Yes. The transcript opens in an editable view where you can fix names, terms, or any model mistakes before generating a guide.

Is my audio private?
Yes. Files are stored in private Vercel Blob storage, processed once, and never used to train models. You can delete them at any time.
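
The accuracy figures in the FAQ are easier to interpret with a concrete measure. This page does not say how UtterNote measures accuracy. A common convention, assumed here, is accuracy = 1 - word error rate (WER), where WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below uses that convention, and the function name word_error_rate is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four gives a WER of 0.25, i.e. 75% accuracy
# under the assumed convention.
wer = word_error_rate("open the settings page", "open the setting page")
accuracy = 1 - wer
```

By this measure, 99% accuracy corresponds to roughly one word error per 100 words, and 95% to roughly five.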

Capture once. Document forever.

UtterNote turns audio and video into the SOPs your team actually follows.