How to transcribe Discord calls is a question that comes up constantly in gaming communities, online teams, podcast crews, and moderation staff — and the answer is not obvious because Discord gives you no built-in way to do it. This post walks through exactly how to get a clean, accurate transcript of any Discord call using free tools, explains the realistic tradeoffs between local and cloud methods, and shows you a step-by-step local Whisper workflow that keeps your audio off third-party servers entirely.
TL;DR
- Discord has no native transcription — you must record the call first, then transcribe the audio file
- The best free local option is OpenAI Whisper, which runs entirely on your own PC
- Record with OBS Studio (desktop audio capture) or the Craig bot (per-speaker tracks)
- Transcribe with whisper audio.mp3 --model small from the command line, or use a desktop app
- For multi-speaker labeling, pair Whisper with pyannote.audio or use a cloud service
- Always tell participants you are recording — consent requirements vary by country and US state
Why People Transcribe Discord Voice Chat
Discord started as a gaming chat app but has grown into an infrastructure layer for indie teams, online communities, content creators, and remote-first projects. As a result, calls happening over Discord voice channels are not always casual — they are standup meetings, podcast recordings, guild strategy sessions, moderation hearings, and client calls.
Here are the main reasons people want Discord call transcription:
Meeting notes and accountability. A lot of community-run servers make decisions verbally over voice. A transcript gives every member a searchable record without relying on someone’s memory or a sloppy copy-paste from stream chat.
Accessibility. Deaf or hard-of-hearing members need text versions of voice conversations. Even for hearing users, transcripts let people catch up asynchronously without sitting through a full recording.
Content repurposing. Podcasters and streamers who record conversations on Discord want a rough transcript before editing — it speeds up finding timestamps, generating show notes, and pulling quotes for social media.
Moderation records. Server moderators sometimes need to document what was said during a conflict or harassment incident. A transcript is easier to review and share with an appeals process than an hour-long audio file.
Dictation and podcast show notes. Writers and solo creators use Discord calls as a dictation medium — talking through ideas and then feeding the recording through Whisper to get a first draft. Whisper’s accuracy on clear speech is close enough to make this genuinely useful.
Does Discord Have a Native Transcription Feature?
Discord does not have built-in call transcription as of 2026. The platform does offer live captions in voice channels — an accessibility feature that generates real-time subtitles as people speak — but those captions exist only during the session and are never saved. Once everyone leaves the channel, the captions are gone.
Discord’s live captions use a cloud-based speech recognition engine and do not produce a downloadable transcript. There is no transcript history, no export option, and no API that lets you pull caption data after the fact. If you need a permanent record of what was said, you have to handle the recording and transcription yourself.
How to Transcribe Discord Calls: The Core Workflow
The core answer to how to transcribe Discord calls is a two-step process: record the audio, then run speech-to-text on the file.
Step 1 is necessary because Discord does not expose raw audio streams to third-party desktop tools in real time without a virtual audio device or a dedicated bot. Step 2 can be done locally (free, private) or with a cloud service (easier multi-speaker support, costs money or has usage limits).
Here is the full local workflow from start to finish.
Step 1: Record the Discord Call
You have three solid options depending on your situation:
OBS Studio (free, no bot required)
- Download and install OBS Studio if you do not already have it.
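- In OBS, go to Settings → Output → Recording. OBS records to video containers rather than bare audio files, so set the recording format to MKV and extract the audio to WAV or FLAC afterward for best transcription accuracy (MP3 is fine too, just lower quality).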
- In the Audio Mixer, make sure “Desktop Audio” is enabled. This captures everything coming out of your speakers/headphones, including Discord voice.
- Optionally add a Mic/Aux source to capture your own voice on a separate track — useful for transcription accuracy and multi-speaker diarization later.
- Start recording before the call begins. Stop it when everyone disconnects.
- Find the recording in the path you set (default: Videos folder).
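If the OBS recording ends up in a video container such as MKV, the audio track can be pulled out with ffmpeg before transcription. The sketch below only builds and prints the command; the file name discord_call.mkv and the helper name build_extract_command are illustrative assumptions, and the 16 kHz mono settings are one reasonable choice for speech, not a requirement.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(recording: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes a mono WAV next to the recording.

    Hypothetical helper: -vn drops the video stream, -ac 1 downmixes to mono,
    -ar sets the sample rate. The output reuses the input name with .wav.
    """
    source = Path(recording)
    if source.suffix.lower() == ".wav":
        raise ValueError("input is already a WAV file; choose another output name")
    target = source.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(source),
        "-vn",
        "-ac", "1",
        "-ar", str(sample_rate),
        str(target),
    ]


if __name__ == "__main__":
    cmd = build_extract_command("discord_call.mkv")  # assumed file name
    print(" ".join(cmd))
    # Only run ffmpeg when it is installed and the recording actually exists.
    if shutil.which("ffmpeg") and Path("discord_call.mkv").exists():
        subprocess.run(cmd, check=True)
```

Whisper can also read the container directly because it decodes through ffmpeg, so the extraction step is mainly useful when you want a smaller audio-only file to archive.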
Craig Bot (free tier available, per-speaker tracks)
Craig is a Discord bot purpose-built for recording. Invite it to your server, type /join in a voice channel, and it records every participant to a separate audio track. It sends you a direct message with a download link, where you can get individual FLAC files per speaker. This makes diarization much easier because you already know which file belongs to which speaker.
Craig’s free tier covers most community recording needs. The per-speaker format is the biggest advantage over OBS for transcription of group calls.
VoxBooster’s Built-in Recording (Windows only)
VoxBooster includes an audio recording layer that captures processed audio — so if you are also running voice effects or noise suppression during the call, the recording reflects what the other side actually heard. The output is a clean WAV file ready for transcription. Because all processing is local, nothing is uploaded anywhere.
Step 2: Transcribe the Recording with Whisper
OpenAI Whisper is a free, open-source speech recognition model that runs entirely on your PC. No account, no API key, no usage limit. Read more about setting it up in our Whisper transcription on Windows guide.
Installing Whisper
You need Python 3.9–3.12 and ffmpeg on PATH. Install Whisper via pip:
pip install openai-whisper
Verify ffmpeg is accessible:
ffmpeg -version
If that errors, install ffmpeg via winget: winget install Gyan.FFmpeg
Running a Transcription
whisper discord_call.wav --model small --language en --output_format txt
- --model small is a good default: about 244M parameters, fast, and accurate on clean speech
- --language en skips language detection and speeds things up if you know the language
- --output_format txt gives a plain-text file; use srt if you want subtitles with timestamps
For a one-hour recording on a modern CPU, the small model takes roughly 8–15 minutes. With an Nvidia GPU (CUDA), it drops to under 2 minutes.
Output location: Whisper saves the transcript in the current working directory by default. Pass --output_dir to write it somewhere else, such as the folder that holds the recording.
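When several recordings need the same treatment, the command above can be assembled from a script. This is a minimal sketch that assumes the whisper CLI is on PATH; build_whisper_command and the recordings folder name are hypothetical, and only the flags already discussed here (--model, --language, --output_format, --output_dir) are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_whisper_command(audio, model="small", language="en",
                          output_format="txt", output_dir=None):
    """Return the whisper CLI invocation as an argument list (hypothetical helper)."""
    cmd = ["whisper", str(audio), "--model", model, "--output_format", output_format]
    if language:
        # Skips automatic language detection when the language is known.
        cmd += ["--language", language]
    if output_dir:
        # Without this flag the CLI writes into the current working directory.
        cmd += ["--output_dir", str(output_dir)]
    return cmd


if __name__ == "__main__":
    folder = Path("recordings")  # assumed folder name
    files = sorted(folder.glob("*.wav")) if folder.is_dir() else []
    for audio in files:
        cmd = build_whisper_command(audio, output_dir=folder)
        print(" ".join(cmd))
        # Only run when the whisper CLI is actually installed.
        if shutil.which("whisper"):
            subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.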
Transcription Methods Compared
| Method | Cost | Privacy | Accuracy | Multi-speaker | Setup effort |
|---|---|---|---|---|---|
| Local Whisper (CLI) | Free | Fully local | High (small/medium model) | No (words only) | Medium — needs Python + ffmpeg |
| Local Whisper + pyannote | Free | Fully local | High | Yes (speaker labels) | High — extra library, GPU helps |
| Craig bot + Whisper | Free | Bot has access to your audio | High | Yes (per-track files) | Low-medium |
| AssemblyAI / Deepgram | Pay-per-minute | Cloud upload | Very high | Yes (built-in) | Low — API key only |
| Otter.ai | Freemium | Cloud upload | Good | Yes | Very low — browser-based |
| Discord live captions | Free | Cloud (Discord) | Basic | No | None — built-in, not saved |
The right choice depends on your threat model. If you are transcribing sensitive moderation conversations or internal business calls, local Whisper keeps audio off third-party servers entirely. If you are a podcaster who just wants good show notes fast, a cloud service like AssemblyAI is less friction. For most gamers and community managers, the OBS + local Whisper combo hits the sweet spot.
Handling Multiple Speakers in Discord Audio Transcription
Whisper produces a single stream of text. It does not know that “Hey, I disagree with that” came from one person and “Let me finish” came from another. For simple two-person calls, this is manageable — you can read the transcript and figure out context. For calls with five or more speakers, unlabeled text becomes hard to use.
Option 1: Per-Speaker Files from Craig
If you recorded with Craig, you already have separate FLAC files per participant. Run Whisper on each file independently:
whisper alice.flac --model small --output_format srt
whisper bob.flac --model small --output_format srt
Then merge the timestamped outputs chronologically. The srt format is used here because plain txt output carries no timestamps, while each SRT cue has a start and end time (00:00:00,000 --> 00:00:15,000) that lets you interleave the files. This is manual but the most reliable approach.
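The interleaving step can be scripted once each speaker has a timestamped transcript. The sketch below assumes SRT-style cues (index line, a "start --> end" line, then text) and assumes all tracks share the same time origin, which is what per-speaker recordings of one call are expected to provide. The function names and sample text are illustrative, not part of Whisper or Craig.

```python
import re

# Matches the start time of an SRT cue line such as "00:00:02,000 --> 00:00:08,000".
CUE_START = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> ")


def parse_srt(text, speaker):
    """Return (start_seconds, speaker, text) tuples for one speaker's SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = CUE_START.search(line)
            if match:
                h, m, s, ms = (int(x) for x in match.groups())
                start = h * 3600 + m * 60 + s + ms / 1000
                content = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if content:
                    cues.append((start, speaker, content))
                break
    return cues


def format_time(seconds):
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def merge_tracks(tracks):
    """Interleave cues from a {speaker: srt_text} mapping by start time."""
    cues = []
    for speaker, srt_text in tracks.items():
        cues.extend(parse_srt(srt_text, speaker))
    cues.sort(key=lambda cue: cue[0])
    return [f"[{format_time(start)}] {speaker}: {text}" for start, speaker, text in cues]


if __name__ == "__main__":
    alice = "1\n00:00:02,000 --> 00:00:08,000\nWe should move the event to Saturday.\n"
    bob = "1\n00:00:09,000 --> 00:00:14,000\nI agree, Sunday is packed.\n"
    for line in merge_tracks({"alice": alice, "bob": bob}):
        print(line)
```

In practice you would read each speaker's .srt file from disk and pass its contents in place of the inline sample strings.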
Option 2: pyannote.audio for Diarization
pyannote.audio is an open-source speaker diarization library. Combined with Whisper, it produces output like:
[SPEAKER_00] 00:00:02 - 00:00:08: We should move the event to Saturday.
[SPEAKER_01] 00:00:09 - 00:00:14: I agree, Sunday is packed for half the server.
Setup is more involved (Hugging Face token for model weights, GPU strongly recommended), but the output is much more usable for meeting notes. Check the pyannote GitHub for current installation instructions since the API changes between versions.
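Combining the two tools usually comes down to matching each transcribed segment to the speaker turn it overlaps most. The sketch below shows only that matching step on plain Python data; the Turn and Segment shapes are assumptions made for illustration, so adapt them to whatever your diarization and transcription steps actually return. It does not call pyannote.audio or Whisper.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    """One diarization turn: who spoke between start and end (seconds)."""
    start: float
    end: float
    speaker: str


class Segment(NamedTuple):
    """One transcribed segment with its time span (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time spans."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def fmt(seconds):
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def label_segments(segments, turns):
    """Give each segment the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append(f"[{best_speaker}] {fmt(seg.start)} - {fmt(seg.end)}: {seg.text}")
    return labeled


if __name__ == "__main__":
    turns = [Turn(1.5, 8.4, "SPEAKER_00"), Turn(8.6, 14.2, "SPEAKER_01")]
    segments = [
        Segment(2.0, 8.0, "We should move the event to Saturday."),
        Segment(9.0, 14.0, "I agree, Sunday is packed for half the server."),
    ]
    for line in label_segments(segments, turns):
        print(line)
```

Segments that overlap no turn are labeled UNKNOWN rather than guessed, which makes gaps in the diarization easy to spot during review.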
Option 3: Cloud with Built-in Diarization
Services like AssemblyAI and Deepgram both offer speaker diarization as a single option in their APIs. You upload the file, turn on the diarization parameter (speaker_labels in AssemblyAI, diarize in Deepgram at the time of writing), and get back labeled JSON. The tradeoff is that your audio leaves your machine, so factor that into your decision if the call content is sensitive.
Record and Transcribe Discord: Consent and Legal Considerations
Before you record and transcribe Discord conversations, you need to think about consent. This is not just etiquette; it is a legal requirement in many places.
One-party vs. all-party consent states. In the US, federal law (ECPA) allows one-party consent — meaning you can record a call you are participating in without notifying others. But roughly twelve US states, including California, Illinois, and Florida, require all-party consent. Recording a call with a California resident without their knowledge could expose you to civil liability.
EU and GDPR. In the EU, recording someone’s voice constitutes processing personal data. You need a lawful basis — typically explicit consent. Inform participants and get a verbal acknowledgment at the start of the call.
Discord’s rules. Discord’s Community Guidelines and Terms of Service do not explicitly prohibit call recording by participants, but distributing recordings to harm or harass others violates the guidelines. If you are recording for moderation purposes, follow your server’s own rules and keep recordings secure.
Practical best practice: Announce it out loud at the start. “Hey, I’m recording this call for notes” is enough for consent in most contexts. For anything formal, get a text acknowledgment in the server chat.
Improving Transcription Accuracy for Discord Audio
Discord’s Opus codec compresses audio aggressively. Recordings from Discord voice channels tend to have more compression artifacts than a local microphone recording, which can hurt Whisper’s accuracy on quieter speakers or non-native accents.
A few things that help:
Noise suppression before recording. Running noise suppression during the call (built into Discord’s client or via a desktop app) produces cleaner source audio for transcription. VoxBooster’s local noise suppression, for example, processes audio in real time with no cloud dependency — and because processing happens on-device, you can record the clean output directly. See how voice features work on Discord.
Use a higher Whisper model for difficult audio. If the small model produces gibberish on a noisy recording, try medium or large-v3. The accuracy jump is significant on heavily compressed or accented speech.
Mono vs. stereo. Whisper downmixes its input to mono before transcribing, so a lopsided stereo mix can bury the quieter side. If your OBS setup records stereo (left channel mic, right channel Discord), downmix to mono with ffmpeg yourself and listen to the result before transcribing:
ffmpeg -i stereo_recording.wav -ac 1 mono_recording.wav
Specify the language. If everyone on the call speaks English, pass --language en to Whisper. Skipping language detection removes one potential failure point and speeds up the first pass.
Initial prompt. Whisper accepts an --initial_prompt argument that biases the model toward vocabulary it sees in the prompt. If your call is about a specific game or technical topic, priming the model with relevant terms can cut down on proper-noun errors:
whisper call.wav --initial_prompt "Valorant gameplay strategy, agent picks, site control"
Whisper Discord Transcription Without the Command Line
Not everyone wants to run Python commands. If you prefer a GUI, there are a few approaches:
VoxBooster bundles Whisper-grade local speech-to-text with a graphical interface. You can drop an audio file onto the transcription screen and get a text file without opening a terminal. All processing runs on your PC — no file leaves your machine. Download VoxBooster to try it, or see pricing options if you want the full feature set including real-time dictation during calls.
Whisper Desktop / Whisper Transcriber. Several open-source GUI wrappers around Whisper exist on GitHub. Quality varies and they are less actively maintained, but they work if you just need a point-and-click file transcription.
whisper.cpp with a GUI frontend. The whisper.cpp port is a C++ implementation that does not require Python. Some community frontends wrap it in a simple drag-and-drop interface. See our guide on Whisper dictation for Windows for more context on desktop Whisper setups.
Using Transcripts for Discord Meeting Notes
Once you have a raw transcript, the next challenge is turning it into something useful. Whisper output is a dense wall of text with timestamps but no formatting. Here is a quick cleanup workflow:
- Strip timestamps if you do not need them. A text editor with regex find-and-replace handles this fast: find \[\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}\.\d{3}\] and replace with nothing.
- Add speaker labels using the diarization approach described above, or manually if you know the call well.
- Run it through a summarizer. Paste the cleaned transcript into any LLM chat interface and ask it to produce bullet-point action items. This turns a messy hour-long call into a five-bullet summary in about 30 seconds.
- Post to your server. Paste the summary (not the raw transcript) into a dedicated #meeting-notes channel. Your members can search it, link to it, and hold people accountable to what was actually said.
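The timestamp-stripping step from the list above can also be done in a few lines of Python instead of a text editor. This sketch assumes the bracketed format Whisper prints to the console ([MM:SS.mmm --> MM:SS.mmm], with an hours field on longer recordings); the function name strip_timestamps is illustrative.

```python
import re

# Bracketed stamp with an optional hours field on either side of the arrow.
BRACKET_STAMP = re.compile(
    r"\[(?:\d{2}:)?\d{2}:\d{2}\.\d{3} --> (?:\d{2}:)?\d{2}:\d{2}\.\d{3}\]\s*"
)


def strip_timestamps(transcript: str) -> str:
    """Remove bracketed timestamps and drop lines that end up empty."""
    cleaned = [BRACKET_STAMP.sub("", line).strip() for line in transcript.splitlines()]
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    raw = (
        "[00:02.000 --> 00:08.000]  We should move the event to Saturday.\n"
        "[00:09.000 --> 00:14.000]  I agree, Sunday is packed for half the server.\n"
    )
    print(strip_timestamps(raw))
```

Keep a copy of the timestamped version if you may want to jump back to a specific moment in the recording later.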
Frequently Asked Questions
Does Discord have built-in transcription?
No. As of 2026, Discord has no native call transcription feature. Discord does offer live captions in voice channels as an accessibility option, but those captions are not saved anywhere — they disappear when the session ends. To get a permanent transcript, you need to record the call and transcribe the audio separately.
Is it legal to record and transcribe a Discord call?
It depends on your jurisdiction. Many US states require only one-party consent (you can record a call you are part of without telling the other side), but some states and most EU countries require all-party consent. Always inform participants before recording. Discord’s own Terms of Service do not forbid recording, but breaking local wiretapping law is your responsibility.
What is the most accurate free transcription for Discord audio?
OpenAI Whisper’s large-v3 model delivers word error rates under 5% on clean audio and is completely free to run locally. For Discord calls recorded with a decent headset in a quiet environment, the small or medium Whisper model is usually accurate enough and much faster than large-v3.
Can I transcribe Discord calls with multiple speakers?
Whisper alone does not do speaker diarization — it transcribes words but does not label who said them. To get speaker-labeled output you need to combine Whisper with a diarization tool like pyannote.audio, or use a cloud service like AssemblyAI that handles diarization natively. Local diarization works but requires more setup.
How do I record a Discord call on Windows?
The simplest method is OBS Studio set to capture desktop audio or a virtual audio cable. Route the Discord output to the recording source, start the session, and export the recording as a WAV or MP3 after the call ends. Craig bot is a popular Discord-native option that records each speaker to a separate track.
How long does Whisper take to transcribe a one-hour Discord recording?
On a modern CPU (Ryzen 5 / Core i5) with the small model, expect roughly 8–15 minutes for a one-hour recording. With a mid-range GPU (RTX 3060 or better) and the medium model, the same file transcribes in under 3 minutes. The large-v3 model on GPU handles it in 5–8 minutes with higher accuracy.
What audio format does Whisper accept for Discord transcription?
Whisper accepts WAV, MP3, FLAC, M4A, OGG, and most common audio formats because it uses ffmpeg under the hood. Discord recordings saved as MP3 or WAV work perfectly. If you record with OBS, export as WAV for the best accuracy — compressed formats can introduce artifacts that hurt transcription quality.
Conclusion
How to transcribe Discord calls boils down to two steps: record the audio with OBS or Craig, then run it through Whisper locally. With OBS, that combination is free, accurate, and private, because your audio never leaves your machine; Craig gives you per-speaker tracks, but the bot handles your audio while it records. For group calls, combine per-speaker Craig recordings with individual Whisper passes, or add pyannote.audio for automated diarization if you do not mind more setup. Cloud services are a reasonable alternative when you need diarization out of the box and privacy is less of a concern.
If you want to skip the command-line setup entirely, VoxBooster bundles local Whisper-grade transcription in a Windows desktop app alongside real-time voice effects, noise suppression, and a soundboard — all processing on-device, no kernel driver required. It is a practical all-in-one for anyone who is already spending a lot of time in Discord voice channels and wants their workflow to stay offline and fast.