When Critical Role turned a home D&D game into a multi-million view phenomenon, it wasn’t just the story. It was the production — every character rendered with deliberate voice work, ambient soundscapes, dramatic stings at exactly the right moment, and a cast genuinely invested in making each scene land. Replicating that energy for your own actual play stream does not require a professional recording studio. It requires the right routing, a few well-tuned presets, and a soundboard operator who knows when to fire a cue.
This guide walks through the full technical stack: how to build per-character voice profiles, route multi-player Discord audio into OBS cleanly, trigger combat soundboard stings with muscle-memory hotkeys, and use AI voice cloning for NPC cameos — all without slowing the game to a halt.
TL;DR
- Each player applies their own voice preset locally before joining Discord — no central switcher needed.
- DSP presets (pitch + formant) add under 20ms latency; use them for real-time roleplay delivery.
- AI-cloned voice profiles work for pre-planned NPC cameos with sub-300ms latency.
- Soundboard stings route as a separate OBS audio source so you can adjust levels independently.
- Critical Role’s production value comes from intentionality, not gear budget.
Why Voice Processing Elevates Actual Play
Actual play is a hybrid medium. It is part improv theatre, part tabletop game, part podcast, part Twitch stream. The technical challenge is that everyone is on Discord, mic quality varies by player, and the DM is simultaneously managing rules, NPCs, maps, and pacing. Voice processing solves specific problems in that context:
Character differentiation — six players around a digital table, all sounding like themselves, creates a flat audio landscape for viewers. Small pitch and formant shifts — even modest ones — give each character a distinct sonic identity that helps audiences follow who is speaking without looking at the screen.
NPC authority — the DM’s NPCs need to feel like a different person is talking. Matt Mercer’s ability to snap between a gruff dwarf blacksmith and a melodious archfey mid-sentence is the gold standard for actual play. Voice processing gives DMs a technical assist for that range.
Production punctuation — combat encounter music, spell effect bursts, and dramatic stings turn an audio edit from “recorded game session” into “produced show.” These are not gimmicks; they are the equivalent of a film score cueing the audience’s emotional response.
Stream-side polish — viewers notice when audio levels differ dramatically between players, when background noise bleeds through, or when the transition from roleplay to combat has zero sonic marker. Consistent audio processing across the cast raises the perceived production quality significantly.
The Actual Play Audio Routing Architecture
Before touching a single preset, understand how audio moves in a multi-player actual play setup.
The Discord-OBS chain
Each player’s audio path is:
Microphone → Voice Changer (local) → Virtual Microphone Device → Discord
The stream host’s OBS sees:
Discord (mixed output) → OBS Application Audio Capture (or Audio Output Capture) → Stream/Recording
This means the voice processing happens before Discord, not after. Each player installs their own voice changer, applies their own character preset, and the processed audio enters the Discord mix just like normal speech. The stream host does not need to do anything special — they capture the Discord output and it already contains every player’s processed voice.
Separating soundboard audio
Soundboard sounds should route on a separate audio track in OBS, not through Discord. This gives you independent level control and keeps the stream mix clean even if someone accidentally triggers a sting mid-sentence.
Soundboard App → Separate OBS Audio Source (Application Audio Capture)
Set this source to 60–70% of your voice track levels as a starting baseline. Dramatic stings can be louder; ambient loops should sit behind the voices.
Monitoring the mix as the DM
During a session, the DM is the de-facto audio director. Use your audio software’s monitor output routed to headphones to hear what the stream is getting — not just what Discord is sending you. This lets you catch a player whose voice preset is clipping, or an ambient loop that has run too long.
Building Per-Character Voice Profiles
The goal is not to make you sound like a different species — it is to make your character feel consistent. A small, repeatable modification that you can switch to reliably is worth more than a dramatic effect you cannot hold across a three-hour session.
Profile design principles
Anchor to your real voice. Start with a pitch shift of ±2–4 semitones and a formant shift in the same direction. This preserves your natural resonance and emotion while moving the character into a distinct register.
Add a timbre modifier. A slight low-pass filter for older, wearier characters; a subtle brightness boost for energetic rogues; a touch of room reverb for bardic performances. Keep it light — heavy processing reads as an audio artifact, not a voice choice.
Separate speech and combat versions. A gruff fighter might speak at –2 semitones in casual scenes but benefit from a slight distortion layer during high-intensity combat moments. Save both as named presets and map them to adjacent hotkeys.
Test it on stream audio, not headphones. Voice processing that sounds great in your headphones often arrives muffled or harsh through a stream’s compressed audio. Do a five-minute Discord test with your stream host before session zero.
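To make the semitone figures above concrete, here is a minimal sketch of the standard equal-temperament relation between a semitone shift and a frequency ratio. The function names and the 120 Hz speaking pitch are illustrative assumptions, and the sketch says nothing about how any particular voice changer implements its pitch or formant controls.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


def shifted_pitch_hz(base_hz: float, semitones: float) -> float:
    """Pitch in Hz after shifting `base_hz` by `semitones`."""
    return base_hz * semitone_ratio(semitones)


if __name__ == "__main__":
    base_hz = 120.0  # hypothetical speaking pitch, for illustration only
    for st in (-4, -2, 2, 4):
        ratio = semitone_ratio(st)
        print(f"{st:+d} st -> x{ratio:.3f} -> {shifted_pitch_hz(base_hz, st):.1f} Hz")
```

A shift of 2 to 4 semitones moves the pitch by roughly 11% to 26%, which is why small values are already enough to set a character apart.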
Comparison table: cast role to preset style
| Cast Role | Pitch Shift | Formant Shift | Timbre Layer | Notes |
|---|---|---|---|---|
| DM (neutral narrator) | 0 | 0 | None | Clear baseline; switch per-NPC |
| DM (gruff villain) | –3 to –4 st | –2 to –3 st | Light low-pass | Keep intelligible |
| DM (ethereal fey) | +2 to +3 st | +3 to +4 st | Subtle reverb | Don’t over-process |
| Fighter / Tank player | –1 to –2 st | –1 to –2 st | None needed | Subtle is fine |
| Bard / Social player | 0 to +1 st | +1 to +2 st | Light air/presence | Matches performative energy |
| Rogue / Schemer player | –1 st | 0 | Light grit | Avoid heavy distortion |
| Wizard / Scholar player | 0 to +1 st | 0 to +1 st | Slight brightness | Clear articulation priority |
| Cleric / Divine player | –1 to –2 st | –1 st | Subtle warmth | Grave but not grim |
These are starting points. Calibrate to each player’s actual voice — a player who naturally has a deep voice will need smaller downward shifts to avoid muddiness.
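One way to keep these starting points organized is to store them as data and adjust them per player. The sketch below is only an illustration: the `VoicePreset` class, the use of range midpoints from the table, and the 0.5 softening factor for naturally deep voices are all assumptions, not settings from any specific tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoicePreset:
    name: str
    pitch_st: float    # pitch shift in semitones
    formant_st: float  # formant shift in semitones
    timbre: str = "none"


# Midpoints of two ranges from the table above (illustrative choice).
PRESETS = {
    "dm_gruff_villain": VoicePreset("DM (gruff villain)", -3.5, -2.5, "light low-pass"),
    "dm_ethereal_fey": VoicePreset("DM (ethereal fey)", 2.5, 3.5, "subtle reverb"),
}


def soften_downward(preset: VoicePreset, factor: float = 0.5) -> VoicePreset:
    """Scale only the negative shifts, for players whose voices are already deep."""
    def scale(value: float) -> float:
        return value * factor if value < 0 else value

    return replace(
        preset,
        pitch_st=scale(preset.pitch_st),
        formant_st=scale(preset.formant_st),
    )


if __name__ == "__main__":
    print(soften_downward(PRESETS["dm_gruff_villain"]))
```

Upward shifts pass through unchanged, so the same helper can be applied to a whole preset library without touching the brighter characters.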
The DM’s NPC Toolkit: AI Voice Profiles for Cameos
The DM has the hardest audio job: voicing dozens of NPCs over a campaign while also managing game state. For recurring, high-importance NPCs (the campaign’s main villain, a beloved guide figure, a faction leader), an AI voice profile can anchor the character across sessions in a way that pure acting cannot always guarantee after three hours of roleplay.
Building an archetype profile
A key principle: build profiles on voice archetypes, not on specific real people. Useful archetypes for fantasy actual play:
- Deep gravel — authority figures, guards, ancient dwarves
- Melodic mid-tenor — charismatic nobles, silver-tongued merchants
- Ethereal soprano — fey creatures, oracles, celestials
- Aged rasp — ancient sages, undead entities, cursed figures
Tools like VoxBooster let you clone a custom profile trained on a short recording of your own voice in character (or, with explicit consent, a collaborator’s voice) and then activate it live with sub-300ms latency. That is fast enough for natural conversational delivery.
When to use AI cloning versus DSP effects
| Scenario | Recommended Approach |
|---|---|
| Real-time improv NPC | DSP preset (faster, more flexible) |
| Recurring named villain | AI profile (consistent across sessions) |
| One-off minion or guard | DSP with minimal settings |
| Pre-recorded NPC audio drop | Either; latency irrelevant |
| Player character in combat | DSP (sub-20ms priority) |
Keep AI profiles for the NPCs that matter — overusing them dilutes the effect and increases your session setup overhead.
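The decision table above reduces to two questions: is the line delivered live, and is it for a recurring named NPC? The sketch below encodes that reading. The `Approach` enum and `recommend` function are hypothetical names, and the two-question simplification is an assumption about how to read the table.

```python
from enum import Enum


class Approach(Enum):
    DSP = "DSP preset"
    AI = "AI profile"
    EITHER = "either"


def recommend(live: bool, recurring_npc: bool) -> Approach:
    """Pick a voice approach for a line of dialogue.

    live:          delivered in real time (False for pre-recorded audio drops)
    recurring_npc: a named NPC who returns across sessions
    """
    if not live:
        return Approach.EITHER  # latency is irrelevant for pre-recorded drops
    if recurring_npc:
        return Approach.AI      # consistency across sessions matters most
    return Approach.DSP         # improv NPCs, one-off minions, player characters


if __name__ == "__main__":
    print(recommend(live=True, recurring_npc=False).value)
```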
Soundboard Setup for Combat and Drama
A well-timed soundboard sting is one of the highest-leverage production tools in actual play streaming. Critical Role’s production team has refined this to an art form: the moment combat is called, the tone shifts — and a large part of that is audio.
Building your soundboard library
Organize sounds into four categories:
Combat stings — 2–4 second punchy cues for initiative announcements, critical hits, death saves, and dramatic reveals. Use a distinct sound per category so they are recognizable after multiple sessions.
Ambient loops — dungeon ambience, tavern chatter, forest wind, city market noise. Keep these subtle; they should be barely audible under voices. Set them to auto-loop in your soundboard software.
Spell and ability effects — fire whoosh, thunder crack, divine chime, shadow burst. Best used sparingly; one well-placed effect per combat encounter is more impactful than one per spell cast.
Transition cues — a short musical phrase that signals scene changes or time skips. A consistent transition sound trains your audience to expect a cut, reducing confusion.
Hotkey mapping for live sessions
Map your six most-used sounds to a single row of number keys or a dedicated numpad. During a session, your hands stay on the keyboard; you should not be hunting for buttons mid-combat. A layout like:
- 1: combat encounter start sting
- 2: critical hit flash
- 3: death save drumroll
- 4: current ambient loop (toggle)
- 5: scene transition cue
- 6: villain theme clip
Practice the hotkeys before session one. Fumbling the soundboard live breaks immersion faster than silence would.
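The six-key layout can also be written down as a small lookup table, which makes it easy to share with a co-host or rehearse away from the stream. The sketch below only models the key-to-cue logic, including the toggle behaviour of the ambient loop. It plays no audio, and the `Soundboard` class is a hypothetical illustration rather than any soundboard app's API.

```python
CUES = {
    "1": "combat encounter start sting",
    "2": "critical hit flash",
    "3": "death save drumroll",
    "4": "current ambient loop",
    "5": "scene transition cue",
    "6": "villain theme clip",
}
TOGGLE_KEYS = {"4"}  # keys that start/stop a loop instead of firing a one-shot


class Soundboard:
    def __init__(self) -> None:
        self.looping: set[str] = set()  # toggle keys whose loops are running

    def press(self, key: str) -> str:
        """Return a description of what pressing `key` would do."""
        cue = CUES.get(key)
        if cue is None:
            return "ignored"  # unmapped keys do nothing
        if key in TOGGLE_KEYS:
            if key in self.looping:
                self.looping.remove(key)
                return f"stop {cue}"
            self.looping.add(key)
            return f"loop {cue}"
        return f"play {cue}"


if __name__ == "__main__":
    board = Soundboard()
    for key in "1449":
        print(key, "->", board.press(key))
```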
OBS audio routing for soundboard
In OBS:
- Add the soundboard application as an Application Audio Capture source.
- Rename it “Soundboard” to distinguish it from Discord.
- Set it to a separate audio track (Track 2) so your recording has an isolated soundboard track for editing.
- In the Audio Mixer, set its level to –6 to –9 dB relative to your voice tracks.
This setup means you can lower ambient loops without touching combat stings, and your post-session editor can strip or remix the soundboard layer independently.
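The relative level in the list above is given in decibels. As a rough aid, the sketch below converts between decibels and a linear amplitude ratio using the standard 20·log10 relation. The function names are illustrative, and the sketch does not model how the OBS fader maps its position to gain.

```python
import math


def db_to_amplitude(db: float) -> float:
    """Linear amplitude ratio for a gain of `db` decibels."""
    return 10.0 ** (db / 20.0)


def amplitude_to_db(ratio: float) -> float:
    """Gain in decibels for a linear amplitude `ratio` (must be positive)."""
    if ratio <= 0:
        raise ValueError("amplitude ratio must be positive")
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    for db in (-6.0, -9.0):
        print(f"{db:+.0f} dB -> {db_to_amplitude(db):.2f} x voice amplitude")
```

Under this relation, –6 dB is about half the voice amplitude and –9 dB is about 35% of it.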
Multi-Player Discord Setup: Practical Checklist
Before your first session, run through this checklist with every player:
Per player:
- Voice changer installed and character preset saved
- Virtual microphone device selected in Discord (Settings → Voice & Video → Input Device)
- Krisp noise suppression set to Low or Off (Krisp can conflict with processed voices)
- Echo cancellation disabled if using headphones (avoids double-processing)
- 30-second test clip sent to DM for level check
DM / Stream host:
- OBS has Discord output captured as a separate audio source
- Soundboard routed as its own OBS audio source
- Scene transitions set up in OBS (game map, “BRB” screen, end card)
- Stream audio monitored via headphones during session
- VoxBooster’s low-latency virtual mic selected as the DM’s Discord input
A 15-minute pre-session audio check — everyone joins a test channel and speaks in character — saves you from discovering a broken preset at the worst moment.
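If the cast is large, the per-player checklist can be tracked as data so the DM sees at a glance who still has gaps. This is a minimal sketch: the check names are shorthand for the per-player items above, and `missing_checks` and `audit_cast` are hypothetical helpers, not part of Discord, OBS, or any voice changer.

```python
PLAYER_CHECKS = (
    "preset_saved",
    "virtual_mic_selected",
    "noise_suppression_low_or_off",
    "echo_cancellation_reviewed",
    "test_clip_sent",
)


def missing_checks(status: dict) -> list:
    """Checklist items a single player has not confirmed yet."""
    return [check for check in PLAYER_CHECKS if not status.get(check, False)]


def audit_cast(cast: dict) -> dict:
    """Map each player with open items to the list of those items."""
    report = {}
    for player, status in cast.items():
        gaps = missing_checks(status)
        if gaps:
            report[player] = gaps
    return report


if __name__ == "__main__":
    cast = {
        "fighter": {check: True for check in PLAYER_CHECKS},
        "bard": {"preset_saved": True, "virtual_mic_selected": True},
    }
    for player, gaps in audit_cast(cast).items():
        print(f"{player}: still needs {', '.join(gaps)}")
```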
OBS Scene Layout for Actual Play
The audio routing only matters if your stream layout supports it. A Critical Role-style stream typically uses:
Main scene — player cam grid (or portraits for face-cam shows) + battle map + lower-third character names. Audio: Discord + soundboard.
DM focus scene — single large DM cam + map overlay. Audio: same sources, no change needed.
Art/reveal scene — full-screen character art or location art. Audio: ambient loop + optional dramatic sting on entry.
BRB/break screen — hold music + countdown timer. Audio: music only, Discord muted.
Every in-game scene uses the same audio sources; only the video layout changes. This keeps your audio mix consistent across transitions and avoids the common mistake of accidentally muting Discord when switching scenes. The BRB screen is the one deliberate exception, where Discord is muted on purpose.
For detailed OBS configuration, consult the OBS Studio documentation on audio mixing.
Elevating Your Actual Play Beyond the Technical Setup
The technology is only the frame. What makes Critical Role genuinely compelling, and what has sustained the broader actual play genre (see the Critical Role Wikipedia entry for its cultural footprint), is collaborative investment in the fiction.
Voice processing reinforces that investment by giving each player a reliable sonic identity to inhabit. It lowers the cognitive overhead of “sounding like your character” so players can focus on being their character.
Critical Role’s official site includes production notes and behind-the-scenes content that is worth studying for production inspiration — not to replicate their exact setup, but to understand the intentionality behind their choices.
For further reading on the actual play format’s mechanics, the VoxBooster guide to voice changer setup for Discord gaming sessions covers the baseline routing in more detail. If you are new to real-time AI voice effects, how real-time voice cloning works explains the technology stack under the hood.
VoxBooster in an Actual Play Setup
For actual play specifically, a few technical properties matter more than for casual gaming:
Low-latency audio capture compatibility means VoxBooster’s virtual microphone device appears natively in OBS, Discord, and any other app that uses standard Windows audio. No third-party virtual cable is required, and there is nothing extra to install on each player’s machine.
Sub-20ms DSP processing keeps the latency of DSP-based character presets imperceptible, so player delivery feels natural rather than slightly behind.
Sub-300ms AI cloning hits the threshold for usable live NPC performance without the uncanny delay that longer latency profiles produce.
Soundboard hotkeys run inside the same application so DMs can manage voice preset switching and soundboard triggers from one interface without alt-tabbing mid-combat.
VoxBooster runs on Windows 10 and 11, requires no kernel driver installation, and includes a free trial. Paid plans start at $6.99/month.
FAQ
The short version for actual play streamers building their first voice setup: start simple (one preset per character, six soundboard sounds, clean Discord routing) and layer complexity in as you and your cast get comfortable with the tools. A two-hour session where everyone’s voice is clear and the soundboard fires on cue is a better stream than a technically elaborate production that falls apart in the first combat encounter.
Build the session zero audio check into your campaign prep the same way you build character sheets and session notes. It will pay off every episode after that.