Fiction podcasts, audio drama and narrated content have a chronic production problem: each character needs a different voice. Traditionally that means hiring a cast, scheduling sessions, syncing performances. In 2026, you can do it all by yourself with professional production quality.
This is the workflow independent creators are using on Spotify and Apple Podcasts.
Two ways to record
Real-time mode. You speak in character and the clone is applied live, so your speech comes out in the chosen voice’s timbre. Pro: fluid performance, natural reactions. Con: to switch characters mid-scene, you have to stop, change the voice in VoxBooster, then record the next line.
Offline mode. You record every line in your own natural voice, with character markers. Then you pass each segment through VoxBooster’s offline processing, applying a different voice to each. Pro: clean edit, single take. Con: you have to decide the cast before you start editing.
In practice, the hybrid flow wins: record everything in your voice, mark characters on the Reaper or Audacity timeline, export the segments, and run each through VoxBooster offline. Fifteen minutes of editing can save three hours of re-recording.
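To make the marking and exporting step concrete, here is a minimal Python sketch that turns a marked-up script into a list of planned segment files. It assumes bracketed markers like [JOHN] and file names like john_01.wav, the conventions used in the step-by-step workflow later in this article. The function names `parse_script` and `plan_segments` are hypothetical helpers, not part of VoxBooster or Reaper, and the third script line is made up for illustration.

```python
import re
from collections import Counter

# A marker is an upper-case name in square brackets, e.g. [JOHN] (assumed convention).
MARKER = re.compile(r"\[([A-Z][A-Z0-9 _-]*)\]")


def parse_script(text):
    """Split marked-up script text into (character, line) pairs."""
    # re.split with one capture group yields:
    # [text before first marker, name1, line1, name2, line2, ...]
    parts = MARKER.split(text)
    pairs = []
    for name, line in zip(parts[1::2], parts[2::2]):
        line = " ".join(line.split())  # collapse whitespace and newlines
        if line:
            pairs.append((name.strip(), line))
    return pairs


def plan_segments(pairs):
    """Assign each line a file name such as john_01.wav, numbered per character."""
    counts = Counter()
    plan = []
    for name, line in pairs:
        counts[name] += 1
        slug = name.lower().replace(" ", "_")
        plan.append((f"{slug}_{counts[name]:02d}.wav", name, line))
    return plan


script = "[JOHN] We need to get out of here. [MARY] Not yet. [JOHN] Then when?"
for filename, name, line in plan_segments(parse_script(script)):
    print(filename, name, line)
```

A plan like this doubles as a checklist when you name regions in the editor, and it shows up front how many files each voice will need to process.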
Recording setup
You don’t need an expensive setup:
- Dynamic microphone (SM7B if you can, Shure MV7 or Samson Q2U if you want value) — dynamic rejects room noise better than a condenser.
- Minimal acoustic treatment — four foam panels on the nearest wall, carpet on the floor. No booth needed.
- Audio interface or USB mixer — avoids the direct-USB-mic noise floor on long recordings.
- Reaper or Audacity for multi-track editing.
Picking the voices
The VoxBooster library has categories that work well for fiction:
- Authoritative narrator for a protagonist who narrates in first person
- Gentle female voice for a counselor / mentor character
- Deep, rough voice for an antagonist or tough character
- Young animated voice for comic relief / sidekick
- Radio announcer voice for scenes simulating in-world broadcasts
- Alien / demonic voice for a supernatural entity
Avoid picking two voices of the same type: the listener has to distinguish characters by timbre alone. If two characters both use a “deep narrator” voice, scenes between them become confusing.
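If your cast is large, a quick check can catch clashes before you record. The sketch below is one possible way to do it in Python. The voice type labels are informal tags you assign yourself, not VoxBooster identifiers, and `find_timbre_clashes` is a hypothetical helper name.

```python
from collections import defaultdict


def find_timbre_clashes(cast):
    """Return voice types that are assigned to more than one character."""
    by_type = defaultdict(list)
    for character, voice_type in cast.items():
        by_type[voice_type].append(character)
    return {t: sorted(names) for t, names in by_type.items() if len(names) > 1}


# Example cast: the labels on the right are your own notes about each voice.
cast = {
    "JOHN": "authoritative narrator",
    "MARY": "gentle female",
    "PETER": "deep rough",
    "NARRATOR": "authoritative narrator",
}
print(find_timbre_clashes(cast))
# -> {'authoritative narrator': ['JOHN', 'NARRATOR']}
```

An empty result means every character has a distinct voice type. It says nothing about how different two voices actually sound, so a listening test is still worth doing.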
Step-by-step workflow
- Write the script with markers:
  [JOHN] We need to get out of here.
  [MARY] Not yet.
- Record everything in your raw voice, performing each character as much as possible. Your performance (rhythm, emotion, pause) survives in the clone; only the timbre is replaced.
- Import to Reaper, split each line into a region named for the character.
- Export each region as a separate file (john_01.wav, mary_01.wav).
- In VoxBooster, open Process File, pick John’s voice, drag all john_*.wav files. Click process. You get john_01_clone.wav etc.
- Same for Mary, Peter, narrator.
- Back in Reaper, replace each region with its cloned version.
- Mix as usual: light compression per voice, reverb for ambience, and final loudness normalization to -16 LUFS for podcast delivery.
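For the final loudness step, one option outside your editor is ffmpeg's `loudnorm` filter. The article's workflow does not mention ffmpeg, so treat this as an assumed alternative tool. The Python sketch below only builds the command. The file names, the true-peak ceiling of -1.5 dB, and the loudness range of 11 are illustrative values, and `loudnorm_command` is a hypothetical helper.

```python
import shlex


def loudnorm_command(src, dst, target_lufs=-16.0, true_peak_db=-1.5, sample_rate=48000):
    """Build an ffmpeg command that normalizes src to the target integrated loudness."""
    # I = integrated loudness target, TP = true-peak ceiling, LRA = loudness range target.
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak_db}:LRA=11"
    # -ar pins the output sample rate, since loudnorm may otherwise upsample the audio.
    return ["ffmpeg", "-i", src, "-af", audio_filter, "-ar", str(sample_rate), dst]


cmd = loudnorm_command("episode_mix.wav", "episode_final.wav")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to actually run it, assuming ffmpeg is installed. This is a single-pass run, so it is worth confirming the result with your editor's loudness meter.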
Performance tips
- Act each character with your body. Change posture, breathing, energy. That leaks into your speech and survives in the clone.
- Don’t try to imitate the final voice. If you’ve already picked a deep voice for the antagonist, you don’t need to force depth in the recording — the model handles timbre. Focus on intention.
- Keep consistent mic distance. Voice cloning is sensitive to proximity variation; moving closer to and farther from the mic within the same file generates artifacts.
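A rough way to spot mic-distance drift before cloning is to measure how much the level varies within a take. The sketch below uses only the Python standard library and assumes 16-bit mono PCM WAV files. Level is only a crude proxy for distance, because acting also changes loudness, so the threshold you act on is a judgment call. The function names and the -50 dBFS silence floor are assumptions for illustration.

```python
import math
import struct
import wave


def window_levels_db(path, window_s=0.5):
    """Return the RMS level in dBFS for each full window of a 16-bit mono WAV file."""
    levels = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch expects 16-bit mono PCM")
        frames_per_window = int(wav.getframerate() * window_s)
        while True:
            raw = wav.readframes(frames_per_window)
            if len(raw) < 2 * frames_per_window:  # drop the partial final window
                break
            samples = struct.unpack(f"<{frames_per_window}h", raw)
            rms = math.sqrt(sum(s * s for s in samples) / frames_per_window)
            # Clamp to 1.0 so digital silence does not produce log10(0).
            levels.append(20 * math.log10(max(rms, 1.0) / 32768.0))
    return levels


def level_spread_db(levels, silence_floor_db=-50.0):
    """Difference between the loudest and quietest non-silent windows, in dB."""
    voiced = [level for level in levels if level > silence_floor_db]
    if not voiced:
        return 0.0
    return max(voiced) - min(voiced)


# Example (hypothetical file name):
# print(level_spread_db(window_levels_db("john_01.wav")))
```

A take with a much larger spread than the others for the same character is a reasonable candidate for a listen, and possibly a re-record, before you run it through the clone.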