Phasmophobia gives voice more mechanical weight than nearly any other game. The ghost listens. It responds. Say the wrong word during a Spirit Box session and you’ve confirmed an entity type — or drawn attention to yourself at the worst possible moment. That dual function makes a voice changer genuinely useful for both casual ghost hunters and streamers building a horror persona.
This guide covers low-latency audio capture routing that preserves speech recognition, preset design for a calm investigator persona versus on-stream scare reactions, and how to avoid triggering accidental Yes/No ghost responses.
TL;DR
- Route your voice changer through low-latency audio capture — set the virtual device as your default mic in both Windows and Phasmophobia audio settings
- Keep pitch shifts within ±4 semitones and reverb minimal to preserve Phasmophobia’s speech recognition
- Build two presets: calm investigator (slight low pitch, clean) and scared (raised pitch, light tremolo)
- Avoid saying “yes” or “no” out loud while the Spirit Box is active — ghost detection doesn’t care which voice preset you’re using
- Test phrase recognition in the tutorial or pre-hunt lobby before a real investigation
- Low-latency audio capture-based voice changers don’t interact with game memory, so there are no anti-cheat concerns
How Phasmophobia Uses Your Voice
Phasmophobia uses Windows Speech Recognition to parse words you say aloud. It maintains a dictionary of trigger phrases — ghost names, investigator commands, Spirit Box questions. When you say a recognized phrase and conditions are met, the game fires the associated event.
This layer operates independently of game audio output: it reads whatever device is set as your microphone input. With a voice changer in the chain, the transformed voice other players hear is the same signal the speech engine processes.
Practical consequence: if your transformation degrades phoneme clarity below the parsing threshold, ghost interactions stop working. “What is your age?” becomes noise. Spirit Box sessions go silent even when the ghost should respond.
Second consequence: the ghost doesn’t know about your voice changer. Accidentally mutter “yes” while a Spirit Box is active and the ghost responds as if you asked a direct question.
Low-Latency Audio Capture Routing: The Correct Setup
Getting the audio chain right is the first step. The goal is a single virtual microphone device that both Phasmophobia’s in-game audio settings and Windows Speech Recognition draw from.
Step 1: Install your voice changer and verify the virtual device appears.
Open Windows Settings → System → Sound → More sound settings → Recording tab. A new device should appear: the virtual low-latency audio capture microphone your voice changer registers. If it is missing, relaunch the voice changer with administrator rights.
Step 2: Set the virtual device as the Windows default.
Right-click the virtual device → Set as Default Device, then right-click it again → Set as Default Communication Device. Both matter: Windows Speech Recognition uses the recording default, while some applications read from the communication default.
Step 3: Configure Phasmophobia to use the virtual device explicitly.
Settings → Audio → Microphone: select the virtual device by name rather than “Default.” This prevents Phasmophobia from picking up a different device if Windows reassigns after a reboot.
Step 4: Set the voice changer’s input to your real microphone.
Chain: physical mic → voice changer → virtual low-latency audio capture device → Phasmophobia.
Step 5: Test in the tutorial map.
Approach the Spirit Box and say “What is your name?” If the ghost responds, speech recognition is working through your preset. Test a few phrases before your first real investigation.
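The advice in Step 3 to pick the device by name rather than “Default” can be sketched as a small lookup. This is only an illustration, not any voice changer’s or the game’s actual API: the device names and the `pick_virtual_device` helper are hypothetical, and the list of names would come from whichever audio enumeration tool you use.

```python
from typing import Optional, Sequence


def pick_virtual_device(device_names: Sequence[str], keyword: str) -> Optional[str]:
    """Return the single recording device whose name contains `keyword`.

    Matching is case-insensitive. Returns None when nothing matches and
    raises ValueError when several devices match, so a wrong device is
    never chosen silently.
    """
    needle = keyword.casefold()
    matches = [name for name in device_names if needle in name.casefold()]
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"keyword {keyword!r} matches several devices: {matches}")
    return matches[0]


# Hypothetical device list, for illustration only.
devices = [
    "Microphone (USB Audio Device)",
    "Headset Microphone (Bluetooth)",
    "Virtual Microphone (Voice Changer)",
]

selected = pick_virtual_device(devices, "voice changer")
print(selected)
```

Failing loudly on an ambiguous match mirrors the reason for naming the device explicitly: a silent fallback to the wrong microphone is the failure you are trying to avoid.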
Preset Design: Two-Preset Strategy
A two-preset setup covers the core Phasmophobia use cases cleanly: functional investigation communication and reactive horror content moments.
Preset 1: Calm Investigator
The goal here is professional ghost-hunter energy. Authoritative, slightly measured, unfazed. This is the voice you use for most of the investigation — coordinating with your team, calling out evidence, interrogating the ghost through the Spirit Box.
Target settings:
- Pitch: -1 to -2 semitones (slightly lower than your natural voice, adds gravity without sounding processed)
- Reverb: minimal or none — keep it dry so the speech engine and your teammates can hear you clearly
- Noise gate: active, tight threshold — eliminates ambient sounds that could accidentally trigger ghost responses
- No distortion, no robotic filter
This preset should be nearly indistinguishable from your natural voice to casual listeners. The mild pitch drop is enough to signal “experienced investigator” persona without over-processing.
Preset 2: Scared / Reactive
For content moments — the shriek when a ghost slams a door, the genuine (or performed) panic when hunt music starts. This preset is for streamer reaction content, not for functional ghost communication.
Target settings:
- Pitch: +3 to +5 semitones (raises pitch to convey alarm and surprise)
- Light tremolo or vibrato (adds a wavering quality that reads as genuine fright)
- Reverb: still minimal — too much reverb makes the sound incoherent and less funny on stream
- Latency: under 300 ms, so the transformed voice stays in sync with your visible facial reaction
Switch to this preset only during dedicated reaction moments, not during active Spirit Box sessions. The raised pitch and tremolo can distort phonemes enough to cause false recognitions on certain words.
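To make the semitone figures concrete, here is a minimal sketch that stores the two presets as plain data and converts each pitch shift into a frequency multiplier using the equal-temperament relation (one semitone is a factor of 2^(1/12)). The `Preset` class and its field names are hypothetical and do not correspond to any voice changer’s preset format. The specific values are picked from within the ranges above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    pitch_semitones: float
    reverb: str   # e.g. "none" or "minimal"
    tremolo: bool


def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# Values chosen from the target ranges listed for each preset.
CALM = Preset("calm investigator", pitch_semitones=-2.0, reverb="none", tremolo=False)
SCARED = Preset("scared", pitch_semitones=4.0, reverb="minimal", tremolo=True)

for preset in (CALM, SCARED):
    print(f"{preset.name}: x{pitch_ratio(preset.pitch_semitones):.3f}")
# calm investigator: x0.891
# scared: x1.260
```

A voice with a fundamental around 120 Hz would land near 107 Hz on the calm preset and near 151 Hz on the scared one, which is why the first reads as a subtle change and the second as an obvious one.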
The Yes/No Problem: Staying Safe During Spirit Box
This is the part most voice changer guides skip, and it’s the interaction that catches people off guard most often.
Phasmophobia’s ghost detection listens for “yes” and “no” as responses during Spirit Box interactions. These are common words. They appear in casual conversation, in questions, in reactions. “Yes, go left.” “No, not that room.” “I don’t know, maybe?” — depending on parsing, any of those can register.
The relevant rules:
Use push-to-talk, not open mic, during Spirit Box sessions. Open mic with a voice changer is higher risk because the voice changer may introduce short tails on audio (particularly with reverb enabled) that keep the “listening” state open longer than your actual speech. Push-to-talk gives you explicit control over when the game can hear you.
Don’t say “yes” or “no” out loud while Spirit Box is active. Obvious in principle, harder in practice during co-op when you’re answering teammates in normal conversation. Develop a habit of replacing them during Spirit Box phases — “correct” instead of yes, “negative” or a head shake instead of no.
Disable reverb on your investigator preset. Reverb tails can sometimes create a double-trigger effect where the beginning of a new word matches the tail of the previous one in unexpected combinations. Clean, dry audio is more predictable for the speech engine.
Test your specific preset with the problematic words. In a test map, say “yes” and “no” clearly through each preset. If the ghost responds unexpectedly during those tests, your preset’s processing is creating false positives. Reduce reverb and moderate the pitch shift until the false triggers stop.
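The substitution habit can be rehearsed with a small phrase checker. This is a sketch under stated assumptions: it works on text rather than audio, the `SOUND_ALIKES` table is a guess at what a speech engine might mishear, and none of it reflects how the game itself parses speech.

```python
import re
from typing import List

# Substitutes suggested in the rules above.
SUBSTITUTES = {"yes": "correct", "no": "negative"}

# Assumption: words a speech engine might hear as a trigger word.
SOUND_ALIKES = {"know": "no"}


def risky_words(phrase: str) -> List[str]:
    """List the trigger words a phrase could register as, in spoken order."""
    hits = []
    for word in re.findall(r"[a-z]+", phrase.lower()):
        if word in SUBSTITUTES:
            hits.append(word)
        elif word in SOUND_ALIKES:
            hits.append(SOUND_ALIKES[word])
    return hits


def rephrase(phrase: str) -> str:
    """Swap whole-word yes/no for the safer substitutes."""
    return re.sub(
        r"\b(yes|no)\b",
        lambda m: SUBSTITUTES[m.group(0).lower()],
        phrase,
        flags=re.IGNORECASE,
    )


for line in ["Yes, go left.", "No, not that room.", "I don't know, maybe?"]:
    print(line, "->", risky_words(line), "|", rephrase(line))
```

Whole-word matching keeps “not” from being flagged as “no”, while the sound-alike table still catches “know”, matching the three example phrases given earlier in this section.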
Speech Recognition Limits: What Breaks It
The safe zone for reliable ghost interaction: pitch within ±4 semitones (cleanest at ±3 or less), dry or light reverb only, and no robotic or vocoder effects. A quick reference:
| Processing | Effect on recognition |
|---|---|
| Pitch ±1–3 semitones | Transparent |
| Pitch ±4–6 semitones | Minor degradation on fricatives |
| Pitch ±7+ semitones | Unreliable |
| Light reverb | Minimal effect |
| Heavy reverb | Phoneme smearing, false triggers |
| Robotic / vocoder | Breaks entirely |
| Extreme distortion | Breaks entirely |
If you want a more extreme transformation for aesthetics, save it for non-Spirit-Box phases.
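The reference table can be turned into a quick preset check. The sketch below encodes the table’s bands and the ±4 semitone safe zone stated above. The function names and the string labels for reverb and effects are hypothetical, and assigning fractional shifts between 3 and 4 semitones to the “minor degradation” band is an assumption, since the table only lists whole-number ranges.

```python
from typing import Iterable

# Effects the table lists as breaking recognition entirely.
BREAKING_EFFECTS = {"robotic", "vocoder", "extreme distortion"}


def pitch_risk(semitones: float) -> str:
    """Map a pitch shift onto the table's recognition bands."""
    shift = abs(semitones)
    if shift <= 3:
        return "transparent"
    if shift < 7:
        return "minor degradation on fricatives"
    return "unreliable"


def spirit_box_safe(semitones: float, reverb: str = "none",
                    effects: Iterable[str] = ()) -> bool:
    """True when a preset stays inside the stated safe zone."""
    return (
        abs(semitones) <= 4
        and reverb in ("none", "light")
        and not BREAKING_EFFECTS.intersection(effects)
    )


print(pitch_risk(-2), spirit_box_safe(-2))                # calm investigator
print(pitch_risk(5), spirit_box_safe(5))                  # top of the scared range
print(pitch_risk(2), spirit_box_safe(2, reverb="heavy"))  # pitch is fine, reverb is not
```

Keeping the pitch band and the overall verdict separate shows that a preset can fail on reverb or effects alone even when its pitch shift is transparent.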
Streamer Persona Strategy
Phasmophobia is built for content in a way few horror games are. The fear ramp — from calm evidence gathering to full hunt panic — is a natural narrative arc. A voice changer extends the performance layer of that arc.
The investigator-to-screamer arc is the most effective structure for streamer content. Start every investigation with your calm investigator preset — measured, professional, maybe a slight vocal affectation that signals “this is a character.” When the hunt triggers or you get a significant jump scare, hot-swap to the scared preset and let the reaction play out unfiltered. The contrast between the two is the content moment.
Consistent persona builds audience expectation. If viewers watch five Phasmophobia streams and hear the same investigator voice in the methodical phases and the same scared voice during hunts, the calm preset becomes a kind of dramatic irony — they know something you (the character) don’t. That anticipation is worth more than a random assortment of voice effects.
VoxBooster’s multiple presets and sub-300 ms latency make this practical rather than theoretical. You can save both presets under their own names, bind each to a hotkey, and switch within a second, and the latency stays low enough that the transformed voice syncs with visible facial reactions on camera. The AI cloning option lets you design a genuinely distinct investigator voice, with a different formant pattern and vocal character, rather than just a pitch-shifted version of your natural voice.
VoxBooster runs via low-latency audio capture with no kernel driver installation, which means it doesn’t interact with Phasmophobia’s process in any way and doesn’t require a system restart to set up.
Co-op Considerations
Brief your team first. Co-op partners who know your natural voice will be disoriented. A quick “I’m on a voice preset tonight” avoids confusion during fast communication under hunt pressure.
VoIP stacking. If you’re using Discord alongside Phasmophobia’s in-game voice, low-latency audio capture routing covers both — any application reading from the Windows default device gets the transformed signal. No extra routing needed.
Keep your investigator preset clean. Heavily processed voices are harder to parse under time pressure. Reserve extreme effects for solo moments; clarity matters more than aesthetics when someone is calling out ghost type at the last second.
Common Issues
Ghost doesn’t respond to Spirit Box questions.
The speech engine isn’t parsing your voice through the preset. Switch to your cleanest preset (minimal processing) and say “What is your name?” slowly and clearly. If it works with minimal processing but not with your normal preset, the preset is over-processing. Reduce pitch shift and reverb until recognition returns.
Voice cuts out mid-investigation.
Buffer underrun from audio processing under load. Phasmophobia is more GPU-intensive on certain maps (Asylum, Maple Lodge). If your voice changer uses AI inference, GPU contention can cause audio dropout. Switch to DSP-only presets on heavy maps, or lower in-game graphics settings slightly to free up GPU headroom.
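The arithmetic behind a buffer underrun is simple: each audio buffer covers a fixed slice of time, and processing for one buffer must finish before the next is due. The sketch below illustrates this with hypothetical numbers. The buffer size, sample rate, and processing times are assumptions for illustration, not measurements of any particular voice changer.

```python
def buffer_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def underruns(processing_ms: float, frames: int, sample_rate_hz: int) -> bool:
    """True when processing one buffer takes longer than the buffer lasts."""
    return processing_ms > buffer_ms(frames, sample_rate_hz)


# Hypothetical setup: 480-frame buffers at 48 kHz.
FRAMES, RATE = 480, 48_000
print(buffer_ms(FRAMES, RATE))        # 10.0

# Hypothetical processing times: light load vs. a GPU-contended heavy map.
print(underruns(6.0, FRAMES, RATE))   # False
print(underruns(14.0, FRAMES, RATE))  # True
```

When GPU contention pushes per-buffer processing time past the buffer duration, the output has nothing to play and the voice drops out, which is why a lighter DSP-only preset or freed-up GPU headroom resolves it.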
Teammates hear echo or doubled voice.
Windows is monitoring the virtual device through your speakers or headphones. In Sound Settings → Recording, open Properties for the virtual device and, on the Listen tab, disable “Listen to this device.” Also check that Windows Stereo Mix is disabled.
Scared preset sounds comical rather than frightened.
Too much pitch shift with no tremolo reads as a chipmunk effect rather than genuine fear. Reduce pitch shift to +3 semitones, add a very light tremolo (slow rate, low depth), and let your actual vocal delivery carry the performance. The preset should enhance the reaction, not replace it.
Quick Setup Checklist
Before your next ghost investigation:
- Voice changer installed and virtual low-latency audio capture device visible in Windows Sound settings
- Virtual device set as Default Recording Device and Default Communication Device in Windows
- Phasmophobia audio settings pointing explicitly to the virtual device
- Calm investigator preset configured (pitch -1 to -2, minimal reverb)
- Scared preset configured (pitch +3 to +5, light tremolo)
- Both presets bound to accessible hotkeys
- Spirit Box phrase recognition tested in tutorial map
- Push-to-talk configured in Phasmophobia (not open mic)
- Told co-op teammates about the voice setup
The game is ready to play whenever you are. The ghost doesn’t care what you sound like — but your audience does.