NVIDIA Maxine Voice: SDK Guide, RTX Noise Suppression & Real-Time Audio
NVIDIA Maxine audio technology represents one of the most significant GPU-accelerated leaps in consumer audio processing. What started as RTX Voice (a standalone app that blew streamers’ minds in 2020 by using a neural network on the GPU to remove mechanical keyboard clatter) has matured into the Maxine Audio Effects SDK: a full developer toolkit for building apps with real-time denoising, room echo removal, and acoustic echo cancellation baked in. This guide covers how the technology works, how to set it up, and how to layer it with a real-time voice changer for a complete broadcast-quality audio chain on Windows.
TL;DR
- NVIDIA Maxine Audio Effects SDK is a free developer toolkit with GPU-accelerated noise suppression (denoising), room echo removal, and acoustic echo cancellation; denoising runs at 48 kHz
- RTX Voice was the consumer predecessor; NVIDIA Broadcast and Maxine SDK are the current forms
- Requires RTX 20-series or newer (Tensor Cores required for neural inference)
- Latency is 10–20 ms for a single effect pass — imperceptible in conversation
- Best workflow: physical mic → Maxine denoising → voice changer → virtual mic output to Discord/OBS
- VoxBooster slots in after Maxine in the audio chain, with no virtual cable required
What Is NVIDIA Maxine Audio Effects SDK?
NVIDIA Maxine Audio Effects SDK is a GPU-accelerated set of APIs that apply deep learning–based audio enhancement to real-time audio streams. It is not a consumer application — it is a developer toolkit that software vendors, indie developers, and researchers use to add studio-quality denoising and echo removal to their own applications without building those models from scratch.
The SDK’s core audio effects include:
- Noise Suppression: removes background sounds (fans, keyboards, street noise, HVAC) from a microphone signal using a neural network trained on thousands of noise types
- Room Echo Cancellation: removes room reverberation, the reflections of your own voice off walls and hard surfaces that make a mic in an untreated room sound distant or hollow
- Acoustic Echo Cancellation (AEC): uses a reference copy of the audio sent to your speakers to remove that playback when it leaks back into the microphone (the cause of echo on laptop mics during calls)
The underlying architecture uses deep neural networks that run on RTX GPU Tensor Cores, which is why the processing adds only 10–20 ms of latency instead of the 80–150 ms you would expect from a CPU-based deep learning pipeline.
More detailed technical documentation is available on the NVIDIA Developer site.
From RTX Voice to Maxine SDK: A Brief History
To understand the current state of the technology, the timeline matters.
2020 — RTX Voice launch. NVIDIA released RTX Voice as a free standalone app. It created a virtual microphone that ran your real mic signal through a deep learning denoising model on your RTX GPU. The results were immediately impressive — mechanical keyboard noise, HVAC rumble, and coffee-shop ambiance vanished with minimal voice coloration. The catch was an install requirement for RTX GPUs only (though community patches briefly enabled it on GTX cards by bypassing the check).
Late 2020 — NVIDIA Broadcast. RTX Voice and RTX Greenscreen were merged into a single app called NVIDIA Broadcast, which added webcam background removal; eye contact correction arrived in a later update. The audio denoising model was updated with better voice preservation at higher noise levels.
2022–2024 — Maxine SDK maturation. NVIDIA packaged the same models into the Maxine Audio Effects SDK for developers, versioned separately from the consumer app. The SDK exposed more parameters — effect strength, frequency weighting, model selection — giving developers control that the GUI app intentionally simplified away.
2025–2026 — Integration era. Third-party apps, DAWs, and voice software began integrating Maxine directly. The NVAFX API (the core of Maxine Audio Effects) is now available through plugins and as a direct C/C++ API, with Python access through wrappers.
| Product | Audience | Interface | Control Level |
|---|---|---|---|
| RTX Voice (legacy) | Consumers | GUI app | None — one click |
| NVIDIA Broadcast | Consumers | GUI app | Minimal |
| Maxine Audio Effects SDK | Developers | C++ / Python API | Full |
| Third-party integrations | End users via apps | Varies | Varies |
How Maxine Noise Suppression Works Under the Hood
The noise suppression model is a recurrent neural network (RNN) architecture trained on a large corpus of clean speech paired with diverse noise backgrounds. At runtime it processes audio in short frames — typically 10 ms windows — and predicts a noise mask for each frequency bin. Frequencies dominated by noise get attenuated; frequencies dominated by voice pass through.
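A minimal Python sketch of what “a mask per frequency bin” means for one 10 ms frame at 48 kHz. It is an illustration, not Maxine’s model: the network has to predict the mask from the noisy audio alone, whereas this sketch cheats by computing an oracle ratio mask from separately known speech and noise. The two test tones and the naive DFT (used only to stay dependency-free) are assumptions for the demo.

```python
import cmath
import math

SAMPLE_RATE = 48_000
FRAME = SAMPLE_RATE // 100  # 480 samples = one 10 ms frame; each DFT bin is 100 Hz wide


def dft(frame):
    """Naive O(n^2) discrete Fourier transform, fine for a single demo frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part because the original signal is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def ratio_mask(speech_spec, noise_spec, eps=1e-12):
    """Per-bin gain in [0, 1]: near 1 where speech dominates, near 0 where noise dominates."""
    return [abs(s) ** 2 / (abs(s) ** 2 + abs(n) ** 2 + eps)
            for s, n in zip(speech_spec, noise_spec)]


# One frame: a 1 kHz tone stands in for voice, a 5 kHz tone for fan whine.
speech = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME)]
noise = [0.8 * math.sin(2 * math.pi * 5000 * t / SAMPLE_RATE) for t in range(FRAME)]

speech_spec, noise_spec = dft(speech), dft(noise)
noisy_spec = [s + n for s, n in zip(speech_spec, noise_spec)]  # the DFT is linear
mask = ratio_mask(speech_spec, noise_spec)                     # a network would predict this
cleaned = idft([m * x for m, x in zip(mask, noisy_spec)])
```

Because the mask only scales what is already in each bin, the voice bins pass through essentially unchanged while the 5 kHz bins are driven toward zero.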
This is conceptually similar to spectral subtraction (the classical approach used by tools like Audacity’s built-in Noise Reduction), but the neural approach does two things differently:
- It generalizes to novel noise types. Classical spectral subtraction needs a noise profile captured in advance. The Maxine model learned what speech looks like and suppresses whatever does not match — even noise it has never specifically seen.
- It preserves voice characteristics. The model is trained to leave the spectral envelope of the human voice largely untouched, which is why voices processed through RTX Voice / Maxine do not develop the “underwater” or warbling artifacts (often called musical noise) that aggressive classical noise reduction produces.
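For contrast, here is a minimal sketch of classical magnitude spectral subtraction. It is a generic textbook version, not Audacity’s exact algorithm, and it shows the limitation described above: nothing happens until a noise profile has been measured from noise-only frames. The `floor` value is an illustrative assumption that caps how far any bin can be attenuated; subtracting without a floor is a common source of musical noise.

```python
import cmath


def noise_profile(noise_frames):
    """Average magnitude per frequency bin over spectra captured while only noise was present."""
    bins = len(noise_frames[0])
    return [sum(abs(frame[k]) for frame in noise_frames) / len(noise_frames)
            for k in range(bins)]


def spectral_subtract(noisy_spectrum, profile, floor=0.05):
    """Subtract the noise profile from each bin's magnitude and keep the noisy phase."""
    cleaned = []
    for x, noise_mag in zip(noisy_spectrum, profile):
        # Never attenuate a bin below a small fraction of its original magnitude.
        magnitude = max(abs(x) - noise_mag, floor * abs(x))
        cleaned.append(cmath.rect(magnitude, cmath.phase(x)))
    return cleaned


# Profile from two noise-only frames (two bins each), then one noisy frame.
profile = noise_profile([[1 + 0j, 1 + 0j], [3 + 0j, 1j]])  # -> [2.0, 1.0]
cleaned = spectral_subtract([10 + 0j, 0.5j], profile)
```

Any noise that was absent while the profile was captured is simply not subtracted, which is why the neural approach’s ability to generalize matters.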
The trade-off is GPU dependency. The model requires the matrix multiplication throughput of Tensor Cores to run at real-time latency. A CPU running the same model takes 60–120 ms per frame — too slow for conversational use.
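The “too slow” claim is easy to check with arithmetic: a new frame arrives every 10 ms, so a processor that needs longer than that per frame falls further behind with every frame rather than just adding a fixed delay. A minimal simulation, assuming frames are processed one at a time in arrival order (the function name is hypothetical):

```python
def backlog_ms(frames, frame_ms, processing_ms):
    """Delay between the last frame being captured and it finishing processing.

    Frame i is fully captured at (i + 1) * frame_ms; frames are processed
    one at a time, each taking processing_ms.
    """
    finish = 0.0
    for i in range(frames):
        arrival = (i + 1) * frame_ms
        start = max(arrival, finish)  # wait for the frame and for the previous frame to finish
        finish = start + processing_ms
    return finish - frames * frame_ms


print(backlog_ms(100, 10, 8))   # keeps up: delay stays at the 8 ms processing time
print(backlog_ms(100, 10, 60))  # falls behind: about 5 s of backlog after 1 s of audio
```

Anything above the 10 ms frame period makes the delay grow without bound, so a 60–120 ms per-frame CPU model is not just slow but unusable in real time.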
Supported GPU Tiers
| GPU Generation | Tensor Cores | Maxine Support | Notes |
|---|---|---|---|
| GTX 10/16 series | No | Not supported | No Tensor Cores |
| RTX 20 series (Turing) | Yes (2nd gen) | Full support | Minimum requirement |
| RTX 30 series (Ampere) | Yes (3rd gen) | Full support | Recommended for streaming |
| RTX 40 series (Ada Lovelace) | Yes (4th gen) | Full support | Fastest inference |
| RTX 50 series (Blackwell) | Yes (5th gen) | Full support | 2025+ cards |
Echo Cancellation: The Underrated Feature
Noise suppression gets most of the attention, but echo cancellation is equally valuable for many setups, particularly open-desk environments where desktop speakers are used instead of headphones.
Acoustic echo occurs when your speaker output (game audio, music, the other person’s voice) bleeds back into your microphone. The microphone hears both your voice and the room’s acoustic reflection of what the speaker just played. On a call, this is why the other people hear themselves echoed back a moment after they speak, and it introduces artifacts in voice changers that expect a clean vocal signal.
The Maxine AEC effect solves this by using a reference signal (the audio that was played through your speaker) to predict what portion of the microphone input is echo and subtract it. Classical echo cancellers do this with an adaptive filter such as NLMS; Maxine’s neural enhancement targets the residual echo that adaptive filters leave behind at high speaker levels.
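The adaptive-filter idea can be sketched in plain Python. This is a textbook normalized LMS (NLMS) canceller, not Maxine’s implementation: it learns the speaker-to-mic echo path from the reference signal and subtracts its prediction from the mic signal. The 8-tap filter, step size mu = 0.5, and the simulated echo path are illustrative assumptions; cancellers for real rooms typically need hundreds to thousands of taps to cover the reverberation tail.

```python
import random


def nlms_echo_canceller(reference, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Textbook NLMS acoustic echo canceller.

    reference: samples sent to the speaker (the far-end signal)
    mic:       microphone samples containing the speaker echo
    Returns (residual, weights); residual is the mic signal with the estimated echo removed.
    """
    weights = [0.0] * num_taps  # current estimate of the speaker-to-mic echo path
    history = [0.0] * num_taps  # most recent reference samples, newest first
    residual = []
    for x_n, d_n in zip(reference, mic):
        history = [x_n] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = d_n - echo_estimate  # what is left after removing the predicted echo
        step = mu * error / (eps + sum(x * x for x in history))  # normalized step size
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Simulated room: the mic hears only a filtered copy of the speaker output (no local talker).
random.seed(0)
echo_path = [0.5, 0.3, -0.2, 0.1]
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * reference[n - k] for k, h in enumerate(echo_path) if n >= k)
       for n in range(len(reference))]
residual, weights = nlms_echo_canceller(reference, mic)
```

In this noiseless simulation the weights converge to `echo_path` and the residual falls to essentially zero. A production canceller also needs double-talk detection that pauses adaptation while the local user is speaking; otherwise near-end speech corrupts the echo estimate.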
When to use AEC vs. simple noise suppression:
- Use noise suppression when the problem is background environmental sounds (fan, keyboard, street)
- Use AEC when the problem is acoustic feedback from your own speakers entering the mic
- Use both in combination for an open-room broadcast setup
Setting Up NVIDIA Broadcast (Consumer Path)
If you are a streamer or content creator and do not want to compile an SDK, NVIDIA Broadcast is the right tool. It installs Maxine’s denoising under the hood and exposes it through a GUI.
Requirements:
- Windows 10 or 11
- RTX 20-series or newer GPU
- Driver version 456.38 or later (most users are already far past this)
Setup steps:
- Download NVIDIA Broadcast from nvidia.com/broadcast
- Install and launch. The app shows three panels: Camera, Microphone, and Speaker.
- Under Microphone, select your physical mic as the input.
- Enable Noise Removal and optionally Room Echo Removal.
- Set Output to “NVIDIA RTX Voice (Microphone)” — this creates a virtual microphone device.
- In Discord, OBS, or any other app, select “NVIDIA RTX Voice (Microphone)” as the input device.
The virtual microphone created by Broadcast outputs clean, denoised audio that any other app can receive. This is the same virtual device pattern used by voice changers like VoxBooster — and it means you can chain the two.
Setting Up the Maxine Audio Effects SDK (Developer Path)
For developers building custom applications, the SDK offers direct API access to the same models.
Prerequisites:
- CUDA Toolkit 11.x or 12.x
- RTX GPU with driver ≥456.38
- NVIDIA Maxine SDK, downloaded from the NGC Developer Portal
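Core API workflow (C++ sketch; parameter names have changed between SDK releases, so check them against the nvAudioEffects.h header that ships with your version):
```cpp
#include "nvAudioEffects.h"  // Maxine Audio Effects SDK header

NvAFX_Handle handle = nullptr;
NvAFX_CreateEffect(NVAFX_EFFECT_DENOISER, &handle);
NvAFX_SetU32(handle, NVAFX_PARAM_NUM_CHANNELS, 1);      // mono input
NvAFX_SetU32(handle, NVAFX_PARAM_SAMPLE_RATE, 48000);   // must match the model
NvAFX_SetString(handle, NVAFX_PARAM_MODEL_PATH, "denoiser_48k.trtpkg");
NvAFX_Load(handle);  // every call returns an NvAFX_Status; check for NVAFX_STATUS_SUCCESS

// Per-frame loop: one 480-sample (10 ms) mono frame per call
const float* in[]  = { input_frame };   // one buffer pointer per channel
float*       out[] = { output_frame };
NvAFX_Run(handle, in, out, 480, 1);     // num_samples, num_channels

NvAFX_DestroyEffect(handle);
```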
The model files (.trtpkg) are TensorRT-optimized inference graphs. They are bundled with the SDK download and must be present at the path you specify. The SDK handles GPU memory allocation and CUDA stream management internally.
Python bindings are available via the unofficial nvafx-python wrapper, which makes the SDK accessible for rapid prototyping without writing full C++ applications.
Practical frame sizes:
- Noise suppression: 480 samples at 48 kHz = 10 ms per frame
- Echo cancellation: 160 samples at 16 kHz = 10 ms per frame (requires downsampling if your chain runs at 48 kHz)
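Audio devices rarely deliver exactly one frame per callback (512-sample blocks are common), so a pipeline usually needs a small adapter that accumulates whatever the device hands over and releases fixed 10 ms frames to the effect. A minimal sketch with hypothetical names; the call into the SDK itself is deliberately left out:

```python
def samples_per_frame(sample_rate_hz, frame_ms=10):
    """Samples in one frame: 48 kHz -> 480 and 16 kHz -> 160 for a 10 ms frame."""
    return sample_rate_hz * frame_ms // 1000


class FrameAdapter:
    """Accumulates device blocks of any size and releases fixed-size frames."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.pending = []  # samples received but not yet part of a full frame

    def push(self, block):
        """Add one device block; return every complete frame now available."""
        self.pending.extend(block)
        frames = []
        while len(self.pending) >= self.frame_size:
            frames.append(self.pending[:self.frame_size])
            self.pending = self.pending[self.frame_size:]
        return frames


adapter = FrameAdapter(samples_per_frame(48_000))
for frame in adapter.push([0.0] * 512):  # one 512-sample device callback
    pass  # hand each 480-sample frame to the denoiser here
```

Leftover samples (32 after the first 512-sample block) wait for the next callback, which is one reason real pipelines carry slightly more latency than the frame size alone suggests.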
The SDK documentation recommends double-buffering the input and output frames to smooth over processing jitter, especially when the audio pipeline runs on the same GPU as a game or screen capture.
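A toy scheduling model shows what the extra buffer buys and what it costs. It is a simplification with a hypothetical function name, not the SDK’s scheduler: frame i is fully captured at (i + 1) × 10 ms, frames are processed one at a time, and with an output buffer `depth` frames deep, frame i must be finished within `depth` frame periods of capture or the device plays silence.

```python
def count_underruns(processing_ms, frame_ms, depth):
    """Count frames that miss playback in a simplified buffering model.

    Frame i is fully captured at (i + 1) * frame_ms. Frames are processed in order,
    one at a time. With an output buffer `depth` frames deep, frame i must be
    finished within depth * frame_ms of its capture.
    """
    underruns = 0
    finish = 0.0
    for i, cost in enumerate(processing_ms):
        arrival = (i + 1) * frame_ms
        start = max(arrival, finish)
        finish = start + cost
        if finish > arrival + depth * frame_ms:
            underruns += 1
    return underruns


jittery = [4, 4, 12, 4, 4]  # one frame hits a 12 ms GPU stall, e.g. during a game frame spike
print(count_underruns(jittery, 10, depth=1))  # single buffer: the slow frame drops out
print(count_underruns(jittery, 10, depth=2))  # double buffer: absorbed, at +10 ms latency
```

The trade-off is one extra frame (10 ms) of latency in exchange for tolerating a processing spike of up to a full frame period.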
Integrating Maxine with a Real-Time Voice Changer
The most powerful use case for desktop users is combining Maxine’s denoising with a voice changer that handles pitch shifting, effects, or AI voice conversion. Here is how the audio chain works:
Physical Mic
↓
NVIDIA Broadcast virtual mic (denoised, clean signal)
↓
VoxBooster (pitch shift / effects / AI voice conversion)
↓
VoxBooster virtual mic output
↓
Discord / OBS / Game / Browser
This chain works because each tool exposes a virtual microphone that the next tool in the chain can consume as its input device. NVIDIA Broadcast outputs “NVIDIA RTX Voice (Microphone)”; VoxBooster reads that as its source mic.
Why the order matters: Noise suppression must come before the voice changer, not after. If you run the voice changer first and then denoise, the neural denoiser will treat some voice-effect artifacts as “noise” and attenuate them, degrading your effect quality. Run the chain clean-in → denoise → transform → output.
Latency budget at each stage:
| Stage | Added Latency |
|---|---|
| Physical mic to driver | 2–5 ms |
| NVIDIA Broadcast denoising | 10–20 ms |
| VoxBooster effects mode | 5–15 ms |
| VoxBooster AI voice mode | 200–350 ms |
| Virtual mic to app | 2–5 ms |
| Total (effects mode) | ~20–45 ms |
| Total (AI voice mode) | ~215–380 ms |
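The totals are sums of the per-stage best and worst cases. A few lines make the check reproducible when you substitute your own measurements; the stage values are taken from the table above, and the helper is illustrative:

```python
# (low_ms, high_ms) per stage, taken from the latency table above.
SHARED_STAGES = {
    "Physical mic to driver": (2, 5),
    "NVIDIA Broadcast denoising": (10, 20),
    "Virtual mic to app": (2, 5),
}
EFFECTS_MODE = (5, 15)
AI_VOICE_MODE = (200, 350)


def total_latency(voice_stage):
    """Best- and worst-case end-to-end latency: the shared stages plus one VoxBooster mode."""
    low = sum(stage[0] for stage in SHARED_STAGES.values()) + voice_stage[0]
    high = sum(stage[1] for stage in SHARED_STAGES.values()) + voice_stage[1]
    return low, high


print(total_latency(EFFECTS_MODE))   # (19, 45)
print(total_latency(AI_VOICE_MODE))  # (214, 380)
```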
Effects mode latency is imperceptible in conversation. AI voice mode latency (~250 ms median) is similar to a transatlantic VoIP call — noticeable but workable for most streaming scenarios. For fast-paced competitive gaming with voice communication, effects mode is recommended.
For more on setting up your audio chain for streaming, see the guide on voice changers for content creators.
Using NVIDIA Maxine Audio on Discord
Discord has its own built-in noise suppression powered by Krisp, but Maxine-quality denoising is perceptibly better at high noise levels — particularly mechanical keyboard noise and room HVAC. Running Maxine upstream of Discord’s input lets you use Maxine’s model while still benefiting from Discord’s echo cancellation on the app layer.
Recommended setup:
- Enable NVIDIA Broadcast denoising on your physical mic.
- In Discord Settings → Voice & Video, set Input Device to “NVIDIA RTX Voice (Microphone).”
- Under Voice Processing, disable Discord’s built-in Noise Suppression (it adds latency and double-processing artifacts) but keep Echo Cancellation on.
- Optionally route through VoxBooster between Broadcast and Discord for voice effects.
One important consideration: stacking suppressors can cause conflicts. If you also run a third-party noise suppressor such as Krisp in its own plugin slot, you may get double-processing artifacts or device conflicts in Discord. Check our detailed guide on voice changer and Krisp conflicts on Discord for troubleshooting steps.
RTX Voice for Streaming: OBS Integration
For OBS Studio users, the cleanest integration uses NVIDIA Broadcast as the microphone device and adds no OBS-side noise filter at all — letting the GPU handle it upstream.
OBS Audio Setup:
- In OBS → Settings → Audio, set Mic/Auxiliary Audio to “NVIDIA RTX Voice (Microphone).”
- In the audio mixer, right-click your mic source → Filters.
- Remove any existing Noise Suppression filter if you previously added one (double-processing degrades quality).
- Optionally add a Compressor filter and a Gain filter for level control — these are fine to keep after Maxine.
For streamers who also want voice effects or AI voice cloning live during their broadcast, add VoxBooster to the chain before OBS. OBS then receives the Maxine-denoised + VoxBooster-transformed output through VoxBooster’s virtual microphone. This is the same approach covered in detail in setting up a voice changer for Discord.
Voice Cloning and AI Voice Conversion After Maxine
A quieter but important use case: feeding Maxine-cleaned audio into an AI voice conversion pipeline. If you are creating voiceover content with an AI-cloned voice, the quality of the input audio directly affects the conversion output. Noisy input produces noisy clones.
The standard practice for building a voice clone dataset is:
- Record source audio (your voice, or a licensed voice actor’s voice)
- Run Maxine noise suppression offline at maximum effect strength — quality matters more than latency here
- Segment into 5–15 second clips
- Feed the clean segments into the training pipeline
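The segmentation step can be as simple as cutting the cleaned recording into equal-length pieces. A minimal sketch with a hypothetical helper and fixed cut points: splitting into the fewest equal clips that fit under 15 seconds guarantees that, whenever more than one clip is needed, each is longer than 7.5 seconds, so the 5-second minimum is met automatically.

```python
import math


def segment_bounds(num_samples, sample_rate, min_s=5.0, max_s=15.0):
    """Split a recording into equal-length (start, end) sample ranges of min_s..max_s seconds.

    With two or more clips, each is longer than max_s / 2, so any min_s <= max_s / 2
    is always satisfied. Recordings shorter than min_s return [].
    """
    if min_s > max_s / 2:
        raise ValueError("min_s must be at most max_s / 2")
    duration = num_samples / sample_rate
    if duration < min_s:
        return []
    clips = math.ceil(duration / max_s)  # fewest clips that keep each one <= max_s
    edges = [round(i * num_samples / clips) for i in range(clips + 1)]
    return list(zip(edges[:-1], edges[1:]))


# A 40-second recording at 48 kHz becomes three clips of about 13.3 s each.
clips = segment_bounds(40 * 48_000, 48_000)
```

Production pipelines usually place cuts at pauses found by a voice activity detector so words are not split mid-syllable; that refinement is omitted here.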
The resulting voice model will have noticeably cleaner high-frequency detail and fewer noise-floor artifacts than one trained on raw microphone recordings in a typical home environment. This matters especially for consonants (fricatives like ‘s’, ‘f’, ‘sh’) where noise easily blurs the spectral fine structure the model needs to learn.
For a deeper look at AI voice cloning workflows and how they differ from real-time voice changers, see our voice cloning for voiceover guide.
Troubleshooting Common Maxine and RTX Voice Issues
“NVIDIA RTX Voice virtual mic not showing in device list”
Restart the Windows Audio service (Win+R → services.msc → Windows Audio → Restart). NVIDIA Broadcast sometimes fails to register its virtual device after a system update. If the problem persists, uninstall and reinstall Broadcast.
“Effect seems to have no impact on keyboard noise”
Check that Effect Intensity is at 100% in the Broadcast UI. Some users accidentally leave it at 50%. Also verify your physical mic is actually selected as the Broadcast input, not the RTX Voice mic itself (which would create a feedback loop).
“Voice sounds hollow or has a ‘swimming’ quality”
The denoising model is over-aggressively suppressing audio in a very quiet room. Reduce Effect Intensity to 70–80%. Alternatively, use the Maxine SDK directly and lower the NVAFX_PARAM_INTENSITY_RATIO parameter (a float between 0.0 and 1.0).
“Latency increased dramatically after enabling Broadcast”
Check that your GPU driver is up to date. Older drivers (pre-520) had a bug where Maxine processed in synchronous CPU-stall mode instead of async GPU mode, adding 60–80 ms of unnecessary latency.
“VoxBooster and NVIDIA Broadcast do not chain correctly”
Ensure VoxBooster’s input device is set to “NVIDIA RTX Voice (Microphone)” and not your physical mic. If both are set to the physical mic, they process in parallel rather than in series; you will get the effects but not the denoising benefit. Also confirm that Windows Sound settings have not reverted the default microphone to the physical device.
Comparing NVIDIA Maxine to Other Noise Suppression Solutions
The noise suppression landscape has several competing approaches. Maxine is not the only strong option, but the comparison reveals where it genuinely stands out.
| Solution | Technology | Latency | GPU Required | Cost | Best For |
|---|---|---|---|---|---|
| NVIDIA Maxine / Broadcast | Neural (Tensor Core) | 10–20 ms | RTX required | Free | RTX GPU owners |
| Krisp | Neural (CPU) | 20–40 ms | No | Free / paid tiers | Non-RTX users |
| Discord built-in | Neural (CPU/cloud) | 20–50 ms | No | Free (Discord) | Discord only |
| Adobe Audition Denoise | Spectral neural | Offline only | No | Paid (Creative Cloud) | Post-production |
| RNNoise | Neural (CPU, open source) | ~10 ms | No | Free (open source) | Developers on any hardware |
| Audacity Noise Reduction | Spectral subtraction | Offline only | No | Free | Offline editing |
Maxine’s advantage is GPU-accelerated latency combined with a model trained on a vastly larger dataset than Krisp’s consumer tier. For streamers with RTX cards, Maxine or NVIDIA Broadcast is typically the best free choice. Non-RTX users should look at Krisp — the CPU-based model has improved significantly and runs well on modern CPUs. We cover Krisp’s integration workflow in more detail in our voice changer Krisp integration guide.
Maxine Audio SDK vs. NVIDIA Broadcast: Which Should You Use?
If you are an end user who wants noise suppression with no coding required, use NVIDIA Broadcast. It is the consumer wrapper around the same underlying models, gets updated automatically, and integrates with all major apps through a virtual mic.
If you are a developer building an application that needs audio enhancement — a voice chat app, a streaming tool, a creative software product — the Maxine SDK is the right choice. It gives you:
- Programmatic control over effect intensity
- Access to model selection (multiple model quality tiers)
- The ability to embed denoising without requiring users to install a separate consumer app
- Frame-level control for integration with custom audio pipelines
The SDK is also the right choice for processing offline audio files in batch — for training voice models, cleaning podcast recordings, or preprocessing audio datasets where a GUI workflow would be too slow.
Conclusion
NVIDIA Maxine Audio Effects SDK and RTX Voice represent a genuine step change in accessible, GPU-accelerated audio processing. What used to require a hardware DSP unit or an expensive recording booth can now run in 10–20 ms on a mid-range gaming GPU, removing noise that classical algorithms never reliably eliminated.
For most Windows users with an RTX card, the practical setup is straightforward: install NVIDIA Broadcast, enable noise suppression on your mic, and let every other app receive the cleaned virtual mic signal. If you also want real-time voice effects, pitch shifting, or AI voice conversion layered on top, tools like VoxBooster slot neatly into that chain — consuming the Broadcast virtual mic as input and publishing their own virtual mic as output, all without a kernel driver or administrator-level audio routing software. The result is a broadcast-quality audio chain from a consumer desktop, running end-to-end at under 50 ms latency in effects mode.
For a complete overview of how to set up a streaming audio chain with voice effects, see the guide on voice changers for Discord or the broader voice changer for streaming guide.