Voice Meeting Notes with Whisper on Windows

If every meeting ends with an email chain asking “what did we actually decide?”, the problem is not the meeting — it is the lack of a reliable transcript. Cloud transcription services solve this partially, but they require uploading your call audio to a third-party server. For legal, compliance, or plain privacy reasons that is not always acceptable.

This guide shows you how to build a voice meeting notes workflow entirely on your Windows PC: capture the meeting audio through WASAPI loopback (the low-latency audio capture path built into Windows), run it through OpenAI’s Whisper model locally, and automatically extract a Markdown summary with decisions and action items. No cloud upload. No subscription. Processing happens on your machine.

TL;DR

Step	Tool	Time
Capture audio	FFmpeg + WASAPI loopback	Live
Transcribe	Whisper (medium.en)	~4 min / 1 hr meeting
Extract actions	Python + local LLM or paste to AI	~2 min
Output	Markdown `.md` file	Immediate

Why Local Transcription Beats Cloud for Meetings

Most cloud transcription services (Otter.ai, Fireflies, Zoom’s built-in AI Notes) work by sending your audio to remote servers, where it is processed and, depending on the provider’s terms, may be retained or used for model training. For personal catch-up calls that is fine. For calls containing client names, financial projections, medical information, or legal discussion, it is not.

Running Whisper locally means the audio file never leaves the machine. There is no API key tied to your company account, no retention policy to read, and no possibility of a third-party breach exposing your call content. The transcript and summary live wherever you save them.

There is also a cost argument. Cloud transcription at scale — 100 hours of meetings per month across a team — costs $40–$200 per month per user on most platforms. Local inference on a GPU you already own costs nothing per transcript after setup.

Legal and Consent — Read This First

Recording or transcribing a meeting without participant consent is illegal in many jurisdictions, including the US states with all-party ("two-party") consent laws; in the EU, processing the recording also needs a lawful basis under GDPR Article 6.

Before you transcribe any meeting:

Announce clearly at the start: “I’m capturing audio for local transcription to produce meeting notes.”
Give participants the option to opt out or speak off the record.
Check your company’s call-recording policy — many require IT or legal approval.
Store transcripts securely and apply the same data handling rules as other confidential documents.

This article is a technical guide. It is not legal advice.

What You Need

Windows 10 or 11 — WASAPI loopback is available on both
Python 3.10+ — from python.org or winget
FFmpeg — for audio capture from the loopback device
openai-whisper or faster-whisper — the transcription engine
NVIDIA GPU (optional but recommended) — RTX 2060 or better for fast inference; CPU works too
A meeting app: Zoom, Microsoft Teams, Google Meet, or any audio-producing application

Step 1 — Identify Your WASAPI Loopback Device

WASAPI loopback captures whatever Windows plays through your output device: the same audio you hear in your headphones. WASAPI itself needs no driver installation; it has been part of the Windows audio stack since Vista.

Open a terminal and run:

ffmpeg -list_devices true -f dshow -i dummy 2>&1 | findstr /i "audio"

You will see output like:

"Speakers (Realtek High Definition Audio)" (audio)
"Headphones (USB Audio Device)" (audio)

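Note the exact name of the device that carries your playback audio and copy it exactly as FFmpeg prints it. FFmpeg's dshow input lists DirectShow capture devices, so a loopback source appears in this list only if your system exposes one (for example a "Stereo Mix" input or a virtual loopback capture device). The (loopback) suffix appended to the device name in the FFmpeg commands in this guide assumes such an entry exists under that name.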

Alternatively, use Python to list devices:

import sounddevice as sd
print(sd.query_devices())

Look for devices with (loopback) in the name or with Windows WASAPI as the host API.
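The full device list is long on most machines, so it can help to filter it down to WASAPI entries. The sketch below works on the structures returned by `sd.query_devices()` and `sd.query_hostapis()`; the helper names `wasapi_devices` and `list_wasapi_devices` are illustrative, and whether a loopback entry shows up at all depends on the audio backend bundled with your sounddevice install.

```python
def wasapi_devices(devices, hostapis):
    """Return (index, name, kind) for every device whose host API is WASAPI.

    `devices` and `hostapis` are the structures returned by
    sounddevice.query_devices() and sounddevice.query_hostapis().
    """
    found = []
    for index, dev in enumerate(devices):
        api_name = hostapis[dev["hostapi"]]["name"]
        if "WASAPI" not in api_name:
            continue
        kind = "output" if dev["max_output_channels"] > 0 else "input"
        found.append((index, dev["name"], kind))
    return found


def list_wasapi_devices():
    """Print the WASAPI devices of the current machine."""
    # Imported lazily so wasapi_devices() can be used without the package.
    import sounddevice as sd

    for index, name, kind in wasapi_devices(sd.query_devices(), sd.query_hostapis()):
        print(f"{index:>3}  {kind:<6}  {name}")
```

Calling `list_wasapi_devices()` prints one line per WASAPI device with its index, direction, and name.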

Step 2 — Record the Meeting Audio

Start your Zoom, Teams, or Meet call. Before the main content begins, start FFmpeg in a separate terminal:

ffmpeg -f dshow -i audio="Speakers (Realtek High Definition Audio) (loopback)" ^
  -ar 16000 -ac 1 -c:a pcm_s16le ^
  meeting_2026-06-12.wav

Key flags:

-ar 16000 — Whisper’s native sample rate; no resampling needed
-ac 1 — mono; reduces file size and matches Whisper’s expected input
-c:a pcm_s16le — uncompressed WAV for best accuracy

Stop recording when the meeting ends with Ctrl+C. A 1-hour meeting at these settings produces roughly 115 MB.
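The 115 MB figure follows directly from the capture settings: sample rate × channels × bytes per sample × duration. A small worked example (the function name is illustrative) makes it easy to estimate disk use for other meeting lengths or settings:

```python
def wav_size_bytes(duration_s, sample_rate=16000, channels=1, bits_per_sample=16):
    """Size of the raw PCM payload; the WAV header adds only a few dozen bytes."""
    return duration_s * sample_rate * channels * (bits_per_sample // 8)


one_hour = wav_size_bytes(3600)
print(f"{one_hour / 1_000_000:.1f} MB")  # 115.2 MB
```

Recording in stereo or at 48 kHz multiplies the result accordingly, which is why the mono 16 kHz settings are worth keeping.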

Tip: If your audio quality is poor due to background noise, running VoxBooster’s noise suppression on your microphone channel before the call keeps your own voice clean in the capture. WASAPI loopback captures the mixed output, so other participants’ audio benefits from their own platforms’ noise processing.

Step 3 — Install Whisper

If you have not installed Whisper yet:

pip install openai-whisper
# For faster CPU/GPU inference:
pip install faster-whisper

For GPU acceleration (NVIDIA), also install:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Check your CUDA version first with nvidia-smi and match the cu version accordingly.
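Once PyTorch is installed, you can confirm from Python that the GPU is actually visible instead of hard-coding the device. This is a minimal sketch: `pick_device` is a hypothetical helper, and the `int8` fallback for CPU is an assumption about a reasonable faster-whisper setting, not something the installation requires.

```python
def pick_device():
    """Return a (device, compute_type) pair suitable for faster-whisper."""
    try:
        import torch

        has_cuda = torch.cuda.is_available()
    except Exception:  # torch missing or a broken CUDA install
        has_cuda = False
    # float16 on GPU matches the transcription script in this guide;
    # int8 on CPU is an assumed default that keeps memory use low.
    return ("cuda", "float16") if has_cuda else ("cpu", "int8")


device, compute_type = pick_device()
print(f"Using {device} with compute_type={compute_type}")
```

If this prints `cpu` on a machine with an NVIDIA card, the installed PyTorch wheel most likely does not match the CUDA version reported by nvidia-smi.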

Step 4 — Transcribe the Recording

Using openai-whisper (CLI)

whisper meeting_2026-06-12.wav --model medium.en --output_format txt --output_dir ./transcripts

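With --output_format txt, this writes a single .txt transcript to ./transcripts; pass --output_format all if you also want .srt, .vtt, .tsv, and .json files. The medium.en model is English-only, which tends to make it slightly more accurate on English meetings than the multilingual medium model.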
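The same package can also be called from Python when you want timestamps in the text file. The sketch below uses `whisper.load_model` and `model.transcribe` from openai-whisper; the helper names and the `[HH:MM:SS]` line format are illustrative choices.

```python
def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS for readable transcript lines."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcribe_to_lines(audio_path, model_name="medium.en"):
    """Return one '[HH:MM:SS] text' line per Whisper segment."""
    # Imported lazily so format_timestamp() works without openai-whisper.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in result["segments"]
    ]
```

Writing the returned lines to a file with `"\n".join(...)` gives a transcript that is easy to cross-reference against the recording.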

Using faster-whisper (Python script)

from faster_whisper import WhisperModel

model = WhisperModel("medium.en", device="cuda", compute_type="float16")

segments, info = model.transcribe("meeting_2026-06-12.wav", beam_size=5)

with open("transcript.txt", "w", encoding="utf-8") as f:
    for segment in segments:
        timestamp = f"[{segment.start:.1f}s]"
        f.write(f"{timestamp} {segment.text.strip()}\n")

print("Transcription complete.")

faster-whisper uses CTranslate2 under the hood and is 2–4× faster than the original on the same hardware.

Step 5 — Extract Action Items into Markdown

Raw transcripts are walls of text. The useful artifact is a structured summary: decisions made, tasks assigned, and open questions. Here is a simple Python script that uses Ollama (local LLM) to produce one:

import subprocess
import sys

transcript_path = sys.argv[1]

with open(transcript_path, "r", encoding="utf-8") as f:
    transcript = f.read()

prompt = f"""You are a meeting notes assistant. Given the transcript below, produce a Markdown document with:
1. **Meeting Summary** (3-5 sentences)
2. **Decisions Made** (bulleted list)
3. **Action Items** (bulleted list with owner and deadline if mentioned)
4. **Open Questions** (bulleted list)

Transcript:
{transcript}
"""

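# check=True raises CalledProcessError if `ollama run` exits with an error,
# so a failed run never writes an empty summary file.
result = subprocess.run(
    ["ollama", "run", "llama3"],
    input=prompt,
    capture_output=True,
    text=True,
    encoding="utf-8",
    check=True,
)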

output_path = transcript_path.replace(".txt", "_summary.md")
with open(output_path, "w", encoding="utf-8") as f:
    f.write(result.stdout)

print(f"Summary saved to {output_path}")

Run it as:

python extract_actions.py transcripts/meeting_2026-06-12.txt

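No Ollama? Paste the transcript into any chat AI with the same prompt. The output follows the same structure; only the automation step differs. Keep in mind that a hosted chat AI receives the full transcript, so this shortcut gives up the local-only guarantee.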
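A one-hour transcript can be longer than a local model handles well in a single prompt. One way around that, sketched here under the assumption that you summarize each chunk and then merge the partial summaries, is to split the transcript on line boundaries. The `chunk_transcript` name and the 8000-character default are illustrative, not values from Ollama or llama3.

```python
def chunk_transcript(lines, max_chars=8000):
    """Group transcript lines into chunks of at most max_chars characters.

    Lines are never split; a single line longer than max_chars
    becomes its own (oversized) chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line_len = len(line) + 1  # +1 for the newline added when joining
        if current and size + line_len > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += line_len
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk can be sent through the same prompt, and the resulting partial summaries can be concatenated and summarized once more to produce the final Markdown file.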

Model Selection Guide

Model	VRAM	Speed (GPU)	Speed (CPU)	Best For
tiny.en	1 GB	Very fast	5 min/hr	Quick drafts, testing
small.en	2 GB	Fast	20 min/hr	CPU-only machines
medium.en	5 GB	Balanced	60 min/hr	Default recommendation
large-v3	10 GB	Slow	Not practical	Max accuracy, RTX 4070+

All models run entirely offline after the initial download.
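The table can be turned into a simple selection rule. This sketch picks the largest model whose listed VRAM figure fits the available memory and falls back to the CPU recommendation when there is no GPU; `choose_model` is a hypothetical helper, and it leaves no headroom for other applications using the GPU.

```python
# (model name, VRAM in GB) taken from the table above, largest first.
MODEL_VRAM_GB = [("large-v3", 10), ("medium.en", 5), ("small.en", 2), ("tiny.en", 1)]


def choose_model(vram_gb=None):
    """Return the largest model that fits vram_gb; None means CPU-only."""
    if vram_gb is None:
        return "small.en"  # the table's pick for CPU-only machines
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            return name
    return "tiny.en"  # smallest option when even 1 GB is not available


print(choose_model(6))     # medium.en
print(choose_model(None))  # small.en
```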

Comparison: Local Whisper vs. Cloud Transcription Services

Feature	Whisper (local)	Otter.ai	Fireflies	Zoom AI Notes
Data leaves device	No	Yes	Yes	Yes
Cost per month	$0	$10–$20/user	$10–$19/user	Included with Zoom
Accuracy (English)	88–94% (6–12% WER)	~88%	~87%	~85%
Speaker diarization	With pyannote	Yes	Yes	Yes
Custom vocabulary	Via prompt	Paid	Paid	No
Offline capable	Yes	No	No	No
Setup time	30 min	5 min	5 min	0 min

Cloud services win on convenience and diarization out of the box. Local Whisper wins on privacy, cost at scale, and the ability to work without internet.
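To check accuracy figures like these on your own meetings, it helps to have the metric spelled out. Word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, and accuracy is commonly quoted as 1 − WER. A minimal sketch follows; the function name is illustrative, and real evaluations usually normalize punctuation and numbers first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("we ship on friday", "we shipped on friday"))  # 0.25
```

Hand-correcting a few minutes of one transcript and scoring the raw output against it gives a realistic number for your own audio and vocabulary.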

Adding Speaker Diarization

Whisper alone does not identify who said what. For meetings where attribution matters, combine it with pyannote.audio:

pip install pyannote.audio

from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN"
)

diarization = pipeline("meeting_2026-06-12.wav")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s – {turn.end:.1f}s")

You can then align the diarization timestamps with the Whisper segment timestamps to produce speaker-labeled transcripts. The pyannote models run locally after download — a Hugging Face account is needed to accept the model license, but inference is fully offline.
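One simple way to do that alignment is to give each Whisper segment the speaker whose diarization turn overlaps it the longest. The sketch below works on plain tuples so it is independent of either library; `assign_speakers`, the tuple layout, and the "UNKNOWN" label are assumptions for illustration.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker.

    segments: list of (start, end, text) from Whisper
    turns:    list of (start, end, speaker) from diarization
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by segment and turn.
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled


segments = [(0.0, 4.0, "Let's start."), (4.0, 9.0, "Sounds good.")]
turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Segments that span a speaker change are attributed to whoever spoke for most of the segment, which is usually good enough for meeting notes.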

Automating the Full Pipeline

Once recording, transcription, and extraction each work on their own, wrap them in two batch files, one to run during the meeting and one to run after it ends:

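REM record.bat: run during the meeting
REM The %DATE% offsets assume the US "Ddd MM/DD/YYYY" format; adjust them for other locales.
if not exist meetings mkdir meetings
ffmpeg -f dshow -i audio="Speakers (Realtek High Definition Audio) (loopback)" ^
  -ar 16000 -ac 1 -c:a pcm_s16le ^
  "meetings\%DATE:~10,4%-%DATE:~4,2%-%DATE:~7,2%.wav"

REM process.bat: run after the meeting
REM Assumes transcribe.py writes a .txt file next to the .wav it is given.
set "FILE=%~1"
python transcribe.py "%FILE%"
python extract_actions.py "%FILE:.wav=.txt%"
start "" "%FILE:.wav=_summary.md%"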

Run process.bat meetings\2026-06-12.wav and the summary opens in your default Markdown editor automatically.
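process.bat calls a transcribe.py that takes the WAV path as its argument, whereas the Step 4 script uses a fixed file name. A possible version, sketched here with faster-whisper, writes the transcript next to the recording so that extract_actions.py finds it. The function names are illustrative, and the `cuda`/`float16` defaults simply mirror Step 4.

```python
from pathlib import Path


def transcript_path_for(wav_path):
    """Map meetings/2026-06-12.wav to meetings/2026-06-12.txt."""
    return Path(wav_path).with_suffix(".txt")


def transcribe_file(wav_path, model_name="medium.en",
                    device="cuda", compute_type="float16"):
    """Transcribe wav_path and write one timestamped line per segment."""
    # Imported lazily so the path helper works without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(str(wav_path), beam_size=5)

    out_path = transcript_path_for(wav_path)
    with open(out_path, "w", encoding="utf-8") as f:
        for segment in segments:
            f.write(f"[{segment.start:.1f}s] {segment.text.strip()}\n")
    return out_path
```

Saved as transcribe.py, the script would finish by calling `transcribe_file(sys.argv[1])` under the usual `if __name__ == "__main__":` guard.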

Privacy and Storage Considerations

Keep the following in mind when storing meeting transcripts:

Encrypt the WAV and transcript files if they contain sensitive business information. BitLocker encrypts the whole drive they sit on; a VeraCrypt container works when you only want to protect the meetings folder.
Set a retention policy — delete raw WAV files after transcription; keep only the summary unless you need verbatim quotes.
Shared drives: If you sync transcripts to OneDrive or SharePoint, check whether those systems apply OCR or AI indexing to uploaded documents.
Access control: Restrict transcript files to participants only. A shared \meetings\ folder on a network drive should not be open to the entire company.
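The retention rule in particular is easy to automate. The sketch below lists WAV files that already have a transcript next to them and are older than a chosen number of days; the function name, the folder layout, and the 7-day figure in the usage line are assumptions, so review the list before deleting anything.

```python
import time
from pathlib import Path


def expired_wavs(folder, max_age_days, now=None):
    """Return WAV files in folder that have a .txt transcript beside them
    and were last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        wav for wav in Path(folder).glob("*.wav")
        if wav.with_suffix(".txt").exists() and wav.stat().st_mtime < cutoff
    )


def delete_expired(folder, max_age_days):
    """Delete the files reported by expired_wavs() and return their names."""
    removed = []
    for wav in expired_wavs(folder, max_age_days):
        wav.unlink()
        removed.append(wav.name)
    return removed
```

For example, `delete_expired("meetings", 7)` removes week-old recordings that have already been transcribed and leaves untranscribed ones alone.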

Soft CTA

VoxBooster’s noise suppression ensures your microphone channel is clean before audio hits the WASAPI loopback, which directly improves Whisper’s word-error rate on your voice. It runs locally on Windows 10/11, requires no kernel drivers, and integrates with any meeting app. A 3-day free trial is available, with no credit card required.

After the trial: plans start at $6.99/month.

FAQ

Does Whisper transcribe in real time on a normal Windows PC? Not truly real time at full accuracy — Whisper is a batch model. On a mid-range GPU (RTX 3060) the small or medium model transcribes a 1-hour meeting in about 3-5 minutes after the call ends. For live captions consider Whisper Live or whisper-streaming forks, though they trade some accuracy for latency.

Is it legal to transcribe a Zoom or Teams meeting? Legality depends on jurisdiction and company policy. In most places you must inform all participants before recording or transcribing. Always announce at the meeting start that you are capturing audio for notes, and get explicit consent. This article is a technical guide, not legal advice.

What WASAPI loopback device do I need to install? None. WASAPI loopback is a native Windows 10/11 feature that mirrors an active output device (speakers or headphones) as a capture source, so no virtual cable or third-party driver is required. How the loopback source is named and selected depends on the capture tool you use.

Which Whisper model should I use for meeting transcription? The medium.en model is the best practical balance: about 5 GB of VRAM, a clearly lower word error rate than tiny, and roughly twice the speed of large on GPU. For CPU-only machines use small.en, which transcribes a 1-hour meeting in roughly 20 minutes on a modern CPU. Large-v3 only makes sense if you have an RTX 4070 or better.

Can I transcribe meetings without a GPU? Yes. Whisper runs on CPU via the openai-whisper package or the faster-whisper CTranslate2 backend, which cuts CPU inference time roughly in half. A 1-hour meeting that takes a few minutes on a GPU takes about 20-25 minutes on a modern Intel or AMD CPU with small.en, which is acceptable for after-meeting batch processing.

How do I extract action items automatically from the transcript? The simplest method is a Python script that pipes the Whisper transcript into a local LLM prompt (Ollama + llama3 or Mistral) asking for a bulleted list of decisions and tasks. Alternatively, paste the raw transcript into any chat AI. VoxBooster’s noise suppression keeps the captured audio clean, which directly improves transcript accuracy.

Does this workflow work with Microsoft Teams recorded meetings? Yes, two ways: capture the live audio via WASAPI loopback during the call, or download the Teams meeting recording from OneDrive and run Whisper on the MP4 file. The second path is simpler and lets you re-transcribe at any time without staying in the meeting.
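For the downloaded-recording path, Whisper can read the MP4 directly because it decodes through FFmpeg, but extracting a 16 kHz mono WAV first gives you the same input format as the live capture and a much smaller file to archive. A minimal sketch, assuming ffmpeg is on the PATH; the function names and file names are illustrative.

```python
import subprocess


def extract_audio_cmd(video_path, wav_path):
    """Build the FFmpeg command that drops the video stream (-vn) and
    writes 16 kHz mono 16-bit PCM, matching the live-capture settings."""
    return [
        "ffmpeg", "-i", str(video_path),
        "-vn",
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run the extraction; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(extract_audio_cmd(video_path, wav_path), check=True)
```

Calling `extract_audio("standup.mp4", "standup.wav")` produces a file that can go through the same transcription and extraction steps as a live recording.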
