Voice Cloning Consent: Legal Checklist for Producers

Voice cloning consent is no longer a niche legal question for major studios — it is a checklist every producer, narrator director, game developer, and content creator needs before training an AI voice model on someone’s actual voice. The technology has become accessible to small teams and solo operators, but the legal obligations have not shrunk to match. This guide gives you a practical, producer-focused framework: what a proper consent agreement must contain, how SAG-AFTRA’s 2026 AI rider changes the landscape for union productions, how to structure data retention and deletion policies, and what revocation rights actually mean in practice.

This post is educational, not legal advice. Before executing any voice cloning agreement, have it reviewed by a lawyer with entertainment, IP, or technology law experience in your jurisdiction.

TL;DR

Written consent is mandatory in practice: verbal agreements are hard to prove and enforce, and some US states now require explicit written consent for AI voice replicas.
A valid agreement must cover: scope of use, territorial limits, duration, compensation structure, data retention, deletion-on-request, and revocation procedure.
SAG-AFTRA’s 2026 AI rider adds session-level consent requirements and minimum compensation floors for union productions.
Data retention should be time-limited and tied to the license duration; deletion must be confirmed in writing.
Right of publicity laws, GDPR, and the NO FAKES Act (proposed US federal) treat unconsented voice cloning as a serious legal exposure.
Non-union productions are not bound by SAG-AFTRA rules but should use them as a baseline for best practice.

The legal landscape around AI voice replicas is changing faster than most production workflows. Three separate legal frameworks are converging simultaneously:

Right of publicity — a state-level US doctrine that gives individuals control over commercial use of their identity, including their voice. California, New York, Tennessee, and a growing list of states have updated right-of-publicity statutes to explicitly cover AI replicas. Tennessee’s ELVIS Act (2024) was among the first to specifically address AI-generated voice likenesses; several states have since enacted or proposed similar legislation.

Data protection law — voiceprints derived from voice recordings may qualify as biometric identifiers under statutes like Illinois BIPA, and recordings of an identifiable person are personal data under GDPR (EU/UK). Training an AI model on voice samples therefore involves processing personal data, which triggers lawful-basis, transparency, and data retention obligations independent of any IP or publicity considerations.

Emerging AI-specific legislation — the federal NO FAKES Act (proposed) and a growing number of state-level bills would create specific consent obligations for AI voice replicas used in commercial contexts, while the EU AI Act’s transparency rules require deepfake audio to be disclosed as artificially generated or manipulated.

The practical result: if you train an AI voice model on someone’s actual voice without written, informed, scope-limited consent, you face exposure on multiple simultaneous legal fronts. The cost of doing this wrong — even accidentally — is now high enough to justify a proper agreement on any project, regardless of budget.

For a broader discussion of the ethical dimensions beyond the legal minimum, see our post on voice cloning ethics in 2026.

A voice cloning consent agreement is not a standard model release form. It needs to address the specific characteristics of AI voice data, which can be used to generate unlimited derivative audio long after the original recording session. Below is a checklist of mandatory clauses with plain-English explanations of why each matters.

1. Identity and Capacity of Both Parties

Full legal name, address, and contact information of the voice talent (the “Licensor”) and the producing entity (the “Licensee”)
Confirmation that the talent is 18+ or has guardian co-signature
For union talent: union affiliation, membership number, and reference to the applicable collective bargaining agreement or AI rider

2. Description of the Voice Model

Specify exactly what is being created:

Recording session details (date, duration, content recorded)
Technical format of the output model (the agreement should describe the AI voice model generically — e.g., “a digital voice model trained on voice samples provided by Licensor” — without naming specific tools)
Whether the model is a cloned replica of the talent’s natural voice, a stylized variant, or trained on a character voice specifically performed for the project

This clause matters because it defines what the consent actually covers. A talent who agrees to a game character voice model has not consented to a commercial advertising voice model, even if technically trained on the same session audio.

3. Scope of Use Clause

This is the most negotiated clause in any voice AI agreement. It defines:

Parameter	Examples of Narrow Scope	Examples of Broad Scope
Content type	Single game title, internal training only	Any commercial product or service
Industry	Video games only	Any industry, including advertising
Media format	In-game dialogue only	Audio, video, interactive, broadcast
Platform	PC/console game on Steam	Any platform including broadcast TV
Attribution	Voice credited as “[Talent Name] AI Voice”	No attribution required

Commercial vs. non-commercial is a threshold distinction. Non-commercial scope (podcast, educational content, personal project) carries different compensation expectations and legal exposure than commercial scope (advertising, paid product, commercial game release). Be explicit — do not let “commercial” remain undefined.
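To make the scope parameters above concrete, here is a minimal Python sketch of how a production might record a signed scope and check a proposed use against it before deployment. The `LicenseScope` and `ProposedUse` classes, the `out_of_scope_reasons` helper, and the sample values are hypothetical illustrations, not a standard or a legal tool; the contract text, not the code, defines what is licensed.

```python
from dataclasses import dataclass

# Hypothetical sketch: field names and values are illustrative, not legal terms of art.


@dataclass(frozen=True)
class LicenseScope:
    """Scope-of-use terms copied from a signed agreement."""
    commercial: bool
    industries: frozenset[str]
    media_formats: frozenset[str]
    platforms: frozenset[str]
    territories: frozenset[str]  # country codes, or {"worldwide"}


@dataclass(frozen=True)
class ProposedUse:
    """One concrete use the production wants to make of the voice model."""
    commercial: bool
    industry: str
    media_format: str
    platform: str
    territory: str


def out_of_scope_reasons(scope: LicenseScope, use: ProposedUse) -> list[str]:
    """List every parameter on which the proposed use exceeds the license.

    An empty list means the use sits inside the written scope; anything else
    means new consent (and usually new compensation) is needed first.
    """
    reasons = []
    if use.commercial and not scope.commercial:
        reasons.append("commercial use not licensed")
    if use.industry not in scope.industries:
        reasons.append(f"industry '{use.industry}' not licensed")
    if use.media_format not in scope.media_formats:
        reasons.append(f"media format '{use.media_format}' not licensed")
    if use.platform not in scope.platforms:
        reasons.append(f"platform '{use.platform}' not licensed")
    if "worldwide" not in scope.territories and use.territory not in scope.territories:
        reasons.append(f"territory '{use.territory}' not licensed")
    return reasons


# Example: a narrow game-only license versus a proposed TV advertisement.
game_only = LicenseScope(
    commercial=True,
    industries=frozenset({"video games"}),
    media_formats=frozenset({"in-game dialogue"}),
    platforms=frozenset({"PC", "console"}),
    territories=frozenset({"US", "CA"}),
)
tv_ad = ProposedUse(True, "advertising", "broadcast", "broadcast TV", "US")
print(out_of_scope_reasons(game_only, tv_ad))
```

Returning every mismatch, rather than stopping at the first, gives counsel the full list of terms a new agreement would need to cover. Here the advertisement fails on industry, media format, and platform even though it is commercial and in a licensed territory.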

4. Territorial Scope

Specify the geographic territories where the voice model may be deployed. Options range from a single country to “worldwide.” Territorial restrictions matter for:

Right-of-publicity laws, which vary by jurisdiction
Tax obligations on royalty payments
Compliance with cross-border data transfer rules (e.g., GDPR restrictions on transferring personal data outside the EU/EEA)

If the product will be distributed globally, a worldwide license is typically needed — but the compensation structure should reflect the scope.

5. Duration of the License

Define clearly:

Start date (typically the date of signing, not the recording session)
End date or perpetual grant — perpetual licenses require higher compensation to reflect indefinite use; time-limited licenses (1 year, 3 years, term of a game’s commercial life) are more common in fair agreements
Renewal terms — whether the Licensee may renew, at what cost, and whether the talent must affirmatively consent to each renewal

A perpetual license with no renewal mechanics is only appropriate with correspondingly higher compensation. For most productions, a defined term with renewal options is more defensible.

6. Compensation Structure

Three common structures, each appropriate for different contexts:

Flat fee — a single lump-sum payment for the license term. Simple, clean, appropriate for limited or non-commercial projects. The risk for talent: if the product becomes unexpectedly successful, the flat fee looks inadequate in hindsight.

Per-use residuals — payment triggered each time the AI voice is used commercially (per ad impression, per unit sold, per broadcast). Complex to administer but aligns compensation with actual value delivered.

Hybrid: flat session fee + royalty tier — an upfront payment for the recording session plus a royalty rate if the voice model generates revenue above a defined threshold. This is the structure most similar to SAG-AFTRA’s AI rider framework.
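The hybrid structure is simple arithmetic once the contract fixes its terms. The sketch below assumes the royalty is a percentage of cumulative gross revenue above the threshold; whether the base is gross or net, and whether the threshold resets each period, are negotiated terms, and every figure here is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def hybrid_payout(session_fee: Decimal, royalty_rate: Decimal,
                  threshold: Decimal, revenue: Decimal) -> Decimal:
    """Total owed under a flat-session-fee-plus-royalty-tier deal.

    The royalty applies only to revenue above the threshold, so a product
    that never crosses it owes the session fee alone.
    """
    excess = max(Decimal("0"), revenue - threshold)
    total = session_fee + excess * royalty_rate
    # Round to cents; Decimal avoids binary floating-point drift in money math.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical deal: $2,000 session fee, 2% royalty on revenue above $100,000.
fee, rate, threshold = Decimal("2000"), Decimal("0.02"), Decimal("100000")
print(hybrid_payout(fee, rate, threshold, Decimal("80000")))   # below threshold
print(hybrid_payout(fee, rate, threshold, Decimal("250000")))  # $150,000 above it
```

The first call prints 2000.00 (session fee only); the second prints 5000.00, the session fee plus 2% of the $150,000 earned above the threshold.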

For any commercial use, the compensation should reflect:

The exclusivity of the license (exclusive costs more)
The territorial scope (worldwide costs more than single-country)
The duration (perpetual costs more than annual)
The industry and anticipated reach (advertising costs more than internal training video)

7. Data Retention and Audio Sample Storage Rules

This clause directly addresses how the raw voice recordings and trained model files are handled:

Retention period: how long the Licensee may retain the original audio samples and the trained model files (typically tied to the license duration plus a dispute-resolution buffer)
Storage location: on-premises or specific cloud providers; EU talent may require data to remain within EU borders
Access controls: who within the producing organization may access the voice data and model files
Third-party restrictions: whether the model may be shared with sub-licensees, vendors, or cloud services, and under what data processing agreements
Security obligations: minimum security standards for stored voice data

Audio sample storage is where productions frequently create accidental GDPR exposure. If raw session recordings are personal data (and they typically are under GDPR), they require a lawful basis for processing, a retention schedule, and deletion procedures — independent of the IP licensing aspects.

8. Deletion on Request

A data-rights clause, separate from the license duration:

The talent retains the right to request deletion of their original audio samples at any time after the license period ends (or, in some agreements, during the license period with appropriate notice)
The Licensee must confirm deletion in writing within a defined timeframe (typically 30-60 days)
Deletion of source audio need not require deletion of the trained model, provided the agreement explicitly treats the two separately. But consider: a model trained on one person’s voice may carry biometric characteristics that themselves qualify as personal data under GDPR

If your production serves EU users or employs EU talent, consult a GDPR-specialist lawyer on whether the trained model itself constitutes personal data independent of the source recordings.
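Tracking deletion requests against the contractual response window is easy to automate. A minimal sketch, assuming a 30-day window (the low end of the 30-60 days mentioned above) counted in calendar days from receipt; the `DeletionRequest` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DeletionRequest:
    """Log entry for one deletion-on-request notice (hypothetical schema)."""
    talent: str
    received: date
    response_days: int = 30              # contractual window, e.g. 30-60 days
    confirmed_on: Optional[date] = None  # date written confirmation was sent

    @property
    def deadline(self) -> date:
        """Last calendar day on which written confirmation is on time."""
        return self.received + timedelta(days=self.response_days)

    def is_overdue(self, today: date) -> bool:
        """True if confirmation missed the deadline or is still outstanding past it."""
        if self.confirmed_on is not None:
            return self.confirmed_on > self.deadline
        return today > self.deadline


req = DeletionRequest("Talent A", received=date(2026, 3, 1))
print(req.deadline)                      # 2026-03-31
print(req.is_overdue(date(2026, 4, 2)))  # True: no confirmation sent yet
```

Whether day 0 is the day of receipt or the next business day is a drafting choice; stating the counting rule in the agreement lets both sides compute the same deadline.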

9. Revocation Rights

Revocation is the talent’s ability to withdraw consent. The agreement must specify:

Whether revocation is possible at all (perpetual irrevocable licenses exist but require higher compensation)
Notice period (typically 30-90 days for non-exclusive licenses)
Effect on existing uses — voice uses already “in the wild” (published games, live advertisements) generally cannot be retroactively removed; revocation prevents new uses, not existing ones
Effect on the model itself — does revocation require the Licensee to delete the trained model?

The cleaner approach for both parties: specify that revocation applies to new uses only, with a transition period for phasing out current uses (e.g., 6 months for advertising campaigns). This makes revocation practically manageable without giving the Licensee perpetual immunity.
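The "new uses only" approach produces two dates that both parties should be able to compute from the notice. A minimal sketch, assuming a 60-day notice period (within the 30-90 day range above), a 180-day transition standing in for roughly 6 months, and a transition that starts when the notice period ends rather than when notice is received; all three are assumptions to settle in the contract.

```python
from datetime import date, timedelta


def revocation_dates(notice_received: date, notice_days: int,
                     transition_days: int) -> dict[str, date]:
    """Key dates under a revocation clause that applies to new uses only.

    Assumes the transition period for phasing out current uses begins when
    the notice period ends. Published products already in the wild are not
    affected by either date.
    """
    effective = notice_received + timedelta(days=notice_days)
    return {
        "no_new_uses_from": effective,
        "phase_out_existing_by": effective + timedelta(days=transition_days),
    }


dates = revocation_dates(date(2026, 1, 15), notice_days=60, transition_days=180)
print(dates["no_new_uses_from"])       # 2026-03-16
print(dates["phase_out_existing_by"])  # 2026-09-12
```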

SAG-AFTRA 2026 AI Rider: What Producers Need to Know

The SAG-AFTRA 2026 AI rider (formally, part of the AI provisions negotiated in the Interactive Media Agreement renegotiation) represents the current industry standard for union productions. Key provisions:

Session-level consent is required. The talent’s agreement to perform a role does not constitute consent to AI replication. Consent for AI replica creation must be:

Obtained in a separate, standalone document
Obtained before the recording session, not after
Session-specific — consent for session A does not cover session B

Consent is non-transferable. If a Licensee sells or licenses the voice model to a third party, the original consent does not automatically transfer. The third party must obtain a new consent agreement (or the original agreement must explicitly authorize transfer).

Minimum compensation floors. The rider establishes minimum compensation for AI replica use beyond the original session scope. The specific figures are subject to collective bargaining updates, but the structure is: a base session fee for the creation session, plus a deployment fee each time the replica is used in a commercial context meaningfully different from the original contracted use.

Union notification before deployment. Producers must notify SAG-AFTRA before deploying a digital replica in a new commercial context. This is not an approval process — it is a notification that allows the union to verify compliance and flag concerns.

Non-union productions are not directly bound by this rider. However, the SAG-AFTRA framework represents the consensus view of what responsible AI consent looks like in the entertainment industry. Using it as a template for non-union agreements reduces legal exposure and demonstrates good-faith compliance with emerging norms — which matters if legislation later sets a retroactive standard.

For more on legal boundaries specifically around impersonation use cases, see our post on voice changer impersonation laws.

Checklist: Pre-Production Compliance Walkthrough

Use this before beginning any voice cloning recording session:

Legal foundation

Written consent agreement drafted and reviewed by counsel
Scope of use clause explicitly defines commercial/non-commercial, industry, media, platform, territory
Duration defined with clear start/end dates or renewal mechanism
Compensation structure documented with payment schedule
Revocation rights and procedures specified

Data and technical

Storage location for audio samples and model files specified in agreement
Retention period tied to license duration plus 90-day buffer
Third-party access and sub-licensing restrictions defined
Deletion-on-request procedure documented with 30-60 day response commitment
Written deletion confirmation process established

Union / industry compliance

If union talent: SAG-AFTRA AI rider attached as exhibit and signed separately
If union talent: session-level consent obtained before recording begins
If union talent: union notification procedure identified for deployment

Session documentation

Signed consent agreement on file before session begins
Session recording log maintained (date, content recorded, format, file names)
Chain of custody for audio files documented

Ongoing obligations

Calendar reminders set for license expiration / renewal decision
Designated point of contact for talent deletion requests
Process for notifying talent of any new commercial use outside original scope
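The session-documentation items, especially chain of custody, are easier to demonstrate with a tamper-evident log. A minimal sketch, assuming recordings are ordinary files on disk: it records each file's name, size, and SHA-256 digest as one JSON line in an append-only log. The function names and log format are hypothetical, and a hash log only shows whether a file has changed since it was logged; it does not by itself control who accessed it.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so long session recordings need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_session_files(session_id: str, files: list[Path], log_path: Path) -> list[dict]:
    """Append one JSON line per file to a custody log and return the entries."""
    entries = []
    with log_path.open("a", encoding="utf-8") as log:
        for path in files:
            entry = {
                "session_id": session_id,
                "file_name": path.name,
                "size_bytes": path.stat().st_size,
                "sha256": sha256_of(path),
                "logged_at": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(entry) + "\n")
            entries.append(entry)
    return entries
```

Re-hashing a file later with `sha256_of` and comparing the result with the logged digest shows whether it still matches the recording as it existed at the session.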

Flat Fee vs. Residuals: Structuring Compensation Fairly

The compensation structure is the part of voice AI agreements that generates the most disputes after the fact. Here is a practical framework for thinking about it:

Project Type	Recommended Structure	Rationale
Internal corporate training video	Flat fee	Limited reach, no revenue generation
Indie game (non-commercial scope)	Flat fee	Predictable, bounded use
Commercial game title	Flat fee + royalty tier	Aligns upside if game succeeds
Commercial advertising campaign	Per-use residuals or high flat fee	High commercial value, broad reach
Perpetual commercial license	High flat fee or ongoing royalties	Indefinite use requires indefinite compensation
Podcast / YouTube (non-monetized)	Flat fee or nominal fee	Low commercial value
Podcast / YouTube (monetized)	Flat fee + revenue share	Aligns with platform monetization

The general principle: scope drives price. An agreement that grants worldwide, perpetual, all-commercial-use rights to a voice model for a single flat session fee is almost never fair to the talent, and that mismatch between broad scope and low compensation gives the talent a strong basis to argue in a right-of-publicity dispute that their consent was not genuinely informed.

If budget constraints require a low flat fee, narrow the scope to match. A narrow license with fair compensation is enforceable; a broad license with nominal compensation invites post-production disputes that cost more than fair compensation would have.

Data Retention in Practice: A Timeline Example

Here is a concrete example of how a data retention and deletion schedule might work for a 2-year commercial game voice license:

Day 0: Signing
  → Audio recording session conducted
  → Voice model training begins on licensed recordings

Days 0–730 (License period):
  → Licensee may retain source audio + trained model
  → Talent may request access log at any time
  → Voice model may be used per agreed scope

Day 730: License expiration
  → New uses cease unless renewal signed
  → Retention window begins: 90 days for dispute resolution

Day 820: End of retention window
  → Source audio permanently deleted
  → Model files deleted (or, if agreement permits model retention with no new use, documented as inactive and restricted)
  → Talent receives written confirmation of deletion within 30 days

This schedule provides a clear, auditable record — which matters both for GDPR compliance and for demonstrating good faith if the talent later disputes the handling of their data.
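The day offsets in the timeline above can be turned into calendar dates for the reminders listed in the checklist. A minimal sketch, assuming calendar-day counting from the signing date. Note that a "2-year" license defined as 730 days ends a day before the second anniversary when a leap day falls inside the term, so the agreement should state which rule it uses.

```python
from datetime import date, timedelta


def retention_schedule(signed: date, license_days: int = 730,
                       dispute_buffer_days: int = 90,
                       confirmation_days: int = 30) -> dict[str, date]:
    """Convert the timeline's day offsets into calendar dates."""
    license_end = signed + timedelta(days=license_days)              # Day 730
    delete_by = license_end + timedelta(days=dispute_buffer_days)     # Day 820
    confirm_by = delete_by + timedelta(days=confirmation_days)        # Day 850
    return {
        "license_expires": license_end,
        "delete_by": delete_by,
        "confirm_deletion_by": confirm_by,
    }


for milestone, when in retention_schedule(date(2026, 1, 1)).items():
    print(milestone, when)
```

For a license signed on 1 January 2026, this gives expiry on 2028-01-01, deletion by 2028-03-31 (the 90-day window spans the 2028 leap day), and written confirmation by 2028-04-30.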

For related technical context on how AI-generated voices can be detected and traced, see our post on voice cloning deepfake detection.

Common Mistakes Productions Make (and How to Avoid Them)

Using a standard model release or photo release. Model releases address image rights, not AI voice replica rights. They almost never cover the scope of use, data retention, or revocation rights that voice cloning requires. A general model release does not protect you for voice AI.

Getting consent after the session. Retroactive consent is weaker than prior consent in almost every legal framework. Obtain signed consent before the microphone goes live.

Failing to specify what “commercial use” means. If the agreement says “commercial use permitted” without defining it, every party reads that phrase differently. Specify the industry, the product, and the media format. Leave nothing to interpretation.

Omitting deletion-on-request provisions. Even if GDPR does not technically apply to your production, omitting a deletion-on-request clause creates avoidable friction if the talent’s circumstances change (e.g., they become a public figure and no longer want AI voice content associated with them in circulation).

Treating session fee as all-in compensation for perpetual license. A single session rate that would be appropriate for a one-year game license is not appropriate for a perpetual global advertising license. Scope mismatch in compensation is the most common source of post-production disputes in voice AI.

Internal Links for Further Reading

Consent is one piece of a broader legal and ethical framework around AI voice work. For related topics:

Voice cloning ethics in 2026 — the ethical dimensions beyond the legal minimum, including power dynamics between producers and talent
Voice changer impersonation laws — what happens legally when AI voice tools are used to impersonate real people
Voice cloning deepfake detection — how detection technology works and why it matters for consent enforcement
AI voice generator and celebrity ethics — the specific rules and pitfalls around public figures and celebrity voice likenesses
Voice cloning for voiceover production — practical guide to using AI voice cloning in professional voiceover workflows

Frequently Asked Questions

Do I need written consent to clone someone’s voice?

In practice, yes. Verbal agreements are difficult to enforce and leave both parties exposed. A signed written consent document that specifies the scope of use, duration, territorial rights, and compensation is the minimum standard for any professional AI voice cloning project. Some US states have enacted specific laws requiring explicit written consent for AI voice replicas.

What should a voice cloning consent agreement include?

At minimum: full name and contact details of the voice talent, description of the voice model being created, permitted uses (commercial vs. non-commercial), territorial scope, duration of the license, compensation structure (flat fee, royalties, or residuals), data retention and deletion policy, revocation procedure, and a statement that the agreement does not transfer ownership of the underlying voice. A lawyer familiar with entertainment or IP law should review it.

Can voice talent revoke consent after signing?

Revocation rights depend on how the contract is written. A well-drafted agreement should specify the conditions under which revocation is possible, typically with 30-90 days’ written notice for non-exclusive licenses. Once the voice model has been used in a published product, revoking consent does not automatically remove the derivative works already released; the contract must address this explicitly.

What does the SAG-AFTRA 2026 AI rider cover for voice cloning?

The SAG-AFTRA 2026 AI rider requires explicit session-level consent for any AI replica creation, separate from the general performance contract. It establishes that consent is non-transferable, sets minimum compensation floors for AI replica use beyond the original session scope, and requires producers to notify the union before deploying a digital replica in new commercial contexts. Non-union productions are not bound by it but may use it as a best-practice template.

How long should I retain voice training data after a project ends?

Best practice is to retain only as long as necessary for the licensed use, then delete on a defined schedule. If the contract grants a 2-year license, data should be retained for that period plus a reasonable dispute-resolution window (typically 90 days), then permanently deleted. The talent should receive written confirmation of deletion. GDPR (EU) and similar frameworks may impose their own retention limits if personal data is involved.

What is a scope of use clause in a voice cloning agreement?

A scope of use clause defines exactly what the cloned voice can be used for — e.g., commercial advertising in North America only, a specific game title, internal corporate training videos. It prevents the producer from repurposing the voice model for projects not covered by the original agreement. A narrow scope protects the talent; a broad scope protects the producer’s flexibility. Negotiating this clause is where most voice AI agreements are won or lost.

Is it illegal to clone someone’s voice without consent?

In most jurisdictions, yes, or at minimum it creates serious civil liability. The US right of publicity (codified at state level), EU GDPR (voice can be biometric/personal data), and emerging AI-specific statutes like the NO FAKES Act (proposed federal) and Tennessee’s ELVIS Act all treat unconsented voice cloning as a cognizable harm. Several US states are also adding criminal penalties.

Conclusion

Voice cloning consent is not a box you check to avoid lawyers — it is a foundational element of working ethically with the people whose voices power AI voice systems. The checklist in this post covers the core elements of a valid agreement: written consent obtained prior to the session, scope of use narrowly defined and fairly compensated, SAG-AFTRA AI rider compliance for union productions, data retention tied to license duration, deletion-on-request as a standing right, and revocation procedures that protect both parties.

The technology side of voice cloning is solved — tools like VoxBooster make high-quality AI voice modeling accessible to productions of any size on standard hardware. The legal side requires the same level of attention. A proper consent agreement is not a bureaucratic obstacle; it is what makes the technology sustainable for everyone involved — talent and producers alike.

Reminder: this post is educational. Have your voice cloning agreements reviewed by qualified legal counsel before execution.