Voice cloning best practices
Better cloning starts with better source material. Small improvements in the source audio can make a noticeable difference in the final voice.
Keep the samples clean
Aim for speech that is:
- clear
- dry (little or no room reverb)
- low-noise
- free from heavy music or crowd sound
Use consistent speech
The model responds better when the sample material sounds like one coherent speaking style. Try to keep:
- a similar recording environment
- a similar speaking tone
- a similar energy level
Match the performance you want
Cloned voices tend to inherit the style of the sample material:
- calm samples produce calmer output
- expressive samples produce more expressive output
- rushed delivery can create rushed output
Use enough audio, but not too much
As a rule of thumb:
- use at least about 1 minute of useful speech when possible
- 1 to 2 minutes of clear material is usually a strong range
- more than about 5 minutes gives little practical gain in most cases
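The duration guidance above can be checked before uploading a sample. The following is a minimal sketch, assuming the sample is an uncompressed WAV file. The function names and the wording of the returned labels are hypothetical, and the check measures total file length, not how much of it is useful speech.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the total length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def rate_sample_length(seconds: float) -> str:
    """Map a duration onto the rule-of-thumb ranges from the list above."""
    if seconds < 60:
        return "short: aim for about 1 minute if possible"
    if seconds <= 120:
        return "strong range"
    if seconds <= 300:
        return "fine, though extra material adds less"
    return "more than needed: little practical gain"
```

For example, `rate_sample_length(wav_duration_seconds("sample.wav"))` would label a local file named `sample.wav` (a placeholder name). Silence and pauses count toward the total, so treat the result as an upper bound on usable speech.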
Keep the volume balanced
Try to avoid samples that are:
- too quiet
- clipped or distorted
- heavily normalized to the point of sounding unnatural
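A rough level check can flag the first two problems before a sample is used. This is a sketch under stated assumptions: the audio is 16-bit PCM WAV, and the thresholds (`QUIET_RMS_DBFS`, `CLIP_FRACTION`) are illustrative values chosen for this example, not limits taken from any product. It does not try to detect over-normalized audio, which is better judged by ear.

```python
import array
import math
import sys
import wave
from typing import List, Sequence

FULL_SCALE = 32768  # magnitude of the largest 16-bit PCM sample

# Illustrative thresholds (assumptions for this sketch, tune to taste).
QUIET_RMS_DBFS = -30.0  # average level below this is treated as too quiet
CLIP_FRACTION = 0.001   # share of full-scale samples that suggests clipping


def read_pcm16(path: str) -> array.array:
    """Load the raw samples of a 16-bit PCM WAV file (channels interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def to_dbfs(value: float) -> float:
    """Convert a linear sample magnitude to decibels relative to full scale."""
    if value <= 0:
        return float("-inf")
    return 20 * math.log10(value / FULL_SCALE)


def level_warnings(samples: Sequence[int]) -> List[str]:
    """Return warnings for audio that looks too quiet or clipped."""
    if not samples:
        return ["empty sample"]
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    at_limit = sum(1 for s in samples if abs(s) >= FULL_SCALE - 1)
    warnings = []
    if to_dbfs(rms) < QUIET_RMS_DBFS:
        warnings.append("too quiet")
    if at_limit / len(samples) > CLIP_FRACTION:
        warnings.append("possibly clipped")
    return warnings
```

Calling `level_warnings(read_pcm16("sample.wav"))` on a local file (the name is a placeholder) returns an empty list when neither check fires. An empty result only means the levels look plausible; it says nothing about noise or distortion introduced elsewhere in the chain.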
Prefer the right isolation mode
For most projects:
- start with Studio
- switch to Realistic only when the environment character is part of the experience

