Voice cloning

Voice cloning recreates the voices of the original speakers so the dub stays closer to the identity of the source video.

How it works in VoiceCheap

When voice cloning is enabled, VoiceCheap first isolates the speaker's voice, then generates translated speech that tries to preserve the character of the original performance. Quality depends heavily on:
  • source voice cleanliness
  • the selected voice isolation method
  • whether the source contains noise, music, or overlapping voices

Main controls

Stability

Stability controls how steady the generated voice stays from one generation to the next.
  • lower values allow a wider emotional range
  • very low values can sound rushed or inconsistent
  • very high values can sound flat or monotone

Similarity

Similarity controls how closely the generated voice follows the original voice print.
  • higher values keep the result closer to the original speaker
  • if the source audio is noisy, very high similarity can also pull in unwanted artifacts

Speaker boost

Speaker boost pushes the model a bit harder toward the original speaker identity.
  • it is usually subtle
  • it increases compute and latency
  • it can be useful when you want the cloned result to stay closer to the source
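
The three controls can be pictured as a single settings object. The sketch below is illustrative only: `CloneSettings`, its field names, the 0.0 to 1.0 scale, the default values, and the warning thresholds are all assumptions made for this example, not VoiceCheap's actual API or values. It simply restates the guidance above as warnings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CloneSettings:
    """Hypothetical container for the three cloning controls.

    The 0.0-1.0 scale and the defaults are assumptions for illustration.
    """
    stability: float = 0.5
    similarity: float = 0.75
    speaker_boost: bool = False

    def warnings(self, noisy_source: bool = False) -> List[str]:
        """Return the cautions from this page that apply to these settings."""
        for name in ("stability", "similarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

        notes: List[str] = []
        # The 0.2 and 0.9 thresholds are arbitrary placeholders, not product values.
        if self.stability < 0.2:
            notes.append("very low stability can sound rushed or inconsistent")
        if self.stability > 0.9:
            notes.append("very high stability can sound flat or monotone")
        if noisy_source and self.similarity > 0.9:
            notes.append("very high similarity on noisy audio can pull in artifacts")
        if self.speaker_boost:
            notes.append("speaker boost increases compute and latency")
        return notes
```

For example, `CloneSettings(stability=0.1, speaker_boost=True).warnings()` would flag both the low stability value and the extra compute cost of speaker boost.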

Preview and regenerate

VoiceCheap lets you preview voice examples inside the customization flow. This is the best way to validate cloning quality before you commit to a full translation run.

A good workflow:
  1. choose Studio or Realistic
  2. preview the speaker samples
  3. adjust stability, similarity, or boost
  4. regenerate previews if needed
  5. launch the translated version once the preview sounds right
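
The preview loop in the steps above can be modeled as a small function. In VoiceCheap this loop is done by hand in the customization flow; the sketch below only illustrates its shape. The function name and the three callables (`render_preview`, `sounds_right`, `adjust`) are hypothetical stand-ins supplied by the caller, not part of any VoiceCheap API.

```python
from typing import Callable, Optional


def tune_with_previews(
    settings: dict,
    render_preview: Callable[[dict], str],
    sounds_right: Callable[[str], bool],
    adjust: Callable[[dict], dict],
    max_rounds: int = 5,
) -> Optional[dict]:
    """Regenerate previews until one is accepted or the round limit is reached.

    Returns the accepted settings, or None if no preview was accepted.
    """
    for _ in range(max_rounds):
        preview = render_preview(settings)   # step 2: preview the speaker samples
        if sounds_right(preview):            # step 5: ready to launch the full run
            return settings
        settings = adjust(settings)          # steps 3-4: adjust, then regenerate
    return None
```

Capping the number of rounds reflects the advice in the next section: if previews never sound right, the source audio may not be a good fit for cloning.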

When to avoid cloning

A voice from the voice library or a custom voice can be a better choice when:
  • the original audio is noisy
  • multiple speakers overlap often
  • the source has very little clean speech
  • you want one consistent narrator voice across many projects