
How lip sync works

Lip sync takes a source video and a new audio track, then updates the speaker’s mouth movements so they better match the translated speech.

What changes and what stays the same

VoiceCheap focuses on the speaking face region. In practice:
  • the mouth and lower-face motion are updated
  • the target audio drives the new mouth movement
  • the rest of the frame stays as intact as possible

The high-level pipeline

1. Detect the face

VoiceCheap identifies the speaking face and tracks the relevant facial landmarks around the mouth and lower face.
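
VoiceCheap's own detector is internal, but the idea can be sketched with off-the-shelf tools. The snippet below is a minimal sketch using OpenCV and MediaPipe Face Mesh to track a few mouth landmarks per frame; the input file name is an assumption, and the landmark indices are MediaPipe's, not VoiceCheap's:

```python
import cv2
import mediapipe as mp

# Illustrative only: VoiceCheap's detector is proprietary. This sketch uses
# MediaPipe Face Mesh to track landmarks around the mouth in each frame.
MOUTH_IDS = [61, 291, 13, 14]  # mouth corners, upper/lower inner lip

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                            max_num_faces=1)
cap = cv2.VideoCapture("source_video.mp4")  # hypothetical input file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        continue  # no face this frame; a real system would interpolate
    h, w = frame.shape[:2]
    lm = results.multi_face_landmarks[0].landmark
    mouth_pts = [(int(lm[i].x * w), int(lm[i].y * h)) for i in MOUTH_IDS]
    # mouth_pts now gives pixel coordinates framing the speaking region
cap.release()
```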
2. Analyze the translated audio

The system listens to the target audio and computes the mouth shapes needed to match the speech.
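
Production systems typically feed the dubbed audio through a learned encoder. As a rough stand-in, the sketch below uses librosa to compute mel-spectrogram slices aligned one-to-one with video frames, plus a crude energy-based "mouth openness" proxy; the file name, sample rate, and frame rate are assumptions:

```python
import librosa
import numpy as np

# A minimal sketch: one mel-spectrogram column per video frame, so each
# frame has an audio feature slice to condition the mouth shape on.
FPS = 25                                     # assumed video frame rate
audio, sr = librosa.load("dubbed_audio.wav", sr=16000)

hop = sr // FPS                              # samples per video frame
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=80,
                                     hop_length=hop)
log_mel = librosa.power_to_db(mel)           # shape: (80, n_audio_frames)

# Crude proxy for mouth openness: normalized overall energy per frame.
openness = log_mel.mean(axis=0)
openness = (openness - openness.min()) / (np.ptp(openness) + 1e-8)
```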
3. Generate new lip movement

The speaking region is regenerated frame by frame to match the translated audio more closely.
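
The generator in a real system is a trained neural model conditioned on the audio features, which is beyond a docs snippet. Purely to illustrate the per-frame conditioning, the stand-in below stretches the mouth crop in proportion to the openness value from the previous sketch; every name here is hypothetical:

```python
import cv2
import numpy as np

# Stand-in for a learned generator: stretch the mouth crop vertically in
# proportion to the audio-derived "openness" value for that frame.
def generate_mouth(crop: np.ndarray, openness: float) -> np.ndarray:
    h, w = crop.shape[:2]
    new_h = max(1, int(h * (0.8 + 0.4 * openness)))
    stretched = cv2.resize(crop, (w, new_h))
    out = np.zeros_like(crop)          # pad/trim back to the crop size
    hh = min(h, stretched.shape[0])
    out[:hh] = stretched[:hh]
    return out

# One generated mouth crop per (video frame, audio feature) pair.
# frames, mouth_boxes, openness_per_frame come from the earlier steps.
def regenerate(frames, mouth_boxes, openness_per_frame):
    outputs = []
    for frame, (x, y, w, h), o in zip(frames, mouth_boxes,
                                      openness_per_frame):
        crop = frame[y:y + h, x:x + w]
        outputs.append(generate_mouth(crop, float(o)))
    return outputs
```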
4. Blend the result back into the video

The generated mouth region is composited back into the original frame so the result stays visually coherent.
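
Compositing is the easiest step to show concretely. A common approach, sketched below under the assumption that the mouth box and generated crop come from the earlier snippets, is alpha blending with a feathered mask so the regenerated region fades into the untouched frame:

```python
import cv2
import numpy as np

# Paste the generated mouth crop back with a feathered (Gaussian-blurred)
# mask so the seam between new and original pixels stays invisible.
def blend_mouth(frame: np.ndarray, generated: np.ndarray,
                box: tuple[int, int, int, int]) -> np.ndarray:
    x, y, w, h = box
    mask = np.zeros((h, w), dtype=np.float32)
    cv2.ellipse(mask, (w // 2, h // 2), (w // 2 - 2, h // 2 - 2),
                0, 0, 360, 1.0, thickness=-1)   # soft oval over the mouth
    mask = cv2.GaussianBlur(mask, (21, 21), 0)[..., None]  # feathered edge
    out = frame.copy()
    region = out[y:y + h, x:x + w].astype(np.float32)
    out[y:y + h, x:x + w] = (mask * generated.astype(np.float32)
                             + (1 - mask) * region).astype(np.uint8)
    return out
```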

Where lip sync fits in the workflow

Lip sync is not the first quality step. The usual order is:
  1. get the transcript right
  2. choose the voice strategy
  3. make sure the dubbed audio sounds natural
  4. add lip sync if the project benefits from it
If the translated voice or timing still sounds wrong, fix that first. Lip sync works best when the audio is already in good shape.

Limitations to expect

  • strong profile views are harder than front-facing shots
  • heavy obstructions around the mouth reduce quality
  • shaky footage reduces face-tracking reliability
  • noisy or overlapping audio can reduce the naturalness of the final result