Models, speed, and limits

Each lip sync tier has a different tradeoff between speed, quality, and project size.

Model differences at a glance

Tier	Main strength	What it improves most	Best fit
Lip Sync	Best balance of speed and quality	Natural-looking lip sync for the majority of normal talking-head videos	General-purpose workflows
Lip Sync Pro	Premium facial detail	Better beard detail, teeth, mouth-region fidelity, and larger face handling	Professional-quality close-ups and premium exports
Lip Sync Studio	Strongest overall scene understanding	Better consistency, obstruction handling, angle robustness, and high-resolution output	Difficult shots, premium work, and the highest quality target

What each model is better at

Lip Sync

Lip Sync is the most practical default for many projects. It is best when you want:

a fast turnaround
good quality on normal talking-head footage
a strong quality-to-cost balance

For straightforward videos, this is often enough.

Lip Sync Pro

Lip Sync Pro focuses more on facial detail than standard Lip Sync. Its main advantages are:

stronger beard rendering
better teeth generation
cleaner mouth-region detail
better handling of larger face regions in frame

This is why Lip Sync Pro is often a better fit for:

tighter close-ups
higher-end creator content
premium marketing videos
projects where facial detail matters more than speed

The tradeoff is that it is slower and more expensive than standard Lip Sync.

Lip Sync Studio

Lip Sync Studio is the strongest model when the shot itself is difficult or when the quality bar for the output is especially high. Its biggest differences are:

it builds a fuller understanding of the whole shot instead of relying on smaller independent chunks
it handles obstructions natively instead of depending on extra settings
it is much stronger on profile views, over-the-shoulder framing, and non-frontal lip positions
it preserves more of the speaker’s cadence, emotion, and performance
it is built for higher-end output, including high-resolution (4K) exports

This is why Lip Sync Studio is better than Lip Sync Pro on:

hard camera angles
shots with partial face coverage
scenes where consistency matters across a longer uninterrupted take
premium short-form work where visual artifacts are less acceptable

Why some videos take longer than others

Two videos with the same duration can still take very different amounts of time to process. Common reasons:

one video has many scene cuts while another is mostly one stable shot
one video has multiple faces on screen
one video has more profile views or extreme angles
one video has more partial obstructions over the mouth or face
one video has fewer frames where the speaker is clearly visible and actively talking
one video is higher resolution
queue load is different at the time of processing

For Lip Sync and Lip Sync Pro, videos with many scene changes or many frames without clear talking faces can become much harder to process efficiently. That can increase runtime and make failures more likely on difficult long-form content. Lip Sync Studio is usually more resilient on these harder shots, but it is still the slowest tier because the model is doing more work per shot.

Why some clips fail or look weak

Quality can drop when the source contains:

still or nearly still faces that do not already look like they are speaking
many fast cuts
faces that are too small in frame
strong profile shots on lower tiers
heavy obstructions
non-human or non-humanoid faces

As a rule:

Lip Sync works well for the majority of standard videos
Lip Sync Pro improves premium facial detail
Lip Sync Studio is the best choice when the footage is difficult, high-value, or visually demanding
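
The rule above can be sketched as a small selection helper. This is an illustration only: `ShotProfile`, its fields, and `pick_tier` are hypothetical names, not part of any product API, and the order of the checks is just one reading of the guidance on this page.

```python
from dataclasses import dataclass


@dataclass
class ShotProfile:
    """Hypothetical description of a clip; every field is an assumption."""
    hard_angles: bool = False      # profile views, over-the-shoulder framing
    obstructions: bool = False     # partial coverage of the mouth or face
    close_up_detail: bool = False  # beards, teeth, large faces in frame
    high_value: bool = False       # premium work where artifacts are less acceptable


def pick_tier(shot: ShotProfile) -> str:
    """Return the tier name suggested by the rule of thumb above."""
    # Difficult or high-value footage goes to the strongest tier first.
    if shot.hard_angles or shot.obstructions or shot.high_value:
        return "Lip Sync Studio"
    # Otherwise, facial detail is the main reason to move up from the default.
    if shot.close_up_detail:
        return "Lip Sync Pro"
    return "Lip Sync"


print(pick_tier(ShotProfile(close_up_detail=True)))
```

A helper like this only encodes the quality side of the decision. Speed, billing, and plan availability still need to be weighed separately.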

Typical performance

These timings are estimates, not guarantees. Actual runtime depends on queue load, video duration, resolution, and source complexity.

Tier	Typical time for ~30s video	Typical time for ~2 min video	Notes
Lip Sync	about 1 to 3 minutes	about 4 to 8 minutes	Best balance for most projects
Lip Sync Pro	about 2 to 5 minutes	about 6 to 12 minutes	Slower, with stronger facial detail
Lip Sync Studio	about 10 to 15 minutes	can approach 45 to 55 minutes	Highest-quality tier, best saved for premium short-form work
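
For rough planning, the table above can be turned into a lookup that adds up the low and high estimates for a batch of clips. This is a minimal sketch: `TYPICAL_MINUTES` and `batch_estimate` are hypothetical names, the figures are copied from the table, and the sum assumes clips are processed one after another with no extra queue delay, which the page does not guarantee.

```python
# Typical processing time in minutes as (low, high), copied from the table above.
# Only the two reference clip lengths from the table are covered.
TYPICAL_MINUTES = {
    "Lip Sync":        {"30s": (1, 3),   "2min": (4, 8)},
    "Lip Sync Pro":    {"30s": (2, 5),   "2min": (6, 12)},
    "Lip Sync Studio": {"30s": (10, 15), "2min": (45, 55)},
}


def batch_estimate(tier, clip_lengths):
    """Sum the low and high estimates for clips processed sequentially (an assumption)."""
    low = sum(TYPICAL_MINUTES[tier][length][0] for length in clip_lengths)
    high = sum(TYPICAL_MINUTES[tier][length][1] for length in clip_lengths)
    return low, high


# Two ~30s clips and one ~2 min clip on Lip Sync Pro.
print(batch_estimate("Lip Sync Pro", ["30s", "30s", "2min"]))  # (10, 22)
```

The result is a range of estimates, not a guarantee, for the same reasons the table itself is not.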

Current duration limits

Tier	Current maximum video duration
Lip Sync	30 minutes
Lip Sync Pro	30 minutes
Lip Sync Studio	30 minutes

Lip sync limits can change over time. If you are planning a large production workflow, confirm the current limit in the app before you start a long job.
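
A pre-flight check against these limits might look like the sketch below. The names `MAX_DURATION_MINUTES` and `fits_duration_limit` are hypothetical, the values mirror the table above at the time of writing, and treating a video of exactly 30 minutes as allowed is an assumption.

```python
# Current maximum video duration per tier, in minutes (from the table above).
# These values can change, so confirm them in the app before relying on them.
MAX_DURATION_MINUTES = {
    "Lip Sync": 30,
    "Lip Sync Pro": 30,
    "Lip Sync Studio": 30,
}


def fits_duration_limit(tier, duration_seconds):
    """Return True if the video is within the tier's limit (limit assumed inclusive)."""
    limit_seconds = MAX_DURATION_MINUTES[tier] * 60
    return duration_seconds <= limit_seconds


print(fits_duration_limit("Lip Sync Studio", 25 * 60))  # True
print(fits_duration_limit("Lip Sync", 31 * 60))         # False
```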

What affects speed

video duration
resolution
current queue depth
face angle complexity
number of scene changes
number of visible faces
how often the speaking face is obstructed
whether the speaker is visibly talking throughout the shot
audio clarity

As a rule of thumb, 1080p is usually the best balance between speed and quality. 4K can work, but it increases processing time.

Current billing impact

These are the current extra billed minutes on top of the translated video minutes:

Lip Sync: +4
Lip Sync Pro: +9
Lip Sync Studio: +14
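
The arithmetic can be sketched as follows. The page does not state whether the extra minutes apply once per video or once per translated video minute, so the sketch makes the caller choose explicitly. `EXTRA_BILLED_MINUTES`, `billed_minutes`, and the `surcharge_is_per_minute` flag are hypothetical names, and neither interpretation should be taken as confirmed billing behavior.

```python
# Extra billed minutes per tier, copied from the list above.
EXTRA_BILLED_MINUTES = {
    "Lip Sync": 4,
    "Lip Sync Pro": 9,
    "Lip Sync Studio": 14,
}


def billed_minutes(tier, translated_minutes, *, surcharge_is_per_minute):
    """Total billed minutes under one of two assumed readings of the surcharge."""
    extra = EXTRA_BILLED_MINUTES[tier]
    if surcharge_is_per_minute:
        # Reading A (assumption): the extra applies to every translated video minute.
        return translated_minutes + extra * translated_minutes
    # Reading B (assumption): the extra is added once per video.
    return translated_minutes + extra


# A 2-minute video on Lip Sync Pro under each reading.
print(billed_minutes("Lip Sync Pro", 2, surcharge_is_per_minute=True))   # 20
print(billed_minutes("Lip Sync Pro", 2, surcharge_is_per_minute=False))  # 11
```

Because the two readings diverge quickly on longer videos, check the billing details in the app before budgeting a large job.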

Plan availability

Beginner: Lip Sync
Creator: Lip Sync and Lip Sync Pro
Scale: Lip Sync, Lip Sync Pro, and Lip Sync Studio
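
The availability list above maps directly onto a lookup table. In this minimal sketch, `PLAN_TIERS` and `plans_with` are hypothetical names, and the data is only what this page lists.

```python
# Tiers available on each plan, copied from the list above.
PLAN_TIERS = {
    "Beginner": ["Lip Sync"],
    "Creator": ["Lip Sync", "Lip Sync Pro"],
    "Scale": ["Lip Sync", "Lip Sync Pro", "Lip Sync Studio"],
}


def plans_with(tier):
    """Return the plans that include the given tier, in the order listed above."""
    return [plan for plan, tiers in PLAN_TIERS.items() if tier in tiers]


print(plans_with("Lip Sync Pro"))     # ['Creator', 'Scale']
print(plans_with("Lip Sync Studio"))  # ['Scale']
```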
