Models, speed, and limits
Each lip sync tier has a different tradeoff between speed, quality, and project size.
Model differences at a glance
| Tier | Main strength | What it improves most | Best fit |
|---|---|---|---|
| Lip Sync | Best balance of speed and quality | Natural-looking lip sync for the majority of normal talking-head videos | General-purpose workflows |
| Lip Sync Pro | Premium facial detail | Better beard detail, teeth, mouth-region fidelity, and handling of larger faces in frame | Professional-quality close-ups and premium exports |
| Lip Sync Studio | Strongest overall scene understanding | Better consistency, obstruction handling, angle robustness, and high-resolution output | Difficult shots, premium work, and the highest quality target |
What each model is better at
Lip Sync
Lip Sync is the most practical default for many projects.
It is best when you want:
- a fast turnaround
- good quality on normal talking-head footage
- a strong quality-to-cost balance
For straightforward videos, this is often enough.
Lip Sync Pro
Lip Sync Pro focuses more on facial detail than standard Lip Sync.
Its main advantages are:
- stronger beard rendering
- better teeth generation
- cleaner mouth-region detail
- better handling of larger face regions in frame
This is why Pro is often a better fit for:
- tighter close-ups
- higher-end creator content
- premium marketing videos
- projects where facial detail matters more than speed
The tradeoff is that it is slower and more expensive than standard Lip Sync.
Lip Sync Studio
Lip Sync Studio is the strongest model when the shot itself is difficult or when the output standard is especially high.
Its biggest differences are:
- it builds a fuller understanding of the whole shot instead of relying on smaller independent chunks
- it handles obstructions natively instead of depending on extra settings
- it is much stronger on profile views, over-the-shoulder framing, and non-frontal lip positions
- it preserves more of the speaker’s cadence, emotion, and performance
- it is built for higher-end output and is better suited to 4K than the other tiers
This is why Lip Sync Studio is better than Lip Sync Pro on:
- hard camera angles
- shots with partial face coverage
- scenes where consistency matters across a longer uninterrupted take
- premium short-form work where visual artifacts are less acceptable
Why some videos take longer than others
Two videos with the same duration can still process very differently.
Common reasons:
- one video has many scene cuts while another is mostly one stable shot
- one video has multiple faces on screen
- one video has more profile views or extreme angles
- one video has more partial obstructions over the mouth or face
- one video has fewer frames where the speaker is clearly visible and actively talking
- one video is higher resolution
- queue load is different at the time of processing
For Lip Sync and Lip Sync Pro, videos with many scene changes or many frames without clear talking faces can become much harder to process efficiently. That can increase runtime and make failures more likely on difficult long-form content.
Lip Sync Studio is usually more resilient on these harder shots, but it is still the slowest tier because the model is doing more work per shot.
Why some clips fail or look weak
Quality can drop when the source contains:
- still or nearly still faces that do not already look like they are speaking
- many fast cuts
- faces that are too small in frame
- strong profile shots on lower tiers
- heavy obstructions
- non-human or non-humanoid faces
As a rule:
- Lip Sync works well for the majority of standard videos
- Lip Sync Pro improves premium facial detail
- Lip Sync Studio is the best choice when the footage is difficult, high-value, or visually demanding
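If you choose tiers in a script or batch tool, the rule of thumb above can be written as a small decision function. This is a minimal sketch, not part of the product: `Footage` and `recommend_tier` are hypothetical names, and the priority order (difficult or high-value footage first, then facial detail, then the default) is one reasonable reading of the guidance on this page. It does not check duration limits, and Lip Sync Studio currently accepts shorter videos than the other tiers.

```python
from dataclasses import dataclass


# Hypothetical description of a source clip. These field names are
# illustrative only and are not part of any product API.
@dataclass
class Footage:
    hard_angles: bool = False            # profile, over-the-shoulder, non-frontal lips
    mouth_obstructions: bool = False     # objects partly covering the mouth or face
    high_value: bool = False             # work where visual artifacts are less acceptable
    facial_detail_matters: bool = False  # tight close-ups, beards, visible teeth


def recommend_tier(footage: Footage) -> str:
    """Apply the rule of thumb as a simple priority check (assumed ordering)."""
    # Difficult or high-value footage: Studio handles angles and obstructions natively.
    if footage.hard_angles or footage.mouth_obstructions or footage.high_value:
        return "Lip Sync Studio"
    # Detail-sensitive close-ups benefit from Pro's beard, teeth, and mouth rendering.
    if footage.facial_detail_matters:
        return "Lip Sync Pro"
    # Everything else: the default tier balances speed, quality, and cost.
    return "Lip Sync"


print(recommend_tier(Footage()))                              # Lip Sync
print(recommend_tier(Footage(facial_detail_matters=True)))    # Lip Sync Pro
print(recommend_tier(Footage(hard_angles=True)))              # Lip Sync Studio
```

Checking difficulty first mirrors the guidance that Studio is the tier to reach for when the shot itself is hard, even if facial detail also matters.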
Typical processing times
These timings are estimates, not guarantees. Actual runtime depends on queue load, video duration, resolution, and source complexity.
| Tier | Typical time for ~30s video | Typical time for ~2 min video | Notes |
|---|---|---|---|
| Lip Sync | about 1 to 3 minutes | about 4 to 8 minutes | Best balance for most projects |
| Lip Sync Pro | about 2 to 5 minutes | about 6 to 12 minutes | Slower, with stronger facial detail |
| Lip Sync Studio | about 10 to 15 minutes | can approach 45 to 55 minutes | Highest-quality tier, best saved for premium short-form work |
Current duration limits
| Tier | Current maximum video duration |
|---|---|
| Lip Sync | 30 minutes |
| Lip Sync Pro | 30 minutes |
| Lip Sync Studio | 5 minutes |
Lip sync limits can change over time. If you are planning a large production workflow, confirm the current limit in the app before you start a long job.
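If you plan jobs programmatically, a check like the following can flag which tiers accept a video before you submit it. This is a hypothetical helper, not a product API: `MAX_DURATION_MINUTES` and `tiers_that_fit` are illustrative names, the values are copied from the duration table above, and it assumes a video exactly at the limit is accepted.

```python
# Maximum video duration per tier, in minutes, copied from the duration table.
# These limits can change; confirm them in the app before a long job.
MAX_DURATION_MINUTES = {
    "Lip Sync": 30,
    "Lip Sync Pro": 30,
    "Lip Sync Studio": 5,
}


def tiers_that_fit(duration_minutes: float) -> list[str]:
    """Return the tiers whose limit covers the video (assumes the limit is inclusive)."""
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    return [
        tier
        for tier, limit in MAX_DURATION_MINUTES.items()
        if duration_minutes <= limit
    ]


print(tiers_that_fit(4))   # ['Lip Sync', 'Lip Sync Pro', 'Lip Sync Studio']
print(tiers_that_fit(12))  # ['Lip Sync', 'Lip Sync Pro']
```

Keeping the limits in a single mapping makes it easy to update them when the in-app limits change.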
What affects speed
- video duration
- resolution
- current queue depth
- face angle complexity
- number of scene changes
- number of visible faces
- how often the speaking face is obstructed
- whether the speaker is visibly talking throughout the shot
- audio clarity
As a rule of thumb, 1080p is usually the best balance between speed and quality. 4K can work, but it increases processing time.
Current billing impact
These are the current extra billed minutes on top of the translated video minutes:
- Lip Sync: +4
- Lip Sync Pro: +9
- Lip Sync Studio: +14