Models, speed, and limits

Each lip sync tier makes a different tradeoff between speed, quality, and maximum video length.

Model differences at a glance

| Tier | Main strength | What it improves most | Best fit |
| --- | --- | --- | --- |
| Lip Sync | Best balance of speed and quality | Natural-looking lip sync for the majority of normal talking-head videos | General-purpose workflows |
| Lip Sync Pro | Premium facial detail | Better beard detail, teeth, mouth-region fidelity, and larger face handling | Professional-quality close-ups and premium exports |
| Lip Sync Studio | Strongest overall scene understanding | Better consistency, obstruction handling, angle robustness, and high-resolution output | Difficult shots, premium work, and the highest quality target |

What each model is better at

Lip Sync

Lip Sync is the most practical default for many projects. It is best when you want:
  • a fast turnaround
  • good quality on normal talking-head footage
  • a strong quality-to-cost balance
For straightforward videos, this is often enough.

Lip Sync Pro

Lip Sync Pro focuses more on facial detail than standard Lip Sync. Its main advantages are:
  • stronger beard rendering
  • better teeth generation
  • cleaner mouth-region detail
  • better handling of larger face regions in frame
This is why Pro is often a better fit for:
  • tighter close-ups
  • higher-end creator content
  • premium marketing videos
  • projects where facial detail matters more than speed
The tradeoff is that it is slower and more expensive than standard Lip Sync.

Lip Sync Studio

Lip Sync Studio is the strongest model when the shot itself is difficult or when the output standard is especially high. Its biggest differences are:
  • it builds a fuller understanding of the whole shot instead of relying on smaller independent chunks
  • it handles obstructions natively instead of depending on extra settings
  • it is much stronger on profile views, over-the-shoulder framing, and non-frontal lip positions
  • it preserves more of the speaker’s cadence, emotion, and performance
  • it is built for higher-end output, including high-resolution (4K) exports
This is why Lip Sync Studio is better than Lip Sync Pro on:
  • hard camera angles
  • shots with partial face coverage
  • scenes where consistency matters across a longer uninterrupted take
  • premium short-form work where visual artifacts are less acceptable

Why some videos take longer than others

Two videos with the same duration can still process very differently. Common reasons:
  • one video has many scene cuts while another is mostly one stable shot
  • one video has multiple faces on screen
  • one video has more profile views or extreme angles
  • one video has more partial obstructions over the mouth or face
  • one video has fewer frames where the speaker is clearly visible and actively talking
  • one video is higher resolution
  • queue load is different at the time of processing
For Lip Sync and Lip Sync Pro, videos with many scene changes or many frames without clear talking faces can become much harder to process efficiently. That can increase runtime and make failures more likely on difficult long-form content. Lip Sync Studio is usually more resilient on these harder shots, but it is still the slowest tier because the model is doing more work per shot.

Why some clips fail or look weak

Quality can drop when the source contains:
  • still or nearly still faces that do not already look like they are speaking
  • many fast cuts
  • faces that are too small in frame
  • strong profile shots on Lip Sync or Lip Sync Pro
  • heavy obstructions
  • non-human or non-humanoid faces
As a rule:
  • Lip Sync works well for the majority of standard videos
  • Lip Sync Pro improves premium facial detail
  • Lip Sync Studio is the best choice when the footage is difficult, high-value, or visually demanding
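
The rule of thumb above can be sketched as a small decision helper. This is only an illustration: the `Footage` fields and the `suggest_tier` function are hypothetical names invented here, not part of any product API, and the order of the checks is an assumption about how to weigh the criteria.

```python
from dataclasses import dataclass


@dataclass
class Footage:
    """Hypothetical description of a source clip; every field is an assumption."""
    hard_angles: bool = False          # profile or over-the-shoulder shots
    partial_obstruction: bool = False  # mouth or face partly covered
    close_up_detail: bool = False      # beards, teeth, or a large face in frame
    high_value: bool = False           # premium work where artifacts are unacceptable


def suggest_tier(footage: Footage) -> str:
    """Return a tier name following the rule of thumb in this section."""
    # Difficult or high-value footage points to Studio.
    if footage.hard_angles or footage.partial_obstruction or footage.high_value:
        return "Lip Sync Studio"
    # Facial detail on otherwise ordinary footage points to Pro.
    if footage.close_up_detail:
        return "Lip Sync Pro"
    # Standard talking-head footage stays on the default tier.
    return "Lip Sync"
```

For example, `suggest_tier(Footage(close_up_detail=True))` returns `"Lip Sync Pro"`. The sketch ignores duration limits, speed, and cost, which may rule out a tier in practice.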

Typical performance

These timings are estimates, not guarantees. Actual runtime depends on queue load, video duration, resolution, and source complexity.
| Tier | Typical time for ~30s video | Typical time for ~2 min video | Notes |
| --- | --- | --- | --- |
| Lip Sync | about 1 to 3 minutes | about 4 to 8 minutes | Best balance for most projects |
| Lip Sync Pro | about 2 to 5 minutes | about 6 to 12 minutes | Slower, with stronger facial detail |
| Lip Sync Studio | about 10 to 15 minutes | can approach 45 to 55 minutes | Highest-quality tier, best saved for premium short-form work |
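
For rough planning, the typical timings can be turned into a quick estimate. The sketch below assumes runtime grows linearly between the ~30 second and ~2 minute reference points. That is an assumption made for illustration, not a documented behavior, and `estimate_runtime` is a hypothetical helper. It refuses to extrapolate outside the two reference durations.

```python
# Typical runtimes in minutes, copied from the timings above:
# tier -> ((low, high) for ~30 s, (low, high) for ~2 min)
TYPICAL_MINUTES = {
    "Lip Sync": ((1, 3), (4, 8)),
    "Lip Sync Pro": ((2, 5), (6, 12)),
    "Lip Sync Studio": ((10, 15), (45, 55)),
}


def estimate_runtime(tier: str, duration_s: float) -> tuple[float, float]:
    """Return a (low, high) runtime estimate in minutes.

    Assumes linear growth between the 30 s and 120 s reference points,
    which is an illustrative simplification.
    """
    if not 30 <= duration_s <= 120:
        raise ValueError("only durations between 30 and 120 seconds are covered")
    (low_30, high_30), (low_120, high_120) = TYPICAL_MINUTES[tier]
    # Fraction of the way from the 30 s point to the 120 s point.
    t = (duration_s - 30) / 90
    return (low_30 + t * (low_120 - low_30), high_30 + t * (high_120 - high_30))
```

Queue load, resolution, and source complexity are not modeled, so treat the result as a loose range rather than a prediction.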

Current duration limits

| Tier | Current maximum video duration |
| --- | --- |
| Lip Sync | 30 minutes |
| Lip Sync Pro | 30 minutes |
| Lip Sync Studio | 5 minutes |

Lip sync limits can change over time. If you are planning a large production workflow, confirm the current limit in the app before you start a long job.
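
A pre-flight check against these limits might look like the sketch below. The values are copied from this section and may be out of date, `tiers_allowing` is a hypothetical helper, and treating a video exactly at the limit as allowed is an assumption.

```python
# Maximum video duration per tier, in minutes (as listed in this section).
MAX_DURATION_MINUTES = {
    "Lip Sync": 30,
    "Lip Sync Pro": 30,
    "Lip Sync Studio": 5,
}


def tiers_allowing(duration_minutes: float) -> list[str]:
    """Return the tiers whose current limit covers the given duration."""
    # Assumption: a video exactly at the limit is still accepted.
    return [
        tier
        for tier, limit in MAX_DURATION_MINUTES.items()
        if duration_minutes <= limit
    ]
```

For a 12 minute video, `tiers_allowing(12)` returns `["Lip Sync", "Lip Sync Pro"]`, which shows that Lip Sync Studio is not an option for that job.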

What affects speed

  • video duration
  • resolution
  • current queue depth
  • face angle complexity
  • number of scene changes
  • number of visible faces
  • how often the speaking face is obstructed
  • whether the speaker is visibly talking throughout the shot
  • audio clarity
As a rule of thumb, 1080p is usually the best balance between speed and quality. 4K can work, but it increases processing time.

Current billing impact

These are the current extra billed minutes on top of the translated video minutes:
  • Lip Sync: +4
  • Lip Sync Pro: +9
  • Lip Sync Studio: +14
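
The surcharges above can be applied in a small calculation. The text does not say whether the extra minutes are added once per job or once per translated minute, so this sketch exposes both readings through a flag. The `billed_minutes` function and its `extra_is_per_minute` parameter are hypothetical, and neither reading is confirmed here; check the app for the actual billing rule.

```python
# Extra billed minutes per tier, as listed in this section.
EXTRA_BILLED_MINUTES = {
    "Lip Sync": 4,
    "Lip Sync Pro": 9,
    "Lip Sync Studio": 14,
}


def billed_minutes(tier: str, translated_minutes: float,
                   extra_is_per_minute: bool = False) -> float:
    """Return total billed minutes under one of two assumed readings."""
    extra = EXTRA_BILLED_MINUTES[tier]
    if extra_is_per_minute:
        # Reading A (assumption): the surcharge applies to every translated minute.
        return translated_minutes * (1 + extra)
    # Reading B (assumption): the surcharge is added once per job.
    return translated_minutes + extra
```

Under the once-per-job reading, a 2 minute video on Lip Sync would bill 6 minutes. Under the per-minute reading it would bill 10.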