This app generates short talking videos of around 10 seconds using LTX2’s synchronization system. The model is no longer in development and is generally stable, but it may still fail if the input conditions are unsuitable. Following the guidelines below greatly improves the success rate.
Audio up to 30 seconds is supported (the cap is SeaArt’s limit), but videos longer than 20 seconds, especially when paired with background music, tend to produce more repetitive motion. For best results, use images where the subject’s face is clearly visible and the image itself is sharp. Clear instructions about the intended actions, with timing specified in a structured cue format when necessary, also improve consistency.
LTX2 may occasionally add visual elements at the end of the video, so please include “unprocessed footage” or “clean version” at the end of your prompt to reduce this behavior.
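As a rough illustration of the two prompt tips above (timed cues and a clean-footage phrase at the end), here is a minimal Python sketch. The `[start-end s] action` cue syntax, the `build_prompt` helper, and the choice to append both phrases are assumptions made for this example. None of them is a format documented for LTX2, so adapt it to whatever cue style works for you.

```python
from typing import List, Tuple

# Assumption: both recommended phrases are appended; either one alone may be enough.
CLEAN_SUFFIX = "unprocessed footage, clean version"


def build_prompt(description: str, cues: List[Tuple[float, float, str]]) -> str:
    """Combine a description, timed action cues, and the clean-footage suffix.

    Each cue is (start_seconds, end_seconds, action). The bracketed cue
    syntax is hypothetical and only meant to show a consistent structure.
    """
    lines = [description.strip()]
    # Sort cues by start time so the prompt reads in chronological order.
    for start, end, action in sorted(cues):
        if end <= start:
            raise ValueError(f"cue must end after it starts: {start}-{end}")
        lines.append(f"[{start:g}-{end:g}s] {action.strip()}")
    # The suffix always goes last, as the note above recommends.
    lines.append(CLEAN_SUFFIX)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "A woman talks to the camera.",
        [(4, 8, "nods slowly"), (0, 4, "smiles while speaking")],
    ))
```

Keeping the suffix in one constant makes it harder to forget when prompts are edited by hand.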
1. Use a medium-shot image: framed from the waist up, with the face and shoulders clearly visible. If the subject is too far from the camera, even when the whole upper body is in frame, the failure rate increases dramatically. Think of a composition similar to a résumé photo.
2. Audio is currently limited to 10 seconds. If you want to try longer audio, check the LTX2 section in my workflow.
