DeepMind, a company owned by Alphabet, the holding company that controls Google, has presented its latest creation: Veo 2, an artificial intelligence model aimed at generating videos.
The newcomer arrives to raise the bar for the technology, especially at a time when Sora, OpenAI's model, still shows clear limitations in this segment. Besides Sora, DeepMind's new model joins other well-known tools such as Runway Gen 3 and Kling AI.
Veo 2 Features
Veo 2 impresses with its technical specifications. It can generate video clips up to two minutes long at resolutions up to 4K DCI (4,096 x 2,160). This represents a significant advance: roughly four times the resolution and six times the duration offered by Sora, until then the benchmark in the market.
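The "four times" and "six times" figures can be checked with simple arithmetic. The sketch below is only an illustration: it uses the Sora limits reported in this article (1080p, 20 seconds) and assumes that 1080p means 1,920 x 1,080 pixels, which the article does not state.

```python
# Veo 2 maximum figures as stated in the article.
veo_width, veo_height = 4096, 2160  # 4K DCI
veo_seconds = 2 * 60                # two minutes

# Sora figures as stated in the article (1080p, 20 seconds).
# Assumption: "1080p" is taken to mean 1920 x 1080 pixels.
sora_width, sora_height = 1920, 1080
sora_seconds = 20

# Compare total pixels per frame and maximum clip duration.
pixel_ratio = (veo_width * veo_height) / (sora_width * sora_height)
duration_ratio = veo_seconds / sora_seconds

print(f"pixel ratio: {pixel_ratio:.2f}")
print(f"duration ratio: {duration_ratio:.0f}")
```

Under that assumption, the pixel count comes out slightly above four times Sora's, and the duration is exactly six times longer.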
Limited initial access
Despite the promises, access to Veo 2 is still restricted. The model is currently available exclusively through Vertex AI in VideoFX, an experimental video creation tool from Google. To use it, interested parties must sign up for a waiting list, and the tool is not yet available in all countries.
At this initial stage, Veo 2 is limited to generating videos of just eight seconds, with a maximum resolution of 720p. By comparison, Sora still has the advantage in practical use, allowing you to create 1080p videos up to 20 seconds long.
From text and images to videos
Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts.
We’re also releasing an improved version of our text-to-image model, Imagen 3 – available to use in ImageFX through… pic.twitter.com/h6ejHaMUM4
— Google DeepMind (@GoogleDeepMind) December 16, 2024
One of the standout features of Veo 2 is its flexibility in content creation. It can generate videos based on text prompts, but it also allows you to use a reference image combined with a textual description to create the desired result.
The big difference, however, is the model’s ability to “understand” physics and camera controls in a more advanced way. According to DeepMind, this guarantees the creation of videos with more realistic textures and movements, as well as greater control over camera angles, allowing you to capture objects or people from different perspectives.
Improved physics and promising realism
Google’s Veo 2 vs OpenAI Sora pic.twitter.com/AdNqwCCpGE
— Joseph Carlson (@joecarlsonshow) December 17, 2024
Veo 2 especially shines in scenes with complex elements, such as fluids or interactions of light and shadow. Demo videos presented by Google show impressive clips in which the precision of details such as reflections and particle movement stands out.
This more advanced “understanding” of physics is one of the main factors that differentiate Veo 2 from other models, allowing for a level of realism previously difficult to achieve.
Tackling the flaws
Despite the advances, DeepMind acknowledges that there are still challenges to overcome, especially with regard to visual consistency. Keeping the characteristics of a character or object stable throughout the video is still an area for improvement. However, the company claims that Veo 2 already delivers better results than Sora in this respect, especially in demo tests.
Mystery surrounding training data
DeepMind has not disclosed the dataset used to train Veo 2. The strongest clues suggest that YouTube is one of the main sources of the data.
Source: https://www.hardware.com.br/noticias/veo-2-ia-google-geracao-video.html