Google Unveils Imagen 2: A Tool for Creating Video Clips

Google has had some setbacks with its image-generating AI technology in the past.

Previously, the image generator built into Gemini, Google’s AI chatbot, produced offensive historical inaccuracies such as racially diverse Nazis. Google took steps to address the problem and is now rolling out an enhanced version of its image-generating tool, Imagen 2, on its Vertex AI developer platform, with a focus on enterprise applications. The announcement was made at the Cloud Next conference in Las Vegas.

Image Credits: Frederic Lardinois/TechCrunch

Imagen 2, launched in December as a family of models after a preview at Google’s I/O conference in May 2023, lets users generate and edit images from text prompts, much like OpenAI’s DALL-E and Midjourney. Aimed particularly at businesses, Imagen 2 can render text, logos, and emblems in multiple languages and overlay them onto existing images such as business cards and apparel.

Following its preview release, image editing with Imagen 2 is now available in Vertex AI along with two new features: inpainting and outpainting. Both are already offered by other image generators such as DALL-E. Inpainting lets users remove unwanted parts of an image or add new elements, while outpainting extends the image’s borders for a wider view.

The highlight of the Imagen 2 update is the introduction of “text-to-live images.”

With this addition, Imagen 2 can now create four-second videos from text prompts, competing with AI-powered clip generators from Runway, Pika, and Irreverent Labs. Google is pitching live images at marketers and creatives, for example as a way to generate GIFs for ads on themes like nature, food, and animals, which Imagen 2 excels at producing.

Google states that live images can capture “various camera angles and motions” while keeping the sequence consistent. For now, however, the resolution is limited to 360 by 640 pixels; Google says it plans to improve this in the future.

To address concerns about deepfakes, Google uses SynthID, a technology developed by Google DeepMind, to add invisible cryptographic watermarks to live images. Google claims these watermarks survive alterations such as compression, filters, and color adjustments, but detecting them requires a proprietary tool that is not available to third parties.

Emphasizing safety, Google says that live image generations are “filtered for safety” to prevent inappropriate content. A Google spokesperson said that Imagen 2 in Vertex AI has not run into the issues seen with the Gemini image generator, and that the company continues to test the model and engage with customers.

Despite Google’s efforts, live images face stiff competition from other video generation tools.

Google’s live images fall short of alternatives from Runway and Stability AI and of OpenAI’s Sora, which offer longer clips, higher resolutions, and greater realism. While Imagen Video and Phenaki, Google’s other video generation technologies, show promise, Imagen 2’s current offering appears less competitive.

There are also concerns about the training data behind models like Imagen 2, which is sourced primarily from public web data, including blogs, media transcripts, and forums. Google’s lack of transparency about the specific sources raises questions about data privacy and potential intellectual property issues.

As live images move toward general availability, Google faces the challenge of ensuring data privacy and content safety while staying competitive in the fast-moving field of generative AI.