Google Muse - The Fastest AI Image Generator

Google has unveiled a new artificial intelligence model called Muse, which can create high-quality images from text input in a fraction of the time that comparable models need. It is also designed to accurately represent text and concepts in images.

By Michael Bires

Jan 8, 2023, 12:09 a.m. GMT+0

Since the start of 2021, AI research has shifted considerably with the arrival of many deep-learning-based text-to-image models such as DALL-E 2, Stable Diffusion, Midjourney, and Craiyon.

According to Google, Muse should be more efficient than existing diffusion models because it is trained on a masked modeling task in discrete token space rather than in pixel space: given the text embedding from a pre-trained large language model (LLM), it learns to predict randomly masked image tokens.
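
The masking step described above can be illustrated with a minimal sketch. It only shows how a training example might be corrupted, not how Muse itself is implemented: the function name `mask_image_tokens`, the sentinel `MASK_ID`, the 50% mask ratio, and the toy grid and codebook sizes are all assumptions made for illustration.

```python
import random

MASK_ID = -1  # hypothetical sentinel id for a masked position


def mask_image_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token ids with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions,
    which are the positions a model would be trained to predict.
    """
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions


rng = random.Random(0)
# Toy 4x4 grid of image tokens; the codebook size of 8192 is an assumption.
image_tokens = [rng.randrange(8192) for _ in range(16)]
corrupted, positions = mask_image_tokens(image_tokens, mask_ratio=0.5, rng=rng)
# Training targets: the original ids at the masked positions.
targets = [image_tokens[p] for p in positions]
```

A real model would receive `corrupted` together with the text embedding and be trained to recover `targets`.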

Because discrete tokens require fewer sampling iterations, Muse should be significantly faster. The approach is also said to lead to better image generation and a better grasp of visual concepts such as objects, their positioning, and their number. Muse can also do inpainting, outpainting, and mask-free editing without fine-tuning or inverting the model.

Model architecture

Muse uses a text encoder to create text embeddings and a VQ tokenizer to turn images into discrete tokens for a base model. The base model predicts masked image tokens at a lower resolution. Finally, the lower-resolution tokens and the text embeddings are passed to a super-resolution model, which predicts the masked tokens at a higher resolution.
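
The data flow between these components can be sketched with stand-in functions. Nothing here is Google's code: every function name, the grid sizes, and the codebook size are hypothetical, and the "models" simply return random token ids so that the order of the stages is visible.

```python
import random

LOW_RES_GRID = 4    # hypothetical side length of the low-resolution token grid
HIGH_RES_GRID = 8   # hypothetical side length of the high-resolution token grid
CODEBOOK_SIZE = 16  # hypothetical number of entries in the VQ codebook


def encode_text(prompt):
    """Stand-in for the text encoder: one toy number per word."""
    return [len(word) / 10 for word in prompt.split()]


def base_model(text_embedding, rng):
    """Stand-in for the base model: fills a low-res grid of token ids."""
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(LOW_RES_GRID ** 2)]


def super_res_model(low_res_tokens, text_embedding, rng):
    """Stand-in for the super-res model: fills a high-res grid of token ids.

    A real model would condition on both inputs; this stub ignores them.
    """
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(HIGH_RES_GRID ** 2)]


def generate(prompt, seed=0):
    rng = random.Random(seed)
    text_embedding = encode_text(prompt)
    low = base_model(text_embedding, rng)
    high = super_res_model(low, text_embedding, rng)
    return low, high  # a VQ decoder would turn `high` back into pixels


low, high = generate("a cat wearing a hat")
```

The point of the sketch is the ordering: text is encoded once, and both image stages reuse that embedding.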

Mask-free editing

Muse enables mask-free image editing of real input images without having to "invert" the generative process, making it faster than other zero-shot image editing techniques.

Source: Google Research

Zero-shot Inpainting/Outpainting

The new architecture supports a wide range of image edits that work without further fine-tuning or inverting the model. Objects in an image can be replaced or modified with a simple text prompt, with no mask required, and the edits themselves take little time.

Source: Google Research

Faster Output

Muse holds its own against Stable Diffusion 1.4, Parti-3B, and Imagen (Google's in-house competitor) in the quality, variety, and text alignment of the generated images. Its output is on par with all three, which could make it a strong choice for anyone who wants to create high-quality visuals once it becomes available.

Muse generates images significantly faster than other image AI systems. At 1.3 seconds per 512 x 512 image, it beats the fastest of the other systems, Stable Diffusion 1.4, by 2.4 seconds. That speed shows what Muse can already do and hints at what further development could bring.
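
To put those two figures in perspective, the short calculation below derives the implied Stable Diffusion 1.4 time and the resulting speedup. It uses only the numbers quoted above; the variable names are chosen for illustration.

```python
muse_seconds = 1.3    # reported time per 512 x 512 image
margin_seconds = 2.4  # reported lead over Stable Diffusion 1.4

# Implied generation time for Stable Diffusion 1.4.
sd_seconds = muse_seconds + margin_seconds
# How many times faster Muse is on this measurement.
speedup = sd_seconds / muse_seconds

print(f"{sd_seconds:.1f} s vs {muse_seconds:.1f} s -> {speedup:.1f}x faster")
# prints "3.7 s vs 1.3 s -> 2.8x faster"
```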

The team made things faster by using a compressed, discrete latent space and parallel decoding. For text understanding, they used a pre-trained T5 language model. The team noted that Muse conditions on the whole text prompt, not just certain words.
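
Parallel decoding can be sketched roughly as follows. This is an illustration of the general idea of committing many tokens per step based on confidence, not Muse's actual sampler: `toy_predict` is a random stand-in for the model, and the cosine schedule, `MASK_ID`, and all sizes are assumptions.

```python
import math
import random

MASK_ID = -1        # hypothetical sentinel for a not-yet-decoded position
CODEBOOK_SIZE = 16  # hypothetical codebook size


def toy_predict(tokens, rng):
    """Stand-in for the model: a (token id, confidence) guess per position."""
    return [(rng.randrange(CODEBOOK_SIZE), rng.random()) for _ in tokens]


def parallel_decode(num_tokens, steps, rng):
    tokens = [MASK_ID] * num_tokens
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        if not masked:
            break
        guesses = toy_predict(tokens, rng)
        # Cosine schedule: how many tokens should remain masked after this step.
        keep_masked = math.floor(num_tokens * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident guesses among the masked positions at once.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens


decoded = parallel_decode(num_tokens=64, steps=8, rng=random.Random(0))
```

Here 64 tokens are filled in over 8 steps, whereas decoding one token at a time would need 64 steps.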

Google Muse is the fastest AI image generator, outperforming Stable Diffusion 1.4 in human evaluations. - Source: Google Research

You can find examples of images on the project website. Google and the researchers haven't said anything yet about releasing an image model to compete with OpenAI's DALL-E 2 or Midjourney.

Right now, only Google's Imagen is available, and only as a beta version in the US.

Scientists are researching AI systems for language and images. The Muse team warns that this research can lead to harm, such as spreading bias or false information, so they won't release the code or a demo for now. They are especially worried about these AI models being used to generate images of people and faces.
