Stable Diffusion 3 “Most Sophisticated Model Yet”

Today, Stability AI introduced Stable Diffusion 3 Medium, described as the British startup’s “most advanced text-to-image open model yet.” The new model has 2 billion parameters, is designed to deliver photorealistic images while addressing common issues like artifacts in hands and faces, and is optimized to run efficiently on standard consumer GPUs.

The company highlighted that SD3 Medium is built to comprehend complex prompts that involve spatial relationships, compositional elements, actions, and styles. It also features enhanced typography capabilities, with Stability describing the text generation accuracy as “unprecedented,” thanks to the Diffusion Transformer architecture.

One of the key features of SD3 Medium is its size. With 2 billion parameters, it sits toward the smaller end of the Stable Diffusion 3 series, whose models range from 800 million to 8 billion parameters. This smaller size contributes to a lower VRAM footprint, making the model “ideal” for running on consumer-level hardware without compromising performance. Stability AI also notes that the model can capture nuanced details from small datasets, which enhances its customization capabilities.
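To put the VRAM point in rough numbers, the back-of-the-envelope sketch below estimates how much memory the model weights alone would occupy at the parameter counts mentioned above. It assumes 16-bit weights (2 bytes per parameter), which is an assumption for illustration and not a figure from Stability AI. The helper name `weight_memory_gib` is hypothetical, and the estimate ignores text encoders, the image decoder, and activation memory, so real-world usage will be higher.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Estimate memory needed to hold the weights alone, in GiB.

    bytes_per_param=2 assumes 16-bit weights (an illustrative assumption).
    """
    return num_params * bytes_per_param / 2**30


# Parameter counts quoted in the article for the Stable Diffusion 3 series.
sizes = {
    "800M": 800_000_000,
    "2B (SD3 Medium)": 2_000_000_000,
    "8B": 8_000_000_000,
}

for label, num_params in sizes.items():
    print(f"{label}: ~{weight_memory_gib(num_params):.1f} GiB of weights")
```

Under these assumptions, the 2-billion-parameter model needs about a quarter of the weight memory of the 8-billion-parameter variant, which is consistent with the claim that it is better suited to consumer GPUs.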

“Stability AI will continue to push the frontier of generative AI, and will aim to retain its lead at the forefront of image generation,” said Christian Laforte, co-CEO of Stability AI. He indicated that further enhancements to the model are expected, as the company plans continuous improvements.

SD3 Medium is accessible through Stability’s API, with model weights available under an open non-commercial license and a low-cost Creator License. For larger-scale commercial uses, the company is offering licensing options upon request.

However, Stability AI is launching this model at a challenging time. Founded in 2020, the company quickly became a notable player in the generative AI landscape alongside competitors like Midjourney and OpenAI’s DALL-E, and was valued at $1 billion by investors in 2022. Since then, it has faced a series of legal and financial hurdles, including lawsuits from artists claiming the company used their works without permission to train its AI models, as well as reports of potential financial instability.

The company’s founder and former CEO, Emad Mostaque, resigned in March amid these growing challenges to pursue projects in decentralized AI. Even so, the software continues to impress, with images from SD3 Medium showing marked improvements.

Looking ahead, Laforte revealed that upgrades are not limited to image generation. “Stability AI is focusing on multimodal efforts across video, audio, and language,” he added, signaling an expansive future for the company’s technology suite.