The Stable Diffusion open-source AI image generation model was launched to researchers on August 10th and released to the general public on August 22nd, 2022. In the last 30 days, we've seen a spike in innovation in the domain. Creators are generating unique images and short videos with previously unthinkable styles and realism. The results are truly amazing, but the secret is that most images aren't made with one prompt and one click. Artists are setting up complicated pipelines that involve multiple steps to get results. The process is time-consuming and requires a level of expertise that most people don't have.
Freeway ML is lowering the barrier to generating high-quality assets; we've engineered a solution and UI that make it easy for even the newest users to obtain good results.
What is Stable Diffusion?
Stable Diffusion is an open-source AI image generation model that creates images from natural language descriptions (text prompts). This kind of model is also known as a text-to-image machine learning model. The model was released by Stability AI; its development was led by Patrick Esser and Robin Rombach and builds on their Latent Diffusion Model paper, submitted in December 2021.
Latent Diffusion Paper: https://arxiv.org/abs/2112.10752
How is Stable Diffusion Different from DALL-E 2?
Stable Diffusion's results are incredible, but critics still feel DALL-E 2's results are better. Matthew Carlson of Hackaday said:
"After playing with SD on our home desktop and fiddling around with a few of the repos, we can confidently say that SD isn't as good as Dall-E 2 when it comes to generating abstract concepts."
If Stable Diffusion's results are not as good, why is it growing faster than DALL-E 2?
The explosive growth comes down to accessibility. Stable Diffusion is open source and released under the permissive CreativeML Open RAIL-M license, which gives the artist ownership rights to the AI-generated image. Because it is open source and has APIs, engineers can embed the model into apps. We already see plug-ins for Photoshop, GIMP, Krita, Figma, and Canva. DALL-E 2, on the other hand, has content restrictions, copyright restrictions, and no API.
Lastly, it's possible to run Stable Diffusion on commodity hardware. The first release ran on a GPU with 10GB of VRAM. Recently, DiffusionBee launched an open-source Electron app that runs Stable Diffusion on a 16GB MacBook with an M1 chip.
Chaining ML steps together to deliver DALL-E 2 quality from Stable Diffusion
Freeway ML has engineered a pipeline and UI to reduce the time required to generate a usable asset.
We'd like to acknowledge the contributions of Stability.ai and others in the AI media generation cooperative. The team at Freeway ML is standing on the shoulders of giants. The hard-working researchers and contributors who released Stable Diffusion have enabled creativity for all.
With Stable Diffusion, users have incredible possibilities at their fingertips. Unfortunately, the process is more complex than typing a text prompt and generating the perfect image. Behind the scenes, creators are engineering prompts, reprocessing and in-painting images, adjusting the number of steps, and applying many more advanced techniques.
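To make the idea of a multi-step workflow concrete, here is a minimal sketch of how such stages could be chained in Python. It is an illustration only, not Freeway ML's pipeline: the `Job` structure, the stage names, and the style keywords are all hypothetical, the image-producing stages are placeholders that only record what they would do, and reading "reprocessing" as an image-to-image pass is our assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """State passed from stage to stage (hypothetical structure)."""
    prompt: str
    steps: int = 50  # number of sampling steps requested from the model
    log: List[str] = field(default_factory=list)


Stage = Callable[[Job], Job]


def engineer_prompt(job: Job) -> Job:
    # Prompt engineering: append illustrative style keywords to the user's text.
    job.prompt = f"{job.prompt}, highly detailed, studio lighting"
    job.log.append("prompt_engineering")
    return job


def generate(job: Job) -> Job:
    # Placeholder for the actual text-to-image call.
    job.log.append(f"generate(steps={job.steps})")
    return job


def inpaint(job: Job) -> Job:
    # Placeholder for repainting a masked region of the image.
    job.log.append("inpaint")
    return job


def reprocess(job: Job) -> Job:
    # Placeholder for feeding the image back through the model (assumed image-to-image).
    job.log.append("reprocess")
    return job


def run_pipeline(job: Job, stages: List[Stage]) -> Job:
    # Apply each stage in order, threading the job state through.
    for stage in stages:
        job = stage(job)
    return job


result = run_pipeline(
    Job(prompt="a panda reading a book", steps=30),
    [engineer_prompt, generate, inpaint, reprocess],
)
print(result.prompt)
print(result.log)
```

Keeping every stage behind the same `Job -> Job` signature means stages can be reordered, repeated, or dropped without touching the runner, which is roughly the kind of experimentation creators do by hand today.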
Freeway ML aims to reduce the friction of creating high-quality assets by offering an intuitive UI and robust APIs that support larger-scale operations. Our key metric is time to a usable asset (TTUA). How quickly can we deliver an asset that can be used in a professional context or shared with peers? Let's dive into some details on how we reduce TTUA.
A prompt engine built on open source, with more intelligence
An AI image generator is a lens through which the computer sees the world. Just like a human, the generator finds it easier to visualize a simple text prompt such as "cat," "dog," or "panda." But it gets more challenging as we add more intricate details; in many cases, the engine gets confused and returns an undesirable result.
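As a rough illustration of how a metric like TTUA could be computed, the sketch below measures the time from the start of a session to the first result the user accepts. This definition is an assumption made for the example; the `Attempt` record, its field names, and the sample numbers are hypothetical and not taken from Freeway ML's product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    finished_at: float  # seconds since the session started
    usable: bool        # did the user accept this result?


def ttua(attempts: List[Attempt]) -> Optional[float]:
    """Return seconds until the first usable asset, or None if there was none."""
    usable_times = [a.finished_at for a in attempts if a.usable]
    return min(usable_times) if usable_times else None


# Made-up session: two rejected results, then two accepted ones.
session = [
    Attempt(42.0, False),
    Attempt(95.5, False),
    Attempt(140.0, True),
    Attempt(200.0, True),
]
print(ttua(session))  # 140.0
```

Under this definition, every rejected attempt adds directly to TTUA, so fewer retries per asset matters as much as faster generation.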
Freeway uses a proprietary approach to solve this issue. While the details are part of our IP, the general concept is that we perform multiple operations to improve text-to-image accuracy. Naturally, the more abstract the concept, the harder it is for the system to produce an accurate result.
Simple Abstract Prompt + Style
More Complicated Transformation Prompt
Complicated Abstract Prompt
DALL-E 2 has an intelligent system that generates quality text-to-image results. Open-source engines struggle to generate abstract concepts, but Freeway ML's behind-the-scenes engineering improves the results to compete with the quality of DALL-E 2 while also giving users ownership rights to the generated image.