Stable Dreamfusion
By Aron Petau • 2 minute read
Sources
I forked a popular implementation that reverse-engineered Google's DreamFusion algorithm, which is closed-source and not publicly available. You can find my fork on my GitHub repository. This version uses Stable Diffusion as its underlying diffusion model rather than Google's Imagen, so the results cannot be expected to match Google's quality. The original DreamFusion paper and project page provide more details about the technique.
Gradio
I forked the code to implement my own Gradio interface for the algorithm. Gradio is a great tool for quickly building interfaces for machine learning models. No coding is required of the end user: they simply state their wish, and the system generates a ready-to-rig 3D model (an OBJ file).
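To give a sense of how little glue code this takes, here is a minimal sketch of such a wrapper. The CLI flags are my reading of the upstream stable-dreamfusion repository and may differ in any given fork, and the output path is an assumption about where the mesh is exported:

```python
import subprocess

def dreamfusion_command(prompt: str, workspace: str = "trial") -> list[str]:
    """Build the CLI call for stable-dreamfusion's main.py.

    Flags are assumptions based on the upstream repository's README
    and may not match every fork exactly.
    """
    return [
        "python", "main.py", "-O",   # -O selects the recommended option preset
        "--text", prompt,            # the user's wish, as a text prompt
        "--workspace", workspace,    # directory for checkpoints and exports
        "--save_mesh",               # export a mesh when training finishes
    ]

def generate_model(prompt: str) -> str:
    """Run one full text-to-3D generation and return the OBJ path
    (hypothetical export location)."""
    subprocess.run(dreamfusion_command(prompt), check=True)
    return "trial/mesh/mesh.obj"

def build_interface():
    """One-textbox Gradio front end: wish in, OBJ file out."""
    import gradio as gr  # imported lazily so the helpers above stay importable
    return gr.Interface(
        fn=generate_model,
        inputs=gr.Textbox(label="State your wish"),
        outputs=gr.File(label="Ready-to-rig OBJ"),
    )

# To serve the interface locally: build_interface().launch()
```

The point is that Gradio only needs a Python function and a description of its inputs and outputs; everything else, including the web UI and the file download, comes for free.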
Mixamo
I used Mixamo to rig the model. It's a powerful tool for rigging and animating models, but its main strength is simplicity. As long as you have a model with a reasonable humanoid shape in a T-pose, you can rig it in seconds. That's exactly what I did here.
Unity
I used Unity to render the model for the Magic Leap 1 headset. This allowed me to create an interactive and immersive environment with the generated models.
The vision was to build an AI Chamber of Wishes: You put on the AR glasses, state your desires, and the algorithm presents you with an almost-real object in augmented reality.
Because I had no access to Google's proprietary source code, and because our studio computers (powerful, but not optimized for machine learning) imposed their own limits, the results weren't as refined as I had hoped. Nevertheless, they are fascinating, and I'm satisfied with the outcome. A single object takes roughly 20 minutes to generate. The algorithm is also quite temperamental: it often fails to produce a coherent object, but when it succeeds, the results are impressive.