Remember when Prisma was the ultimate “AI” image editing app? Yes, we have certainly come a long way since then. With the rise of prompt-based AI image generators like DALL-E and Midjourney, creating art and deepfakes is now accessible to pretty much everyone.
But there are limits, aren’t there? After the initial novelty of feeding Midjourney various prompts and seeing what it comes up with, it all gets pretty boring. Or at least it did for me.
Look, I’m introverted, which means I don’t like going out very much. But you know what I like? Having pictures of me in places I would probably never go; heck, places I can’t go at all.
Naturally, I wanted to ask AI tools to create images of me in different situations and places. However, I also didn’t want to upload pictures of myself to random websites and hope the results would be good. And that’s where I heard about Dreambooth.
Let the games begin…
Turns out some really smart people have brought tools like Stable Diffusion to the masses. Others have built on their work and made it possible for anyone with a bit of patience to train their own Stable Diffusion models and run them, entirely online.
So even though I have a MacBook Air M1, which is by no means meant to be a training machine for a deep learning image generation model, I can run a Google Colab notebook and do it all on Google’s servers – for free!
All I really needed then was a few pictures of myself, and that was it.
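For context: Stable Diffusion v1 models work with 512×512 images, so training photos are typically center-cropped to a square and resized before being uploaded. Here is a minimal sketch of that preparation step; the helper name is my own, and in practice you would feed the computed box to an image library such as Pillow (`Image.crop` followed by `Image.resize`).

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square.

    The square crop can then be resized to 512x512, the resolution
    Stable Diffusion v1 models are trained on.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# With Pillow installed, preparing one photo would look roughly like:
#   from PIL import Image
#   img = Image.open("me.jpg")
#   img.crop(center_square_box(*img.size)).resize((512, 512)).save("me_512.jpg")
```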
Training my AI image generator
Training your own image generator is not difficult at all. There are a number of guides available online if you need help, and the process itself is simple: open the Colab notebook, upload your photos, and start training the model. This all happens pretty quickly.
Alright, let’s be fair: the text encoder training happens pretty quickly, in about 5 minutes. Training the UNet with the default settings, however, takes quite a bit longer – almost 15-20 minutes. Still, considering that we’re actually teaching an AI model to recognize and draw my face, 20 minutes doesn’t seem too long.
During training there are many ways to customize how thoroughly you want to train your model, and from reading other people’s experiences online, I gather there is no real “one size fits all” strategy here. For basic use cases, though, the defaults seemed to work fine for most people, and I stuck with them as well – partly because I didn’t really understand what most of the settings meant, and partly because I couldn’t be bothered to train multiple models with different training parameters to see what produced the best results.
After all, I was just looking for a fun AI picture generator that can make me half-decent pictures.
I’m not an AI expert by any stretch of the imagination. However, I figured that training a Stable Diffusion model in a Google Colab notebook with 8 JPEGs of myself cropped to 512×512 pixels wasn’t really going to result in anything out of the ordinary.
How wrong I was.
On my first attempt to use the model I trained, I started with a simple prompt that said “akshay”. Here is the image that was generated.
Not great, is it? But it’s not terrible either, right?
But then I started playing around with some of the settings available in the UI. There are several sampling methods, sampling steps, the CFG scale, scripts, and much more. Time to go a little crazy experimenting with different prompts and settings for the model.
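Of those settings, the CFG (classifier-free guidance) scale is the one that controls how strongly the generated image follows your prompt. At each denoising step the pipeline predicts noise twice, once with the prompt and once without, and blends the two. Here is a minimal sketch of that blending rule, with plain Python lists standing in for tensors; the function name is my own invention.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: push the conditional (prompted) noise
    prediction away from the unconditional one by `scale`.

    scale = 1.0 simply returns the conditional prediction; higher values
    follow the prompt more aggressively, at the cost of variety.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# At a scale of 7.5 (a common default), the prompt's influence is amplified
# wherever the two predictions disagree:
guided = cfg_combine([0.0, 2.0], [1.0, 2.0], 7.5)
```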
Obviously, the results of these images aren’t perfect, and anyone who’s seen me can probably tell that these aren’t “my” images. However, they’re quite close – and I didn’t even train the model with any special care.
If I were to follow the countless guides on Reddit and elsewhere on the internet that explain how to improve training and achieve better results with Dreambooth and Stable Diffusion, these pictures might have turned out to be even more realistic (and arguably scarier).
This AI Image Generator Is Terrifyingly Good
See, I’m all for improvements in AI technology. As a technology journalist, I’ve followed the ever-evolving and improving field of consumer AI over the past two years, and for the most part, I’m deeply impressed and optimistic.
However, seeing something like Dreambooth in action makes me wonder about the unethical ways in which AI and ML-based tools are readily available to virtually anyone with access to a computer and the internet.
There is no doubt that there are a lot of bad actors in the world. While there are certainly innocent use cases for such readily available technology, if there’s one thing I’ve learned in my years of reporting on technology, it’s that putting a product in the hands of millions of people will undoubtedly lead to many undesirable results: at best something unexpected, and at worst something downright disgusting.
The ability to create deepfake images of virtually anybody, as long as you can get 5-10 pictures of their face, is extremely dangerous in the wrong hands. Think misinformation, misrepresentation, and even revenge porn – deepfakes can be used in all of these problematic ways.
Safeguards? What safeguards?
It’s not just Dreambooth either. On their own, and used well, Dreambooth and Stable Diffusion are incredible tools that let us experiment with what AI can do. But from what I’ve experienced so far, there are no real safeguards around this technology. Of course, it won’t let you generate outright nudity; at least not by default. However, there are plenty of extensions out there that will let you bypass this filter and create just about anything you can imagine, of just about anyone.
Even without such extensions, you can easily get tools like this to create a wide range of potentially disturbing and defamatory images of people.
Also, with a decently powerful PC, anyone can train their own AI models without any safeguards at all, on whatever training data they choose – meaning the trained model can create images that are disturbing and harmful beyond imagination.
Deepfakes are nothing new. In fact, there is a vast wealth of deepfake videos and media online. However, until recently, the creation of deepfakes was limited to a relatively small (though still significant) number of people who sat at the intersection of “people with good hardware” and “people with technical know-how”.
Now, with access to free (limited-use) GPU compute on Google Colab and the availability of tools like fast-dreambooth that let you train and use AI models on Google’s servers, that number of people will increase exponentially. It probably already has – that scares me, and it should scare you too.
What can we do?
This is the question we should be asking ourselves at this point. Tools like DALL-E, Midjourney, and yes, Dreambooth and Stable Diffusion, are certainly impressive when used with common human decency. AI is improving by leaps and bounds – just look at the explosion of AI-related news over the past two months.
So this is a crucial moment, one where we need to find ways to ensure that AI is used ethically. How we can do that is a question I’m not sure I have the answer to. But I do know that after using the fast-dreambooth AI image generator and seeing what it can produce without my even trying very hard, I’m scared of just how good it is.