How I got started in generative AI art

If you follow my Instagram and Facebook accounts, you’ll have seen my generative AI art pieces.

Many people have been asking how I create my AI art, so I decided to share my journey in AI art and what I learnt along the way.

How it all started, and stopped

My dad is an oil painter, and I grew up exposed to art from a young age. I started drawing at two and I’ve always had an interest in creating art. I generated my first computational graphics artwork using Apophysis and Ultra Fractal back in 2004, long before the current crop of generative AI tools came into existence.

I first dabbled in generative AI in June 2022 when I tried out NovelAI. As someone who dabbles in creative writing and text role-playing, I was intrigued by a text-generation platform that allowed AI-assisted storytelling. However, I thought it was just a novelty, was turned off by the subscription cost, and lost interest in it.

Despite that, NovelAI was still on my radar, and when it launched its text-to-image generation feature in October 2022, I gave it a try. To get a better idea of how AI image generation was developing, I also experimented with OpenAI’s DALL-E 2, which had just gotten rid of its waitlist; the Midjourney beta, which had just launched; and NightCafe, which ran Stability AI’s Stable Diffusion.

Back then, I had no idea what prompt engineering was, and the images I generated were so horrible that I didn’t save a copy of them. I wish I had, so I could show them here. If you saw those outputs, you would understand why I concluded that AI image generation was not ready for the mainstream.

ChatGPT, Midjourney, Stable Diffusion

Then OpenAI launched ChatGPT at the end of November 2022. It blew up in December and took the world by storm in January 2023.

ChatGPT

Being a tech geek working on content creation, it was inevitable that I jumped on the ChatGPT bandwagon early on. I won’t go into much detail on ChatGPT since it’s a separate topic. In short, besides using ChatGPT to help generate content, I was using it to brainstorm ideas, structure strategies and plans, and even write a couple of WordPress plugins.

All of this was done by giving the right instructions through prompt engineering, the art of structuring instructions to get the generative AI model to perform tasks as intended. It was frustrating initially, having to fight ChatGPT to get the desired outcome, but very rewarding once I got the hang of it.

It was like having an AI assistant you could rely on, when it didn’t hallucinate.

Midjourney

While ChatGPT was the most talked-about thing in December 2022, on its way to becoming the fastest-growing consumer software in history at that point with an estimated 100 million users within about two months of launch, another piece of software was also taking the creative industry by storm: Midjourney.

Visual artists and content creators were creating artwork with Midjourney. Images flooded all my social media feeds. People were gushing over what Midjourney was able to generate. On the other end of the spectrum, people were protesting just as loudly about the ethical issues, which I’ll briefly touch on in a bit.

I gave Midjourney another go, drawing on my new prompt engineering skills. The results were a lot better than what I had generated half a year earlier.

Stable Diffusion

This rekindled my interest in generative AI art. I went around trying the different cloud platforms before deciding to run my own Stable Diffusion instance. First, I tried the Stable Diffusion macOS apps Draw Things and DiffusionBee, but I found them lacking in a lot of ways, especially after I studied what was possible with Stable Diffusion.

I managed to install Stable Diffusion Web UI on my M1 Max MacBook, and that started me on my generative AI art journey as I discovered tricks to keep improving my image generation output. However, it isn’t optimised for the Mac and generation is very slow. I get around 20 seconds per iteration for a simple 512x512 image using the Euler sampler, compared with around 5 iterations per second on my PC.
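Note that the two figures above use inverted units (seconds per iteration versus iterations per second), which makes the gap easy to underestimate. Here is a quick back-of-the-envelope sketch in Python that puts them on the same scale. The 20 sampling steps per image is an assumption chosen purely for illustration, and the two speeds are the rough figures quoted above, not benchmarks.

```python
# Rough comparison of per-image generation time on the Mac and the PC.
# ASSUMPTION: 20 sampling steps per image (illustrative only).
STEPS_PER_IMAGE = 20

# Approximate speeds quoted in the text above.
MAC_SECONDS_PER_ITERATION = 20.0   # Mac: about 20 seconds per iteration
PC_ITERATIONS_PER_SECOND = 5.0     # PC: about 5 iterations per second

# Convert both to seconds per image.
mac_seconds = STEPS_PER_IMAGE * MAC_SECONDS_PER_ITERATION
pc_seconds = STEPS_PER_IMAGE / PC_ITERATIONS_PER_SECOND

# The ratio is independent of the assumed step count.
speedup = mac_seconds / pc_seconds

print(f"Mac: {mac_seconds:.0f} s per image")
print(f"PC:  {pc_seconds:.0f} s per image")
print(f"PC is roughly {speedup:.0f}x faster")
```

The step count cancels out in the ratio, so the PC comes out about 100 times faster whatever number of steps is assumed.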

There’s the Stable Diffusion optimisation for Core ML on macOS, which leverages the Apple Neural Engine, but it doesn’t perform as well as a PC with a decent Nvidia GPU. It also requires converting the Stable Diffusion models from PyTorch to Core ML. That is quite a pain, especially when I have over a thousand models.

To speed up my generation workflow, I built a cheap PC to run Stable Diffusion Web UI. There are many of these on Taobao targeting the AIGC (AI-Generated Content) market. AIGC is huge in China and continues to grow rapidly. I run the Web UI on my local network so I can work on it from a browser on my MacBook.

First and second pass compared.

I learnt many techniques that helped me improve my generative AI art. With my prior knowledge of photography, Photoshop image manipulation, art direction, and Python, I was able to grasp the nuances of Stable Diffusion quickly and find ways to hack my workflows.

I plan to share what I learn as I grow, and to develop it into a Learn In Public series. If you’re interested in following my journey, sign up for my newsletter!

Many people have also asked about workshops and lessons. I hear you. Watch this space to be the first to know if that happens.

AI controversies

There are many concerns over generative AI images. The two major ones I come across most often are the training of AI models without artists’ consent, and the potential for abuse and misuse.

I held off from diving into creating generative AI art because I wanted to learn more about the ethical issues. As always, I have strong opinions, weakly held. My views are constantly changing as I gain more knowledge to make a better judgement.

Abuse and misuse

The second of those concerns is that generative AI lowers the bar for abuse by bad actors. This includes the creation of graphic and sensitive content, and the spread of misinformation through fake images.

This is not something new that generative AI introduced. Photo manipulation has been around since the 19th century, and deepfakes have been around for a few years. Yet little has been done to deal with such deception and hoaxes.

Generative AI makes it a lot easier to produce a convincing fake image, and you can batch-produce images at scale. Something needs to be done to let viewers know that an image is AI-generated and not real. While I don’t think there is any way to stop criminals from creating harmful content, there should be at least some form of safeguard.

Plagiarism

The other major source of outrage from those who oppose AI image generation is the unauthorised use of artists’ work to train the AI models. As an artist myself, I stand against any attempt to infringe upon the copyright of a creator.

However, once I understood how the diffusion models work, I realised that the choice to plagiarise artwork lies in the hands of the person creating the image, not the AI model.

Can you choose to take an exact copy of a photo that another photographer took? You can, but it makes you look bad. Can you imitate your favourite photographer’s style in the process of finding your own style? You can, and many amateur photographers do that while attempting to discover their own style. Likewise for painting.

Just because these things happen doesn’t mean that we should ban or boycott AI image generation outright. People take images with their smartphones. And then there are those who engage in illegal photography, such as up-skirt and other nonconsensual images. Should they be allowed to take such photos? No. Does this warrant a ban on smartphone cameras? No, because it won’t solve the problem, and a ban would deprive people of the ability to take legit photos.

Model training

If your concern is infringement of copyright, I’m sure you won’t create an image that replicates another artist’s style even if the model allows it. You can take this one step further by choosing to use models that are trained ethically, meaning their training set is sourced from images whose creators consented to their use for AI training, or from images in the public domain.

Ethically-trained models

A good model creator documents how they trained their models. This includes how they sourced their training data, or which models they used to create merged models. Keeping this transparent allows others who iterate on these models, whether training or fine-tuning new ones, to make an informed decision.

Of course, there will be those who choose to train their models using unethical or even illegal training data sources. I think that these will remain as prevalent as the piracy of software, films, and books.

Train your own style

Instead of viewing AI as a threat, I believe it is important to learn how to use it properly to empower yourself.

While the debate over the ethical issues of generative AI art continues, some artists have already jumped on the technical advantages of generative AI and started training models based on their own photography or art style. By doing so, they are then able to generate images with their signature style using AI and experiment with concepts.

My generative AI art

To see more of my AI art, check out the overview page and follow me on the various platforms.

Check out the Mai Shiranui series.