AI and Creativity: A Game Designer's Perspective

AI and creativity is a hot topic. I like to think that I’m on the bleeding edge of it, since I’m both a professional creative type (at least for now, please buy my stuff, thanks in advance) and someone who’s been using AI in the production process.

One thing about game designers is that we have a different way of looking at AI. Even when you don’t have a programming background (I have a little experience, but not enough to do it professionally), you wind up thinking and working with the same sorts of systems that go into AI. There’s a major difference between what we do and how AI works, but the underlying logic and procedure are not so different as they may seem.

After all, the right way to think about AI, at least the adversarial network systems that are common for single-application AI, is that they are basically playing a game. One piece of code works as a referee, checking to see if everything’s following the rules. The other works as a player, trying to follow the rules while accomplishing its goals.

These are “trained” until the players that are best at satisfying the referee remain. Some mutation keeps the individual pieces of code from being too similar.
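To make the referee-and-player picture concrete, here is a minimal sketch in Python. It is a toy selection-and-mutation loop, not a real adversarial network, and every name in it (`TARGET`, `referee`, `mutate`, `train`) is made up for illustration. The referee scores each player, the better half survives, and mutated copies fill the rest of the roster.

```python
import random

TARGET = "kickball"  # hypothetical goal the referee checks against
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def referee(candidate: str) -> int:
    """Score a player: one point per character that matches the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one randomly chosen character (possibly with the same letter)."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def train(generations: int = 2000, population: int = 20, seed: int = 0) -> str:
    """Keep the players the referee likes best, and refill with mutated copies."""
    rng = random.Random(seed)
    players = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(population)
    ]
    for _ in range(generations):
        players.sort(key=referee, reverse=True)
        if referee(players[0]) == len(TARGET):
            break  # the best player fully satisfies the referee
        survivors = players[: population // 2]
        players = survivors + [mutate(p, rng) for p in survivors]
    return max(players, key=referee)


if __name__ == "__main__":
    print(train())
```

Real systems replace the hand-written referee with a second trained model, but the loop of scoring, keeping, and varying is the part that feels familiar to a game designer.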

There are alternate methods of working with AI, but they are similar. You feed a piece of code input data, and it attempts to alter, replicate, or manipulate this data to match specifications. Additional input and output chains can alter the process, but the fundamentals stay constant.

AI and Creativity: Can an AI be Creative?

So the first question here: is it possible for AI to be creative?

Answer: it depends on what your definition of creative is.

AIs are not people, and this shows in how they work.

Our current AI tries to complete a task based on the instructions provided.

Can it create new things? Yes. Between the input prompt and the desired outcome, it will make new things.

AI doesn’t always make new things. If the AI was trained on a small dataset, its results may be overfitted, meaning that they duplicate portions of the input. Conversely, it will sometimes make new things where repetition is desirable, such as when asked to create a picture of a known but not overly famous person. Sometimes this is a consequence of ambiguity (multiple people with the same name), and sometimes it’s a consequence of machine perception.

Does the AI know what it’s doing? No. AI cannot think conceptually like we do. Similarities are based on training data and methodology. The AI is not conceptualizing blue when you ask for a blue image. The AI may be able to understand syntax for text, but it does not understand the contents. It knows that certain words appear together, and it can use this to create convincing blurbs.

Can AI compete with people? It depends.

One place where AI excels is at tasks that can be semi-automated.

For instance, a person might design an aerodynamic surface to run through a simulator. This is expensive, but also fairly fool-proof. They have to know what they’re doing, since these simulations are very computationally intensive. You can’t just run every possible combination.

However, the laws of aerodynamics are not totally random. You can’t run truly random stuff through the simulator, but you can train an AI on effective aerodynamic surfaces. Then, when you need a surface for certain dimensions, you run the AI to get a few options.

Those options run through a simulator, et voila! You’ve got aerodynamic surfaces ready to go.
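The workflow above can be sketched in a few lines of Python. Everything here is a stand-in: `propose_shapes` plays the role of the trained AI, `simulate_drag` plays the role of the expensive simulator, and the numbers and formula are invented purely for the demo. The point is only the shape of the pipeline: a handful of plausible candidates go in, and the simulator picks the winner.

```python
import random
from typing import Callable, List, Tuple

# A made-up two-number "surface": (thickness, camber).
Shape = Tuple[float, float]


def propose_shapes(chord: float, count: int, rng: random.Random) -> List[Shape]:
    """Stand-in for the trained model: suggests plausible shapes, not random ones."""
    return [
        (chord * rng.uniform(0.08, 0.16), chord * rng.uniform(0.0, 0.06))
        for _ in range(count)
    ]


def simulate_drag(shape: Shape) -> float:
    """Stand-in for the costly simulator; this formula is invented for the demo."""
    thickness, camber = shape
    return (thickness - 0.12) ** 2 + (camber - 0.02) ** 2


def best_surface(
    chord: float,
    count: int = 5,
    seed: int = 0,
    simulate: Callable[[Shape], float] = simulate_drag,
) -> Shape:
    """Ask the model for a few options, then simulate each one exactly once."""
    rng = random.Random(seed)
    candidates = propose_shapes(chord, count, rng)
    return min(candidates, key=simulate)


if __name__ == "__main__":
    print(best_surface(chord=1.0))
```

The simulator only runs `count` times here, which is the whole appeal: the AI narrows the field so the expensive step stays affordable.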

The Drawbacks of AI

The problem is that this works really well when there’s a clear right and wrong outcome. It doesn’t work so well on things that have ambiguity.

Another downside is that AI doesn’t have human expectations. If I ask it to draw me a person, it’s not thinking about people how I think about people. For instance, it may know the right number of features to add. However, it’s thinking of them without the iconography and meaning I associate with them.

From a creative perspective, we often think of AI output as “warped” or “uncanny valley” for this reason. When I use it for character portraits, for instance, it produces oddities. Characters’ eyes don’t match, and not only in color but also in shape and style. The nose might be drawn simultaneously in two “orientations.”

Since the AI’s concept for nose, such as it is, includes datasets with noses in different orientations (to say nothing of stylized drawings versus photography), this is to be expected. It’s simply trying to put together pieces to make a face, and it doesn’t really know better.

Think of the AI like a drilling machine, not a miner. The miner understands that they’re clearing out a tunnel to reach ore. The drilling machine just “knows” that the lever directing it is pushed in a certain direction, which sets it in motion.

Since I am primarily worried about text and image output, and images in particular, we’ll focus on this.

AI and Aesthetics

However, it’s worth noting that one critique leveled at AI is unfounded.

AI does not always produce output that lacks spirit, however one wishes to define that. AI learns from what it’s looking at, and a lot of initial datasets were pretty plain.

Remember that machine learning has a lot of applications. The CAPTCHA-style databases, which have photographs of things found in an environment, have to consider this. They are used both to feed machine learning for object recognition and to prevent spam by blocking machines.

Feed these datasets to an AI, and you’ll get outputs that broadly match the photographs.

You can get different results with different AI. I’ve messed around with two relatively sophisticated image-generating AI, both of which take text prompts and make images.

Midjourney is what I use most. It’s explicitly an art-focused AI, and generates aesthetically pleasing images with occasional issues with subjects. It’s also Discord-based, but can run in DMs or private servers so that you can have a catalogue of content. I have my own server with several channels that I use for particular types of image or my larger projects to keep things neatly categorized, though they also have a web interface. Midjourney will likely be server-only, because of its VRAM requirements and its business model.

Stable Diffusion is very similar to DALL-E, at least at first glance (there are significant differences behind the scenes). It produces fairly realistic output, but it’s not always aesthetically pleasing. Right now the only way to use it is through a public Discord server, which is too high-traffic for my tastes since I often need to go back to past renders, but they plan to release a version that runs on consumer GPUs at some point.

Both of these AI are under active development. I have some posts about Midjourney for portraits, which gives a lot of detail about how it works.

AI and Creativity: Use in Production

What is an AI good at?

It’s ideal for things that either have easily replicated results or don’t occupy a lot of phenomenological space for observers. If I ask an image-generating AI to give me a picture of a ball, it might still have some issues, but I should probably be able to expect something that pretty darn well matches the prompt.

Behold, a “red rubber kickball!”

AI and Creativity: A red rubber kickball, as prompted from Midjourney. — *Image generated by Midjourney.*

Of course, what I thought of when I gave the prompt was the textured kickball I had as a kid, but this works fine. It’s red, rubber, and a kickball.

But I can’t really see the imperfections here. It’s got a texture other than what I expected, but it appears to have gone for a stylized drawing and I did ask for a red rubber kickball.

A face, however, will have components I recognize. Landscapes have recognizable shapes, but most of us aren’t that familiar with them.

These images were generated with Midjourney for use in one of our games. They’re from an older learning model, and probably won’t see use. There are some obvious perspective and subject errors, but after upscaling we’d crop these images down and the reframing would allow us to choose better sections for display.

Low-Risk Uses

One place where Midjourney shines in particular, but where I suspect AI will become common, is in low-risk use cases.

These are things that either:

Pass without scrutiny.
Reveal nothing to average audiences under scrutiny.

For instance, when Midjourney gives me a forest that fades into fog, it only needs tree-shapes. It doesn’t need perfect trees. I don’t need to be able to identify the species. In fact, if Midjourney creates a forest with incompatible types of trees, it is likely doing so in the same way an artist would.

A city-scape, for instance, works fine so long as there aren’t any obvious image issues. Artifacts and suddenly appearing floating building sections would be a problem, but a well-trained AI doesn’t do this.

Generally you still have to do some minor manipulations or adjustments, or fetch several images from the AI, before you can use something directly. But for minor illustrations, it works fine.

An analogy I’d use is that AI is something like a tween animator. You’re not using it for your main content; you’re using it for a splash of color.

One place where this shines is in images alongside text. The landscapes above are going to be the background for banner headings on a store page. The focus of each banner heading is its text, not its image, so the splash of color and hinting at subjects is all that matters.

For a text-generating AI, you might use it to turn raw values (like the dimensions of an item being sold) into marketing copy.

Early Work-Flow

An AI-generated output can serve as the basis for a human professional to do some work. For instance, I’m not a digital painter, but I have a graphic design background and can do basic photo-editing.

Touching up a face to remove blemishes or cutting an object out from a background is easy for me, so I can use AI to generate things and then do the final steps myself to create a better image. I’ve done this with Midjourney to generate character portraits for games, since most AI-generated portraits come out obviously flawed.

When you have AI doing stuff like this, you’re treating it like an apprentice. It’s preparing the basic canvas, and a more skilled person comes along to clean up its messes.

This is useful for creative professionals because it’s often hard to start from nothing. Having some pre-existing elements gives the initial spark, and fixing issues is often less work than starting from scratch.

Of course, this is equally true for text. Most AI-generated text looks good, but has substance issues after reaching certain lengths. Pass it off to someone to polish, and it’s easily fixed. However, you might also see it used as an editing tool, where it looks at text and tries to condense it, then hands it off to a human editor.

Specialized Applications

Of course, one solution to the issues with AI is to hyper-specialize it.

Thispersondoesnotexist.com offers a random image generator that creates a photographic facial portrait. It has at least a 30-40% success rate at generating a perfectly believable image with few of the issues Midjourney or Stable Diffusion have with faces.

But there’s a catch. It only makes faces. There are lots of other “This X Does Not Exist” generators out there, but they all have the same limitation: they are strictly single purpose, and single purpose to the point where they can easily train on a single data-set.

However, you can couple specialized applications with broader-application AI. GFPGAN is an example of photo-restoration technology that focuses on human faces. Feed it Midjourney output, and it’ll give you a much more believable face than Midjourney alone provides.

A portrait of a young woman, captioned "Character Creation," an example of AI and creativity when two different AIs are used to generate a more believable image. — This spoiler for an upcoming game features a Midjourney-generated portrait, with eyes fixed by GFPGAN. I’ve blended the two images together to avoid the overly smooth output that GFPGAN provides, and there are some other fixes to faces that I left in from GFPGAN’s output. In the manuscript production process, the bottom portion of the image, which has some noticeable issues, will be cropped out.

It’s worth noting that there are two things at play here. Midjourney makes an image with a user-defined subject. GFPGAN restores a face in an image with damage.

For a landscape, you’d need a landscape-specific tool. I’ve used Luminar AI with landscapes from Midjourney to replace a sky, which is a specialized feature it provides.

AI and Creativity: Commercial Applications

The question that the AI and creativity industries need to answer is: will a robot take my job?

Answer: Maybe.

I think that it is more likely that we will see AI increase the low-end standards. I use Midjourney to generate background images for marketing materials, for instance. This is something that I would previously have done myself. It replaces stock-art in a lot of my workflow, but stock-art wasn’t really going to be the main focus.

Another question here is limitation-sensitivity. AI can’t yet do everything professional artists can. For instance, Midjourney can’t draw hands well. It can’t draw characters holding things reliably. Midjourney can’t draw whole scenes with characters in focus, though it can draw rough outlines of crowds believably in some contexts.

It also can’t perfectly match visual concepts. It takes directions, but not always well. If I were working on a franchise with distinctive characters, I’d have a hard time getting reliable results using any AI I’m aware of.

Of course, a clever or skilled enough person can work around this, but that requires time, equipment, and skills. You’re not replacing a professional.

If you’re an artist, you should probably expect to see certain jobs taken by AI. When I just need a random image, Midjourney can be perfect. If I need a character to show up in more than one place, I need to hire someone.

Intellectual Property

Another problem comes with intellectual property. This is a sticky point. AI and creativity as a field will pivot around what the legal boundaries of IP wind up being.

Right now the general consensus is that AI probably can’t hold copyright. Stable Diffusion currently treats its output as public domain content. Midjourney considers it the property of the end user (with a mandatory license back to Midjourney, so that they don’t have issues with the hosting and distribution of content).

These are approaches that probably work long-term. Most AI-derived content will be modified, giving it claims for protection due to the human element in its creation. AI creations that don’t have any after-market modifications aren’t particularly scarce, so I don’t think it makes sense to worry about them.

For my part, I never distribute any projects that are just based on Midjourney output. The closest would be The Paper Project, my donationware collection. Some of the images in there are probably modified sufficiently to be copyrightable; others are as-is. In practice, however, most of my images are found in projects where they’ve been adjusted, cropped, and curated in a way that probably prevents their reuse without my authorization.

In practice, the important thing is that people like me don’t have to worry about AI in their workflow coming back and biting them, at least if the software licenses for it are good. The nuances of law will come up at a later date, and very few uses are going to be so mildly transformative that a sensible court wouldn’t accept them as granting ownership (yes, I know, I’m an IP nerd: I got the opportunity to study under Dennis Karjala for a semester in my undergrad as an enrichment opportunity).

Flow and Time

AI is not yet ready for dynamic end-consumer integration. I’m not going to use AI in a video game to make textures right now, both because easier procedural methods exist (e.g. Perlin-noise-derived textures) and because the outputs are still too unreliable.
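For a sense of how little code a procedural texture needs, here is a minimal Python sketch. It uses value noise, a simpler cousin of Perlin noise, rather than Perlin’s gradient method, and the function name `value_noise_texture` and its parameters are hypothetical. It scatters random values on a coarse lattice and blends smoothly between them.

```python
import random
from typing import List


def value_noise_texture(
    width: int, height: int, cell: int = 8, seed: int = 0
) -> List[List[float]]:
    """Return a height x width grid of brightness values between 0 and 1."""
    rng = random.Random(seed)
    cols = width // cell + 2
    rows = height // cell + 2
    # Random brightness at each lattice corner; pixels blend between corners.
    lattice = [[rng.random() for _ in range(cols)] for _ in range(rows)]

    def smooth(t: float) -> float:
        """Smoothstep easing so cell borders don't show as hard seams."""
        return t * t * (3.0 - 2.0 * t)

    def lerp(a: float, b: float, t: float) -> float:
        return a + (b - a) * t

    texture = []
    for y in range(height):
        gy, sy = y // cell, smooth((y % cell) / cell)
        row = []
        for x in range(width):
            gx, sx = x // cell, smooth((x % cell) / cell)
            top = lerp(lattice[gy][gx], lattice[gy][gx + 1], sx)
            bottom = lerp(lattice[gy + 1][gx], lattice[gy + 1][gx + 1], sx)
            row.append(lerp(top, bottom, sy))
        texture.append(row)
    return texture


if __name__ == "__main__":
    for line in value_noise_texture(32, 8):
        print("".join(" .:-=+*#"[min(7, int(v * 8))] for v in line))
```

The same seed always gives the same texture, which is exactly the kind of reliability the AI route doesn’t offer yet.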

From a commercial perspective, the closest analogy is stock-art. AI generates stock-art on demand; it’s not always custom-fitted to your needs, but when you just need something to fill whitespace or serve as a foundation, it works.

AI and creativity is a promising direction, but the current state is that AI is semi-creative: Midjourney in particular shows that it can create impressive visual works, but the end state is that you will need someone who knows what they’re doing to use it.

I have generated six thousand images with Midjourney, which is a little bit misleading because that counts variations and upscales of the same images. I’ve spent about 100 GPU hours over two months, which correlates to about 30-40 hours of prompting and checking outputs.

This is another place where stock-art is a good comparison, in terms of price ($30/mo for Midjourney right now), time spent on the workflow, and volume of output. Midjourney often needs a few shots to get stuff right, but I can make as many attempts as I have time for.
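As a rough back-of-the-envelope check, the figures above can be turned into per-image numbers. This sketch assumes two full months at the $30/mo price and takes 35 hours as the midpoint of the 30-40 hour estimate; both are my simplifications, and the six thousand images include variations and upscales.

```python
# Figures from the text; the two-month billing and 35-hour midpoint are assumptions.
MONTHLY_PRICE_USD = 30
MONTHS = 2
IMAGES = 6000
HANDS_ON_HOURS = 35  # midpoint of the 30-40 hour estimate

cost_per_image = MONTHLY_PRICE_USD * MONTHS / IMAGES
seconds_per_image = HANDS_ON_HOURS * 3600 / IMAGES

print(f"about ${cost_per_image:.2f} per image")
print(f"about {seconds_per_image:.0f} seconds of my time per image")
```

That works out to roughly a cent and a few seconds of attention per image, though many of those images are throwaway attempts.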

Conclusion

Are AI and creativity a natural pairing?

I’m actually optimistic about the future. I don’t think AI will replace any professionals’ jobs, but what it will do is raise the quality of workflows.

It’s like the difference in quality we saw going from traditional media to digital media. It’s a lot easier to take more photos, or have safe workflows that allow easy recovery from mistakes, when working digitally than it is with film or canvas.

An AI might be able to create low-level content, but the lack of scarcity involved in this means that it won’t outcompete professionals. AI Dungeon is able to tell a story, but it doesn’t have the same feel as hanging out and throwing dice with friends. Midjourney can create a heckuva oil painting, but it still has blemishes and defects, and doesn’t depict desired subjects with any reliability.

This isn’t necessarily a flaw. One place where AI can shine is in showing untapped potential. It lets people explore edges and bring ideas to life, if only in a half-finished state.

And I think that’s an unmitigated good.
