We all know them. Those little things that disrupt your day and make business harder than it needs to be. In our Creature Discomforts campaign, we visualize these little struggles with our animated creatures and show how Lenovo Pro helps you overcome them.
In this 6-part series, we look at the overwhelming feeling that comes with the emergence of a new business disruption trend, in this case: AI. The AI Revolution is coming, and the most successful businesses will be those that learn AI and understand how it can help them achieve unimagined levels of productivity and efficiency. This written series gives you an entry point into the current AI landscape and shows how you can use robots to overcome those Creature Discomforts.
If you need images for your social media campaign or website but you can’t find what you want on stock image sites, then generative artificial intelligence can help. With a little practice, you can get an AI to produce all kinds of images for you.
If you have been following this series, then you will probably have experimented with large language models (LLMs) such as ChatGPT. As with LLMs, there are a variety of image-generating AI systems to choose from. The highest-profile options are DALL-E, Stable Diffusion and Midjourney. These are available not just through their own websites and apps but are often embedded in other services too.
DALL-E, for example, is the underlying AI model for Microsoft’s Image Creator tools. That isn’t surprising because Microsoft is the largest investor in OpenAI, which is the company behind DALL-E and ChatGPT.
Stable Diffusion, meanwhile, has been released as open-source software by its maker, Stability AI, which means there are lots of tools available that use it. You can even download it and run it on your own PC. The easiest way to get started, though, is with Stability AI’s own tool, DreamStudio. Finally, there’s Midjourney, which is currently only available through the chat service Discord. More on that later.
Aside from the ‘big three’, various other companies have added image generation tools. Google has ImageFX, which uses an innovative series of dropdown menus to help refine your prompt. Adobe, known for its photo editing software Photoshop, has added an AI tool called Firefly. And stock image companies Getty Images and Shutterstock have added their own tools. Which one you end up using will depend a lot on personal preference.
Prompt-writing for image generators is like prompting LLMs: explain in natural language what you want to produce, then give it a moment to work on a solution. Creating an image takes longer than producing text, partly because LLMs share the start of their answer before they’ve written the end, whereas an image generator must complete the task before showing you anything.
LLMs also have multiple output types. They can answer questions, generate new writing, summarize documents, and so on. Depending on what you want to achieve, you might write your prompt in different ways. In a sense an image generator is a simpler proposition because it just produces images. You just have to provide a good description.
However, there are variables within that seemingly simple task. You should begin by describing the image content: the kind of scene, what it contains and what action, if any, is taking place. Then move on to how the image is portrayed, such as mood, lighting, framing, and so on. The mood could be “inspiring” or “ominous”, for example. Lighting could be specific (“sunlit”) or more general (“moody”). Finally, you can add specific details, for example, “a woman stands on the right of the frame, looking towards the camera”.
Image generators don’t like long prompts, though. AI gurus often recommend multi-paragraph ‘mega-prompts’ for LLMs, but those should be avoided here. There is no hard limit, but Zapier suggests a 60-word maximum for Midjourney and 380 characters for Stable Diffusion. The longer the prompt, the more likely it is that the AI will ignore words at the end. This will probably improve as the technology matures, but keep those limits in mind for now.
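If you generate images regularly, it can help to assemble prompts the same way every time: content first, then mood and lighting, then specific details, with a quick length check at the end. The short sketch below illustrates that layered approach in Python. The function names are purely illustrative, not part of any real image-generator SDK, and the word and character limits are the Zapier-suggested figures mentioned above, not hard rules.

```python
# Illustrative helper for the layered prompt-writing approach:
# scene content -> mood/lighting/framing -> specific details.
# Function names here are hypothetical, not from a real SDK.

def build_image_prompt(content, style=None, details=None):
    """Assemble a prompt, starting with the scene content and
    appending optional style and detail layers."""
    parts = [content]
    if style:
        parts.append(style)
    if details:
        parts.append(details)
    return ", ".join(parts)

def within_limits(prompt, max_words=60, max_chars=380):
    """Check a prompt against the suggested limits: roughly 60 words
    for Midjourney and 380 characters for Stable Diffusion."""
    return len(prompt.split()) <= max_words and len(prompt) <= max_chars

prompt = build_image_prompt(
    content="a red sports car driving through a mountain range",
    style="inspiring mood, sunlit, wide cinematic framing",
    details="a woman stands on the right of the frame, looking towards the camera",
)
print(prompt)
print(within_limits(prompt))  # True: 28 words, well under both limits
```

The same layered string works across tools; only the length check needs adjusting for whichever service you use.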
Finally, if you have used LLMs regularly, then you will have become used to the conversational interface. You can ask a question, get a response, ask for more detail, and so on. If you are creating text, then you can paste responses together to get the best combination. This doesn’t work for images because if you ask for changes, the AI will generate the entire image again. That means it might fix the bits you didn’t like but also scrap the ones you did.
Even if you specify an area that you don’t want to change, the AI might ignore you. In DALL-E, for example, you can shade an area and give the AI an instruction about it. I had it produce a picture of a sports car driving through a mountain range, then shaded the car and told the AI I wanted to keep the car and change the background. The AI changed the background as I asked but re-drew the car too.
This is a new technology, so it isn’t without problems. First, it does not understand what it is doing. Just as LLMs are trained on billions of documents without knowing what they mean, so image generators are trained on masses of images without understanding what is in the picture. That’s why they sometimes give people too many fingers or even limbs. They don’t understand objects either, so you might see a picture of a barber ‘cutting’ hair without scissors in their hand.
Words are a particular challenge. Try using an image generator to produce a book cover and you’ll probably get a jumble of repeated letters and vague shapes. This seems like it should be easy to solve, particularly because other types of AI are so good with words, but it’s not. Image generators don’t know they are generating words, any more than they understand hands or scissors. They are approximating details in their training data.
A more serious challenge is accuracy and bias. Google had to apologize after its Gemini AI produced pictures of Nazi soldiers that included black and Asian people. The mistake was easily spotted; most people are aware that these pictures are historically inaccurate and offensive, but not every mistake will be as clear. Businesses must ensure that there is a clear approval process for any AI-generated image they use.
A similar problem comes from intellectual property. You need an approval process in place to ensure that you are not using a service that was trained on copyrighted material. Many artists have complained that image generators are imitating their style. While the legal implications are still being worked out, there is still the potential for reputational damage if someone later claims your business used an image that plagiarized their work. There are deeper ethical issues to consider here too, such as AI images that depict real people or that aren’t clearly identified as AI-generated.
Related to that is the question of who owns the rights to the images you create. With DALL-E, the rights belong to the user, but Midjourney images are free for anyone to use, regardless of who created them.
If that sounds like a minefield, don’t worry. Remember, this technology and the norms for its acceptable use are still developing. Some time playing around with generative AI tools and exploring how they fit your needs is a good way to formulate processes and policies to share with employees. There is enormous potential here for helping your employees save time as well as expanding their creativity.
Lenovo Pro offers tailored IT solutions, including advanced product selection, dedicated support, and exclusive discounts, to help you understand and overcome anything the business world can throw at you.
Click here to learn more and join Lenovo Pro for free today.