How do AI image generators work?
AI-based image generators use machine learning models that take user-entered text and generate one or more images that match the description. Training these models requires huge datasets with millions of images.
Creating images with AI is getting easier. Photo: Ijnet
While neither Midjourney nor DALL-E 2 publicly discloses how its algorithms work, most AI image generators use a process called diffusion. Diffusion models are trained by adding random “noise” to training images, then learning to reconstruct the originals by removing that noise. To generate a new image, the model starts from pure noise and repeatedly removes it, step by step, until an image matching the text prompt emerges.
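The “adding noise” half of that training process can be sketched in a few lines. This is a toy illustration only: Midjourney's actual implementation is undisclosed, a single pixel value stands in for a whole image, and the linear variance schedule and names (`betas`, `alpha_bar`) are common conventions from the diffusion literature, not anything specific to these products.

```python
import math
import random

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule: how much noise is added at each of T steps."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= (1.0 - b)
        alpha_bar.append(prod)  # cumulative fraction of the original signal kept
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng=random):
    """Forward process: blend a clean value x0 with Gaussian noise.
    At small t the result is mostly signal; at large t, mostly noise."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * noise

alpha_bar = make_schedule()
x0 = 0.8                              # one "pixel" from a training image
early = add_noise(x0, 5, alpha_bar)   # still close to x0
late = add_noise(x0, 999, alpha_bar)  # essentially pure random noise
```

A trained model learns the reverse of this: given the noisy value and the step number, predict the noise that was added, so generation can run the schedule backwards from random noise to a clean image.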
This is different from large language models like ChatGPT. Large language models are trained on unlabeled text data, which they analyze to learn language patterns and generate human-like responses.
In generative AI, the input affects the output. If a user specifies that they only want to include people with a certain skin color or gender in an image, the model will take that into account.
However, in addition to this, the model will also tend to default to returning certain images. This is often a result of a lack of diversity in the training data.
A recent study explored how Midjourney visualizes seemingly generic terms, including specialized media occupations (such as “news analyst,” “news commentator,” and “fact checker”) and more general occupations (such as “journalist,” “reporter,” “journalism”).
The study began last August, and the analysis was re-run six months later to see whether the system had improved. In total, the researchers examined more than 100 AI-generated images.
Ageism and Sexism
For specialized occupations, the older people shown are always men. Photo: IJN
For non-specific job titles, Midjourney only shows images of younger men and women. For specific roles, both younger and older people are shown, but the older people are always male.
These results implicitly reinforce several stereotypes: that older people do not work in non-specialized positions, that only older men are suited to professional work, and that less specialized work is reserved for women.
There are also noticeable differences in how men and women are presented. For example, women are younger and wrinkle-free, while men are “allowed” to have wrinkles.
AI also appears to represent gender as binary, rather than showing examples of more fluid gender expression.
Racial prejudice
Images for "reporters" or "journalists" often only show white people. Photo: IJN
All images returned for terms like “journalist” and “reporter” showed only white people.
This may reflect a lack of diversity and underrepresentation in the AI's underlying training data.
Classism and conservatism
All of the characters in the image also have a "conservative" appearance. For example, none of them have tattoos, piercings, unusual hairstyles, or any other attributes that would distinguish them from traditional depictions.
Many people also wear formal clothing such as shirts and suits. These are indicators of class expectations. While this may be appropriate for certain roles, such as television presenters, it is not necessarily a true reflection of how reporters or journalists generally dress.
Urbanism
The images are all set in the city by default, although there is no geographical reference. Photo: IJN
Although the prompts specified no location or geographic context, the images returned by the AI defaulted to urban settings such as skyscrapers and busy streets. This default is misleading, given that only just over half of the world's population lives in cities.
Outdated technology
Images of media workers include outdated technologies such as typewriters, printers, and vintage cameras.
Since many professionals today look much the same at work, the AI appears to draw on more distinctive technologies, including outdated and obsolete ones, to make the depicted roles visually recognizable.
So if you’re creating your own AI images, consider potential biases when writing descriptions. Otherwise, you may be inadvertently reinforcing harmful stereotypes that society has spent decades trying to dispel.
Hoang Ton (according to IJN)