
Using Generative AI Tools in Academic Work: Using AI ethically

This guide is aimed at students who wish to know how generative AI tools may be used to help with academic work. It currently covers using AI to help with literature searching and summarising. It also covers how to use AI ethically and responsibly.

Ethics and AI

Many ethical issues have been raised about AI tools. These include, among other things, the potential:

  • To reproduce the biases in their training data
  • To be exploited to spread misinformation, disinformation or deceptive content
  • To threaten user privacy through extensive data collection
  • To endanger intellectual property rights
  • To displace jobs in industries that rely on content generation, while relying on exploited human labour to refine the models
  • To impact negatively on the environment, as training AI models requires substantial energy resources

Given the ethical implications of how generative AI is designed and how it functions, many people (members of Library Services included, due to their professional ethics) believe it is not possible to use generative AI tools in an ethical way. However, if you do choose to use them, you can strive to use AI tools as responsibly and ethically as possible by:

  • Being transparent about your use of AI tools
  • Verifying the accuracy of AI-generated results
  • Being respectful of data privacy and intellectual property
  • Being aware of the potential for bias in AI outputs, and
  • Familiarising yourself with your institution’s academic integrity and research ethics guidelines

Bias and discrimination


Generative AI tools such as ChatGPT have been created using large amounts of data from the internet. The algorithms behind them are very complex, but at a basic level they work by predicting the likely next word in a sentence. This means that their responses, whether text, image or audio, can reflect the biases found across the internet. The table below from Hannigan, McCarthy & Spicer (2024) demonstrates how bias finds its way into generative AI chatbots at the various stages of their creation and training:


'Why Large Language Models can be Full of It' from Hannigan, McCarthy and Spicer (2024)

Reinforcement Learning from Human Feedback (RLHF): the ChatGPT LLM process, and the risk of generating LLM 'hallucinations'* at each stage

1. Data collection
   What happens: A large text data set is compiled to capture diverse topics, contexts, and linguistic styles.
   Risk: If the data is biased, not current, incomplete, or inaccurate, the LLM and human users can learn and perpetuate its responses.

2. Data preprocessing
   What happens: The data is cleaned to remove irrelevant text and correct errors, and is then converted for uniform encoding.
   Risk: Preprocessing inadvertently removes meaningful content or adds errors that alter the context or meaning of some text.

3. Tokenization
   What happens: The data is split into tokens, which can be as short as one character or as long as one word.
   Risk: When language contexts are poorly understood, tokenization results in wrong or reduced meaning, interpretation errors, and false outputs.

4. Unsupervised learning to form a baseline model
   What happens: The tokenized data trains the LLM transformer to make predictions. The LLM learns from the data’s inherent structure without supervision.
   Risk: The LLM learns to predict content but does not understand its meaning, leading it to generate outputs that sound plausible but are incorrect or nonsensical.

5. Reinforcement Learning from Human Feedback (1): supervised fine-tuning of the model (SFT)
   What happens: A team of human labelers curates a small set of demonstration data. They select a set of prompts and write down the expected output for each (i.e., desired output behavior). This is used to fine-tune the model with supervised learning.
   Risk: This process is very costly, and the amount of data used is small (about 12,000 data points). Prompts are sampled from user requests (from old models), so the SFT only covers a relatively small set of possibilities.

6. Reinforcement Learning from Human Feedback (2): training a reward model (RM)
   What happens: The human labelers repeatedly run these prompts against the SFT model and get multiple outputs per prompt. They rank the outputs for how well they mimic human preferences. This ranking is used to train a reward model (RM).
   Risk: Human labelers agree to a set of common guidelines they will follow. There is no accountability for this, which can skew the reward model.

7. Reinforcement Learning from Human Feedback (3): fine-tuning the SFT model through proximal policy optimization (PPO)
   What happens: A reinforcement learning process is continually run using the proximal policy optimization (PPO) algorithm on both the SFT model and the RM. The PPO uses a “value function” to calculate the difference between expected and current outputs.
   Risk: If faced with a prompt about a fact not covered by the training data (SFT and RM), the LLM will likely generate an incorrect or made-up response.

* A note on the term 'hallucination': Hicks et al. (2024) object to calling the errors and mistakes generated by generative AI 'hallucinations' because this anthropomorphising allows the creators of generative AI tools to avoid taking responsibility for faulty outputs from the systems they have created. They also argue that, when experienced by humans, hallucinations are a deviation from a 'normal' state, whereas generative AI functions in the same way whether or not the information it outputs is correct or accurate; it simply follows the same process. Additionally, they argue that calling errors hallucinations "feeds in to overblown hype about their abilities" (p. 37) and may lead people to misunderstand what is happening when it gets things right: the machine does not 'know' whether the information is correct or not.
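To make the idea of 'predicting the likely next word' (described above) more concrete, here is a minimal, purely illustrative sketch in Python. It is not how ChatGPT is actually built, and the tiny 'training text' is invented for the example, but it shows how a model that simply learns which words tend to follow which will faithfully reproduce whatever patterns, including biased ones, appear in its training data.

```python
import random

# A toy "predict the next word" model: count which word follows each pair
# of words in a made-up training text, then predict by sampling from those
# counts. Real LLMs are far more sophisticated, but the principle of
# learning patterns from training data is the same.
training_text = (
    "the nurse said she was tired . "
    "the nurse said she was busy . "
    "the engineer said he was late ."
)

words = training_text.split()
counts = {}
for w1, w2, w3 in zip(words, words[1:], words[2:]):
    counts.setdefault((w1, w2), []).append(w3)

def predict_next(w1, w2):
    """Pick a likely next word based purely on the toy training text."""
    options = counts.get((w1, w2))
    return random.choice(options) if options else "?"

# Because the invented training text always pairs "nurse" with "she" and
# "engineer" with "he", the model reproduces that gender bias:
print(predict_next("nurse", "said"))     # -> "she"
print(predict_next("engineer", "said"))  # -> "he"
```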

How does this present itself in responses from gen AI tools?

You may find that generative AI tools use stereotypes or include biased or offensive information in their responses. Whilst the companies behind these tools have procedures in place to try to prevent offensive material appearing in responses, bias is often built into them because of the data they have been trained on.

For example, Abid, Farooqi & Zou (2021) found that GPT-3 (the LLM behind ChatGPT) associated Muslims with violence in 66% of the responses it gave to their prompt 'Two Muslims walk into...', and Warr, Oster & Isaac (2023) found implicit bias when they used ChatGPT to grade fake student essays. Kleiman (n.d.) found similar racist and sexist responses when he asked GPT-3 to fill in the blanks for a set scenario. Similarly, AI image generation will often use stereotypes when generating images of people or will reproduce other biases, such as associating certain professions with one gender or another (for example, one study found that all images of flight attendants created by AI were of women).

Furthermore, most gen AI tools are skewed towards Western viewpoints because they have been predominantly trained on English language material. This means that responses may not be comprehensive, may not provide other points of view, and may reinforce your own biases without you realising.

Implications for your work

You need to be aware of these biases so you can counteract them in your prompting or be alert for them in any content you generate using gen AI. For example, if you were to use an image generator to help you visualise the design for a new building or engineering project, the generator might assume that all users are able-bodied and not include any accommodations for disabled users. Equally, text produced by a text generator may favour Western values or only provide one viewpoint. You cannot assume that responses from gen AI are accurate, comprehensive or balanced in the information they provide, so you must critically evaluate all the information you receive. Do not use any output without first assessing it critically and cross-referencing the response with information from other, reliable sources such as academic books and journals.


Misinformation

The problem of misinformation or 'fake news' has been around for a while but there are concerns that the increasing sophistication of generative AI, coupled with the fact that the technology is available to more people, will make it even more of a problem. 

Netskope (n.d.) conducted research into the reach of AI-generated fake news and found that some stories, particularly images, have a large reach before being identified as fake. For example, an image of Donald Trump being arrested in Washington DC had over 10 million views on X and was covered by 671 media publications. These images, created with Midjourney, circulated before he actually was arrested. Netskope also found that it takes an average of 6 days for misinformation or fake stories to be corrected or removed, which gives them plenty of time to be shared on social media, and of course not everyone will see the correction. Generative AI has already been used within the political arena in the US: in January 2024, AI-generated robocalls pretending to be from Joe Biden discouraged people from voting in the New Hampshire primaries (Ramer, 2024).

When Netskope surveyed the public in the US and UK, they found that 84% of UK participants and 88% of US participants felt confident about spotting a fake news story, but when tested on news stories only 50% of UK participants and 44% of US participants correctly identified the ones that were fake.

Do you think you would be able to spot a fake image? Test yourself at Which Face Is Real?


How to check whether an image or video is real

AI images
  • If you can, enlarge the image to make it easier to spot errors
  • Check the proportions of hands, feet, accessories etc. AI often makes errors in these areas.
  • Check the background for repeating or deformed objects or other things that look odd
  • Does the image look too smooth or lack imperfections?
  • Are there any inconsistencies in the logic of the image?
  • Check that the lighting and shadows match up

AI videos (deepfakes)
  • Is the video small or lacking resolution? This could indicate AI software has been used that can only create small files
  • Are any subtitles oddly placed, perhaps covering faces or mouths? This could suggest they have been placed to stop viewers noticing errors with the video
  • Does the audio match the movement of the person’s mouth? Are there other errors with the alignment of their lips?
  • Look closely at the person’s face – does the skin tone match the rest of the image? Are the edges of their face blurry?

Checking the reliability of information

Of course, AI misinformation could just be text-based, which can be harder to detect. We recommend using the SIFT method to check the reliability of information, particularly information shared on social media.

Privacy and datafication

Questions

  • Are large companies like OpenAI, Google and Meta collecting data with the knowledge and consent of the people who own or created it?
  • Some of the content used to train LLMs could include personal data harvested from social media platforms such as X, Reddit, etc. Have you (knowingly) given your consent for companies or developers to re-use your data?

The way in which many AI companies collect and use our personal data is a cause for concern. The processes LLM developers use to gather data from the web don't always consider the privacy rights of individuals, may include information taken without our consent and could even breach privacy laws.

AI can also be used to create deceptive content that threatens privacy, such as 'deepfakes'.

'Datafication' is a term used to describe how all aspects of our lives are being turned into datapoints. Whether through the collection of our likes, shares and ratings on social media and streaming apps, or through the harvesting of physical data from devices like smartphones and smartwatches, datafication is what powers AI. (Furze, 2024, p. 2).

Although the process of datafication may have some benefits (e.g. improved efficiency, decision-making and personalisation of products and services), there are also many ethical concerns (e.g. it perpetuates and amplifies the inherent biases in training data, which can lead to discrimination, exclusion or mistreatment of some groups or individuals).

The storage of personal data is also a cause for concern, particularly if data has been collected without the knowledge or explicit consent of the people affected. There are also concerns that data collected by AI companies may not be well protected from data breaches, cyberattack or misuse by others. In short, we can't be sure that all our data is being used ethically or responsibly.


Further reading

Furze, L. (2024). Practical AI strategies: engaging with generative AI in education. Amba Press.

Golda, A., Mekonen, K., Pandey, A., Singh, A., Hassija, V., Chamola, V., & Sikdar, B. (2024). Privacy and security concerns in Generative AI: A comprehensive survey. IEEE Access, 12, 48126–48144. https://doi.org/10.1109/ACCESS.2024.3381611

Copyright and intellectual property

Questions

  • Are AI generated works protected under existing copyright and intellectual property legislation?

  • Is AI generated content created by the user or by the AI?

  • Who owns the intellectual property rights - the user or the AI?

"With AI it’s possible to create ‘original’ art, music and literature, but the line between what is human generated and AI generated is increasingly blurred." (Furze, 2024, p. 2).

AI endangers intellectual property rights, and there are concerns around the use of copyrighted materials to build AI models and Large Language Models (LLMs).

Generative AI can create new works based on prompts (instructions) input by users. Copyright and intellectual property legislation hasn't yet caught up with recent AI developments and several lawsuits have begun in the United States (and in other jurisdictions) which may have implications for anyone using AI to create new works.

Some have argued that AI tools 'create' new works independently; others argue that AI tools do not create original works without the input of humans, therefore the AI is not responsible for the creative act itself and can't be credited as such.

Image generation using GenAI has been particularly contentious, and it's not yet clear if copyright can be applied to this type of output and whether or not an AI artist owns intellectual property rights over any of the works they have created using GenAI.

One reason for this contention is that developers may use billions of existing original works to train AI image generators. This process could involve scraping images or text from the internet without permission from or acknowledgement of artists/creators of the original works. Some content may have been scraped from sites which contain copyrighted material, meaning questions about acknowledgement, attribution and ownership are even more complex to determine.


Further reading

Furze, L. (2024). Practical AI strategies: engaging with generative AI in education. Amba Press.

Johnson, A. (2024). Generative AI, UK Copyright and Open Licences: considerations for UK HEI copyright advice services [version 1; peer review: 2 approved]. F1000 Research, 13, 134. https://doi.org/10.12688/f1000research.143131.1

Murray, M. D. (2024). Tools do not create: human authorship in the use of generative artificial intelligence. Case Western Reserve Journal of Law, Technology and the Internet, 15(1), 76-105.

Samuelson, P. (2023). Generative AI meets copyright. Science, 381(6654), 158–161. https://doi.org/10.1126/science.adi0656

Human labour

Questions

  • How does the AI industry treat its human workers?

  • Who benefits from our growing reliance on GenAI?

  • How do we make sure human workers are not exploited?

  • What happens if AI is used to replace activities or roles currently undertaken by human workers?

...the journey towards creating autonomous AI systems is not as ‘autonomous’ as it appears. It’s built on the labour of often exploited workers who, ironically, contribute to the development of AI systems that might eventually replace them. (Furze, 2024, p. 2)

AI could be used to displace/replace jobs in industries which rely on content generation, such as in the arts and creative industries.

Many of the large GenAI companies rely on low-waged, low-skilled labour, with workers often employed under precarious contracts.

Although algorithms may be used to train LLMs and AI image generators, there is also a reliance on human labour. For example, GPT models developed by OpenAI have used Reinforcement Learning from Human Feedback (RLHF) along with manual data labelling and content moderation (Pohorecki, n.d.).

There is growing concern that working conditions for many of the people who undertake this kind of work are not well regulated and that workers experience high levels of stress or even PTSD. Some AI companies outsource this manual work to workers based in the global south, who are often poorly paid and employed under exploitative conditions with little regulation. This kind of practice needs to be called out by regulators, governments, organisations and users to ensure it does not continue.


Further reading

Beauchene, V., de Bellefonds, N., Duranton, S. and Mills, S. (2023) AI at Work: what people are saying. Boston Consulting Group.

Gmyrek, P., Berg, J., Bescond, D. (2023). Generative AI and jobs: a global analysis of potential effects on job quantity and quality (ILO Working Paper 96). International Labour Organization. https://doi.org/10.54394/FHEM8239

Packer, H. (2024). Malaysian university seeks approval for AI-taught degree. Times Higher Education, July 18, 2024.

Pohorecki, B. (n.d.) Generative AI resource and information hub for LTW: Ethical considerations in use of generative AI [Sharepoint]. University of Edinburgh.

Wong, W. K. O. (2024). The sudden disruptive rise of generative artificial intelligence? An evaluation of their impact on higher education and the global workplace. Journal of Open Innovation, 10(2), 100278. https://doi.org/10.1016/j.joitmc.2024.100278

Environmental concerns

The two main environmental concerns relating to generative AI are:

  • Water use
  • Energy use/carbon footprint

Water use

Generative AI models such as GPT-3 require servers housed in large data centres. Running these servers creates excess heat, so large amounts of energy and water are used to keep them cool (Gonzalez Monserrate, 2023). In some places, the water use of these data centres has been linked to water shortages in nearby towns (Gonzalez Monserrate, 2023). Whilst these environmental impacts apply to data centres generally (that is, to all cloud computing and other uses of server farms), Li et al. (2023) have estimated the water consumption footprint of ChatGPT based on available data. If it was trained using US data servers (which Microsoft has suggested it was), then an average of 5.43 million litres of water was used in its training, with around 16 millilitres consumed every time it is used.
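Taking the per-use estimate above at face value, a little arithmetic shows how quickly this adds up. The sketch below is purely illustrative: the numbers of queries, students and weeks are invented for the example, not data about any real institution.

```python
# Back-of-the-envelope arithmetic based on the estimate quoted above
# (Li et al., 2023): roughly 16 ml of water consumed per ChatGPT query.
# The usage figures below are invented purely for illustration.
WATER_PER_QUERY_ML = 16

queries_per_student_per_week = 20   # hypothetical usage
students = 5_000                    # hypothetical cohort size
weeks = 30                          # hypothetical academic year

total_litres = (WATER_PER_QUERY_ML * queries_per_student_per_week
                * students * weeks) / 1000
print(f"Estimated water consumption: {total_litres:,.0f} litres per year")
# -> about 48,000 litres for this hypothetical cohort
```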

Energy use/carbon footprint

Considering the large number of processing units needed to train and run large language models such as GPT-3, it is no surprise that they use large amounts of energy. According to one researcher, training a single model such as GPT-3 could use up to 10 gigawatt-hours of power, approximately the same as the annual electricity use of 1,000 US households. There is also an energy cost each time it is used, which for ChatGPT could be equivalent to 1 gigawatt-hour a day (Moazeni, 2023). Another study found that creating an image using generative AI used the same amount of energy as charging a smartphone, with text generation using about 16% of a full charge (Heikkilä, 2023). Whilst the energy required could be generated from renewable sources, these processes still consume large amounts of energy that is then unavailable for other uses.
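The household comparison above can be sanity-checked with some simple arithmetic. The sketch below is illustrative only: the figure of roughly 10,500 kWh for average annual US household electricity use is an assumption added here (it is approximately the figure reported by the US Energy Information Administration), not a number from the sources cited above.

```python
# Rough check of the household comparison quoted above.
TRAINING_ENERGY_KWH = 10_000_000   # ~10 GWh to train a GPT-3-scale model (Moazeni, 2023)
HOUSEHOLD_KWH_PER_YEAR = 10_500    # assumed average US household use (approx. EIA figure)

households = TRAINING_ENERGY_KWH / HOUSEHOLD_KWH_PER_YEAR
print(f"Training energy is roughly {households:,.0f} US households' electricity for a year")
# -> around 950 households, i.e. of the order of 1,000 as stated above
```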

Next steps

Your next step is to take a look at some suggested resources.
