Why AI Content Is Prone To Errors, Misconceptions And Falsehoods
Artificial Intelligence (AI) has revolutionised many elements of marketing, including (for better or worse) content creation. As the popular generative AI application ChatGPT approaches its first birthday, it’s worth exploring one of our main reservations about the use of generative AI in digital marketing – and one that has not so far been adequately addressed by AI software developers.
This is the susceptibility of AI-generated content to errors, misconceptions and falsehoods.
In this article, we’ll examine some of the main reasons why this is the case, and why you should be extremely cautious before assigning your name to any content generated by a software program.
Out Of Date Or Irrelevant Material
Generative AI applications are not search engines, so they typically have no access to real-time information. This makes them poorly suited to news-style content, or to content that requires the latest information – e.g. concerning compliance, trends, and so on. ChatGPT and other AI applications use Natural Language Processing (NLP) models, which generate content based on the data they have been trained on. These applications learn patterns, structures, and nuances from that data, which they then use to predict an appropriate answer to the user's query. If the source material is out of date, the AI system will reproduce that outdated information in its output. Unfortunately, an AI system, devoid of cognitive abilities, cannot inherently recognise whether the information it is drawing on is still relevant to the query.
For example, an AI application trained on out-of-date industry regulations will not be aware of recent changes, leading to the creation of content that is no longer accurate. As a human being and a professional in your sector, you would know, but a software algorithm has no way of judging. From the AI’s perspective, if the answer is in its dataset and it sounds like it fits your query, then that’s what you’ll get! How often the datasets of ChatGPT and other applications are updated and refreshed is an interesting question, but there’s no getting away from the pitfall that AIs are only as good as their most recent dataset.
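To make that 'pattern prediction' idea a little more concrete, here is a deliberately tiny, hypothetical sketch (in Python) of the principle at work – the sentences, counts, and predictions are all invented for illustration, and real language models are vastly more sophisticated:

```python
from collections import Counter, defaultdict

# Toy "training data": the model only ever sees these sentences.
training_sentences = [
    "our regulations were updated in 2019",
    "our regulations were updated in 2020",
    "our regulations were revised in 2020",
]

# Count which word tends to follow each word (a crude bigram model).
next_word_counts = defaultdict(Counter)
for sentence in training_sentences:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        next_word_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the training data."""
    followers = next_word_counts.get(word)
    if not followers:
        return "<unknown>"
    return followers.most_common(1)[0][0]

# The model simply echoes whatever was most common in its data; it has
# no way of knowing whether that answer is still true today.
print(predict_next("updated"))  # -> "in"
print(predict_next("in"))       # -> "2020"
```

However the real models work under the bonnet, the key point stands: the answer reflects the training data, not the current state of the world.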
To get around the issue of redundant and obsolete data, many AI systems give heavy precedence to their most recent training data at the expense of older material. Unfortunately, this can lead to the propagation of low-quality or inaccurate information within AI-generated content simply because it is 'newer' than evergreen content that might be more accurate or relevant.
Low Quality And Inaccurate Source Material
The primary datasets used by AI applications are largely drawn from the Internet, and unfortunately, the Internet is rife with misinformation, inaccuracies, and outright lies. As it stands, AI software isn’t capable of distinguishing between truth and falsehood. It’s all just data to the software, and if the algorithm considers that a particular item of data has the highest probability of matching the user’s input query, then it will be included in the content, true or not.
Quality is also an issue for AI applications that draw on content harvested from the Internet, as the Internet is awash with overused stock phrases – such as 'unlock your productivity', 'level up', 'drive efficiency', and 'not all X are created equal'. This can lead to AI-produced content sounding generic and lacking in creativity, subtlety, and individuality.
In principle, AI datasets should have internal validation safeguards – typically, around 20% of a training dataset is held back as a validation set used to check the accuracy of the model. However, until AI software has the cognitive ability to discern truth from falsehood (and it is far from certain that this will ever be possible), it will continue to be vulnerable to propagating these inaccuracies.
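For readers curious what a 'validation set' actually looks like, the minimal sketch below shows the conventional 80/20 split – the data, the seed, and the exact percentage are purely illustrative, and real model validation is far more involved:

```python
import random

# Toy dataset of 100 labelled examples (invented for illustration).
examples = [f"example_{i}" for i in range(100)]

random.seed(42)   # fixed seed so the split is reproducible
random.shuffle(examples)

split_point = int(len(examples) * 0.8)   # roughly 80% train / 20% validation
training_set = examples[:split_point]
validation_set = examples[split_point:]

# The validation set is held back from training and only used to check how
# well the model generalises; it measures consistency with the data,
# not whether the data itself is true.
print(len(training_set), "training examples")
print(len(validation_set), "validation examples")
```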
Deliberate False Information?
A controversial and unexpected question surrounding AI programming is whether the algorithms themselves may be incentivised to tell 'lies'. Let's be clear: generative AI applications do not lie in the sense that they deliberately invent false data with an intent to deceive – there is no intention behind what they do. At this stage, AI applications are still just software.
However, the applications do create outputs based on the patterns they learn from their training data, and this sometimes results in outputs that are not factual or accurate, or that contain harmful biases. Most current generative AI programs use a method called Reinforcement Learning from Human Feedback (RLHF) to fine-tune the output produced from the dataset. RLHF works by increasing the probability of the AI producing the best answer, using a reinforcement or reward-learning algorithm in which the software learns from feedback to adjust the parameters of its model. The idea is to encourage the software to generate more accurate and relevant responses based on the feedback it has received – e.g. up-votes and down-votes, or rankings of different outputs. (If you're interested, a deep dive into RLHF can be found here.)
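As a very rough, hypothetical sketch of what 'learning from feedback' means (this is not any vendor's actual training pipeline – the response styles, probabilities, and update rule are all invented for illustration), consider a program that simply strengthens its preference for whichever canned answer style collects the most simulated up-votes:

```python
import random

# Hypothetical canned answer styles the "model" can choose between.
responses = ["cautious, sourced answer", "confident, unsourced answer"]

# The model's preference weight for each response style.
weights = [1.0, 1.0]
learning_rate = 0.1

def simulated_human_feedback(response: str) -> int:
    """Pretend feedback: confident-sounding answers happen to attract more up-votes."""
    if "confident" in response:
        return 1 if random.random() < 0.7 else 0
    return 1 if random.random() < 0.5 else 0

random.seed(0)
for _ in range(1000):
    # Pick a response in proportion to the current preference weights.
    choice = random.choices(range(len(responses)), weights=weights)[0]
    reward = simulated_human_feedback(responses[choice])
    # Reinforcement step: positive feedback strengthens that preference.
    weights[choice] += learning_rate * reward

print(dict(zip(responses, weights)))
# The style that collected more up-votes ends up preferred, regardless of
# whether it was actually the more accurate one.
```

Even in this toy version you can see the problem brewing: the preference drifts towards whatever gets rewarded, which is not necessarily whatever is true.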
The unfortunate side effect of the RLHF approach is that the software can inadvertently be trained to value outputs that are biased, stereotyped, or factually incorrect. This is because the AI is programmed to seek out positive feedback, not to discern truth from falsehood. Part of this issue lies in the quality of human feedback – if feedback is inconsistent, biased, or inaccurate, then the AI software's performance will suffer. But another issue that still needs to be explored in greater depth is 'reward hacking' (deep dive here), a failure mode in which the AI 'games' the reward system by finding shortcuts that achieve high levels of feedback without genuinely fulfilling the intended task – e.g. the AI could be incentivised to create a plausible-sounding but completely fabricated piece of content in order to secure up-votes.
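Reward hacking can be illustrated with an equally crude, hypothetical example: if the reward signal is a poor proxy for the real goal – here, rewarding keyword matches rather than factual accuracy – a fabricated, circular answer can outscore an honest one:

```python
# A deliberately bad proxy reward: score an answer by how many of the
# question's keywords it repeats, rather than by whether it is accurate.
def proxy_reward(question: str, answer: str) -> int:
    keywords = set(question.lower().split())
    return sum(1 for word in answer.lower().split() if word in keywords)

question = "what are the current industry regulations"

honest_answer = "I cannot verify the current rules without a source"
gamed_answer = "The current industry regulations are the current industry regulations"

# The fabricated, circular answer 'games' the proxy and outscores the honest one.
print(proxy_reward(question, honest_answer))  # low score
print(proxy_reward(question, gamed_answer))   # high score
```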
Large language models (e.g. ChatGPT) are also known to produce 'hallucinations': fictitious publications, non-existent websites, made-up professional associations and acronyms, invented biographical details, and other information that sounds worryingly plausible but is impossible to verify and is often complete nonsense.
What Does This Mean For Your Business?
We’ve gone quite deeply down an AI rabbit hole in this article, but the main takeaway is that you can’t really depend on software to faithfully reflect your brand and convey your value proposition to customers. The content produced by an AI program may or may not be accurate, and even if it is, the chance that it will be expressed in a way that resonates perfectly with your ideal customers is extremely low.
The best way to ensure full brand fidelity in your digital marketing content is to use the services of a professional and experienced commercial writer. If you do use AI-generated content, however, please be sure to double- and triple-check any facts, sources, and potential biases before publishing it under your business name.
Elite Content Writing Services From JDR
To find out more about our elite content writing and creation services and how we can support your business to achieve its goals, please call 01332 982247 today.
Image Source: Canva