What is an AI hallucination? I figured the best bet was to ask a reliable source. Since generative AI is where AI hallucinations come from, I asked ChatGPT to help me understand it.
Here is what ChatGPT says: “AI hallucinations, or the instances where AI models like myself generate incorrect or nonsensical information despite prompts or data input, are a crucial area to explore, especially as AI becomes more integrated into various sectors.”
The problem is that you may not even realize just how many AI hallucinations you are seeing every time you generate content from an LLM. Using LLMs to generate content, code, and ideas has opened up a fantastic new wave of opportunity. That opportunity needs to be weighed against the real risk that AI hallucinations can create false information and take your brand down with it.
Real AI Hallucinations in Action
I had a simple project. I needed a podcast episode list of all titles, guests, dates, and descriptions for the DiscoPosse Podcast as part of our One Million Downloads celebration. The goal was to take the XML feed and render a simple markdown-formatted table that could be used to feed other processes.
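For context on what the end result should look like, this kind of feed-to-table conversion can also be done with a few lines of script. Here is a minimal sketch, assuming a standard RSS feed where each item carries title, pubDate, and description elements; the file name is a placeholder, and guests are left out because there is no standard guest field in RSS.

```python
# Minimal sketch: render an RSS feed as a markdown table.
# Assumes standard <item> fields (title, pubDate, description); the file name
# is a placeholder, and guests are omitted because RSS has no standard guest field.
import xml.etree.ElementTree as ET

def feed_to_markdown(xml_path: str) -> str:
    root = ET.parse(xml_path).getroot()
    rows = ["| Title | Date | Description |", "| --- | --- | --- |"]
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip().replace("|", "\\|")
        date = (item.findtext("pubDate") or "").strip()
        desc = (item.findtext("description") or "").strip().replace("\n", " ").replace("|", "\\|")
        rows.append(f"| {title} | {date} | {desc} |")
    return "\n".join(rows)

print(feed_to_markdown("discoposse_feed.xml"))
```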
The XML file was downloaded to make sure it was all there, and I began by asking ChatGPT to convert the XML to markdown. Here was the first test:
Great! Now I know it works. The next prompt was to continue creating the markdown output for the entire file. You likely already know there are file size and context size limitations with the popular public LLMs, so it wasn’t a surprise when it told me, “Due to the length and size of the data, I will show the table in multiple parts.”
Everything looks good, so I just prompt ChatGPT to keep going and to generate as many entries as possible in each response. The trick with ChatGPT, Gemini, and others is that you need to nudge it to “keep going,” and it will pick up where it left off and move the context window accordingly.
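As an aside, if you are going to lean on an LLM for a job like this, one way to make the “keep going” dance more predictable is to pre-chunk the source yourself so each prompt carries an explicit, bounded slice of the feed rather than trusting the model to track where it left off. A minimal sketch, assuming the same kind of RSS structure; the batch size and file name are placeholders you would tune to your own feed and model limits.

```python
# Minimal sketch: split a large RSS feed into smaller batches of <item> elements
# so each chunk can be pasted into a single prompt without hitting context limits.
# The batch size and file name are assumptions; tune them to your feed and model.
import xml.etree.ElementTree as ET

def feed_batches(xml_path: str, batch_size: int = 10):
    items = list(ET.parse(xml_path).getroot().iter("item"))
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Serialize each batch back to XML text so it can go into its own prompt.
        yield "\n".join(ET.tostring(item, encoding="unicode") for item in batch)

for n, chunk in enumerate(feed_batches("discoposse_feed.xml"), start=1):
    print(f"--- batch {n}: {len(chunk)} characters ---")
```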
You can see it is doing fine and keeps going past the first 10 episodes. This is where things get interesting. You’ll notice that the next batch goes from Episode 240, counting back to 236. Another “keep going” prompt nudges ChatGPT to work on the next part of the output.
As I was watching it tick away a few episodes at a time, I realized something unexpected was happening. This is where it gets very interesting.
The Greatest Podcasts (N)ever Recorded
ChatGPT began getting a little creative. By creative, I mean it was creating a list of episodes that didn’t exist. The output matched the exact style and format of the real episodes.
The AI hallucination was so close to real that I had to double-check to see if I had just forgotten I spoke to some of these folks. I was especially excited about the supposed episode I had with Andy Jassy. It looks really great!
This is where we have to pause and remind ourselves of the problem with generative AI and LLMs today. In this case, ChatGPT didn’t just invent episodes. It invented episodes that specifically matched the rest of the content. This is why it was so difficult to see that it wasn’t true unless I knew the original source.
It was especially interesting because I had given it a fixed XML file to use as the source, and I had confirmed that the file was being read in full when searching it for other information about the content.
The goal of generative AI is to create text output that fits the patterns and style of its source and training data. ChatGPT created new content based on what it decided would be appropriate guests, given my podcast’s style and the knowledge in its model. The amazing thing was how well the invented guests, titles, and descriptions aligned with the real ones; I could barely spot them.
(Mis)trust and Verify
Artificial intelligence systems sometimes create misleading or completely false outputs. We call them AI hallucinations, but the real issue is trust. These errors can range from harmless to seriously harmful. The bigger questions are why these mistakes happen, what they mean, and how we can reduce their impact.
AI-generated content needs to be verified. It’s essential to ensure the accuracy and reliability of the information and sources. Before you think about publishing anything that is AI-generated, you need to consider a few important things:
- Verify with reliable sources: Cross-check the information provided by the AI with trusted sources. Use well-established news outlets, academic journals, or verified databases to confirm the facts (a sketch of this kind of check follows this list).
- Use multiple AIs: Ideally, you want to generate content using different AI models or tools to see if the outputs are consistent. Discrepancies might indicate areas where further verification and research are needed.
- Fact-checking tools: Find fact-checking websites and tools that can help verify data points, statistics, and statements. You need to be able to check your output against up-to-date information.
- Domain expert review: If the content is specialized, such as medical, legal, or technical information, have a domain expert review it. Experts may be able to spot errors that might not be obvious to the average reader.
- Iterative review: Review the content multiple times and make adjustments as necessary. Sometimes, repeated reading can help catch inaccuracies that were missed initially.
- Critical evaluation: Apply critical thinking to evaluate the logic and consistency of the information. Check for any internal contradictions or claims that seem implausible.
- Update regularly: AI models generate content based on the information they were trained on. That information might be outdated. Regular updates and checks can ensure the content remains accurate and relevant.
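For the podcast example above, that first kind of cross-check can even be automated. Here is a minimal sketch that compares the titles in a generated markdown table against the titles actually present in the source feed and flags anything that was invented; the file names, the first-column title assumption, and the exact string matching are all simplifications.

```python
# Minimal sketch: cross-check generated episode titles against the original feed.
# Assumes the generated markdown table keeps the title in its first column;
# file names and exact string matching are simplifications.
import xml.etree.ElementTree as ET

def titles_from_feed(xml_path: str) -> set[str]:
    root = ET.parse(xml_path).getroot()
    return {(item.findtext("title") or "").strip() for item in root.iter("item")}

def titles_from_markdown(md_path: str) -> set[str]:
    titles = set()
    with open(md_path, encoding="utf-8") as f:
        for line in f:
            if not line.lstrip().startswith("|"):
                continue  # only table rows matter here
            cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
            first = cells[0] if cells else ""
            if first and first != "Title" and set(first) != {"-"}:
                titles.add(first)
    return titles

# Anything in the generated table that is not in the feed is a candidate hallucination.
invented = titles_from_markdown("episodes.md") - titles_from_feed("discoposse_feed.xml")
for title in sorted(invented):
    print(f"Not in the source feed: {title}")
```

In my case, a check like this would have flagged the supposed Andy Jassy episode immediately.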
In the end, you need to be diligent about your use of generative AI. Creating long-form content is especially risky without domain experts to verify it. You need to think about how to integrate generative AI in ways that reduce the risk of hallucinations and false knowledge. Remember, it’s your name on that content and the knowledge within, whether it’s truth or hallucination.