Hallucination, confabulation, delusions, botshit...all generative AI tools are prone to creating material that is inaccurate and does not reflect the real world we live in. This might look like inventing details about an academic's fake pet chicken, confidently proclaiming that human-written essays are AI-generated, or giving bizarre interpretations of the JAK-STAT pathway. Hallucinations are an inherent part of how these models work: there are techniques to reduce them, but it is impossible to eliminate them entirely.
One of the most promising techniques for reducing hallucinations is called Retrieval-Augmented Generation, or RAG. In a RAG system, the LLM is given a set of trusted documents, such as peer-reviewed research papers; relevant passages are retrieved from that set and supplied alongside your question, so the model can draw its answer from those sources rather than inventing one. It is common for chatbots that pull information from academic papers, for example, to use RAG or similar strategies.
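For readers who want a concrete picture, here is a minimal sketch of the RAG idea in Python. It is illustrative only: the tiny keyword-overlap retriever and the document list are made-up placeholders, and real RAG systems typically use embedding-based search and an actual LLM API. The flow, though, is the same: retrieve trusted passages, then ask the model to answer only from them.

```python
# Minimal sketch of Retrieval-Augmented Generation (RAG).
# The documents and scoring below are toy placeholders, not a real system:
# production RAG uses embedding-based search and a real LLM client.

def score(query: str, passage: str) -> int:
    """Toy relevance score: count query words that also appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k passages that best match the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ask the model to answer only from the retrieved passages."""
    sources = "\n\n".join(f"Source {i + 1}: {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )

# Trusted documents (e.g., abstracts of peer-reviewed papers).
documents = [
    "The JAK-STAT pathway transmits signals from cytokine receptors to the nucleus.",
    "Retrieval-augmented generation grounds model answers in supplied documents.",
]

query = "What does the JAK-STAT pathway do?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)  # In a real system, this prompt would be sent to the LLM.
```

Because the model is instructed to answer only from the supplied sources, it is far less likely to invent facts, which is the same reason summarizing a provided text (discussed next) produces relatively few hallucinations.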
Hallucination rates are lowest when chatbots are summarizing text. Giving the chatbot a specific text to work with is similar to the strategy RAG uses, but at a smaller scale. The Hallucination Leaderboard tracks hallucination rates across different LLMs when summarizing text. GPT-4 Turbo's 2.5% hallucination rate when summarizing text was, at the time of writing, likely about as low as hallucination rates get in any scenario. The more recent GPT-4 Omni model's rate is actually higher, at 3.7% when summarizing text (a worse performance than GPT-3.5!).
AI chatbots are notorious for hallucinating citations. Even when the citations provided are real, they may or may not be on topic. Even when they are real and on topic, there is no guarantee that a cited work is the actual source of the information you are interested in. It is also possible for the text of a citation to appear in an AI's training data without the full text of the work itself (especially if the full text is behind a paywall), which means the chatbot can regurgitate the citation in full with no knowledge of what the cited work actually says.
One type of question that tends to lead to hallucinations is asking a large language model how it works, or asking it to perform a task other than writing text or code. For example, at one point Redditors discovered that if you asked ChatGPT what model it was running using very specific language, it would consistently answer that it was using a model called "gpt-4.5-turbo" (which, at the time of writing, does not exist). OpenAI confirmed that this was only an "oddly consistent hallucination." In general, chatbots cannot answer detailed questions about how they work, but may answer confidently anyway.
Chatbots may also respond confidently when you ask them to perform tasks that they are unable to perform. If asked, chatbots will claim that they can count words or characters in a response, detect AI-generated writing, or even set a timer for you, but they can't do any of these things. LLMs are programmed only to produce text or code. They are notoriously bad at even simple arithmetic, and cannot count, add, or create schedules reliably.
If you are not sure what tasks a chatbot is capable of, asking the chatbot itself is risky, as it will confidently claim it can do tasks it cannot. The best thing to do is to search in a traditional search engine or ask someone in TTS or your library for help.