Introduction
In this lesson, we'll cover the concept of hallucinations in LLMs, highlight their influence on AI applications, and show how to mitigate them using techniques such as retriever architectures. We'll also explore bias within LLMs with examples.
Hallucinations in LLMs
In Large Language Models, hallucinations refer to cases where the model produces text that's incorrect and not based on reality. An AI hallucination is a confident response by the model that cannot be grounded in any of its training data.
There are several possible reasons for these types of hallucinations:
- An LLM could be trained on a dataset that doesn’t have the knowledge required to answer a question.
- An LLM does not have a reliable way to check the factual accuracy of its responses. It just outputs a response that may be wholly incorrect yet convincing.
- The training dataset used to train the LLM may include fictional content and subjective content, like opinions and beliefs.
- LLMs are not generally optimized to say “I don’t know” when they don’t have enough information. Therefore, when the LLM has no answer, it generates whatever is the most probable sequence of text as a response. However, “most probable” does not necessarily mean “true,” and hallucination is the result of this misalignment: the goal of language modeling is to learn the underlying distribution of the words in a language, not to distinguish what is true from what isn’t.
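To make the last point concrete, here is a minimal sketch of a toy bigram "language model" that only counts which word follows which. The three-sentence corpus is made up for illustration, and a real LLM is far more sophisticated, but the principle is the same: the model returns the most frequent continuation, whether or not it is true.

```python
from collections import Counter, defaultdict

# Made-up training corpus: the wrong statement appears more often than the right one.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def train_bigram_counts(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev_word, next_word in zip(words, words[1:]):
            counts[prev_word][next_word] += 1
    return counts

def most_probable_next(counts, prev_word):
    """Return the most frequent follower of prev_word and its estimated probability.

    Assumes prev_word was seen at least once during training.
    """
    followers = counts[prev_word]
    total = sum(followers.values())
    word, count = followers.most_common(1)[0]
    return word, count / total

counts = train_bigram_counts(corpus)
word, prob = most_probable_next(counts, "is")
print(word, round(prob, 2))
```

Here the model completes "the capital of australia is" with "sydney", because that continuation is the most frequent one in its training data, even though the correct answer is Canberra.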
Let’s see a clear example of a hallucination.
On April 12th, 2023, Databricks announced their “Dolly 2.0” model, an open-source 12B parameter language model fine-tuned on a human-generated instruction-following dataset. GPT-3.5 was trained on a dataset with a cutoff year of 2021. Therefore, it can’t know about Dolly 2.0.
When GPT-3.5 is asked: “Does Databricks Dolly provide value to the open source community?” it responds with something like this:
GPT-3.5: Databricks Dolly is a tool developed by Databricks, which is designed to help users migrate their data from one platform to another, particularly from on-premise data centers to cloud-based platforms such as AWS or Azure. While Databricks Dolly is not an open source tool, it can still provide value to the open source community in a few ways.[…]
Except for “Databricks Dolly is a tool developed by Databricks,” the rest of the text is entirely incorrect! Dolly is not a data migration tool, and, contrary to the response, it is open-source. Still, GPT-3.5 knows about Databricks as a company, so it generates a plausible description of Dolly 2.0 as a generic Databricks product.
While this example shows OpenAI GPT-3.5 hallucinating, it’s important to note that this phenomenon applies to other similar LLMs like Bard or LLaMA.
Strategies to mitigate hallucinations include tuning the text generation parameters, cleaning up the training data, precisely defining prompts (prompt engineering), and using retriever architectures to ground responses in specific retrieved documents.
Misinformation Spreading
One significant risk associated with hallucinations in LLMs is their potential to generate content that, while appearing credible, is factually incorrect. Due to their limited capacity to understand the context and verify facts, LLMs can unintentionally spread misinformation.
Individuals with malicious intent could also exploit LLMs to spread disinformation deliberately, creating and promoting false narratives. A study by BlackBerry found that nearly half of the respondents (49%) believed that GPT-4 could be used to spread misinformation. The unrestricted spread of such false information via LLMs can lead to widespread negative impacts across societal, cultural, economic, and political landscapes. It's crucial to address these issues related to LLM hallucinations to ensure the ethical use of these models.
Tuning the Text Generation Parameters
The generated output of LLMs is greatly influenced by various model parameters, including temperature, frequency penalty, presence penalty, and top-p. We’ll learn more about them in a later lesson in the course.
Higher temperature values promote randomness and creativity, while lower values make the output more deterministic. Increasing the frequency penalty value encourages the model to use repeated tokens more conservatively, with the penalty growing each time a token is repeated. Similarly, a higher presence penalty value increases the likelihood of generating tokens not yet included in the generated text. The “top-p” parameter controls response diversity by setting a cumulative probability threshold for token selection: only the most probable tokens whose combined probability reaches the threshold are considered.
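The following is a minimal sketch of how these four parameters could act on a model's next-token scores. It is not the implementation used by any particular provider. The toy logits and all function names are made up for illustration, and the penalty formula is modeled on the one OpenAI documents for its API, so other libraries may differ.

```python
import math
import random

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appear in the generated text."""
    counts = {}
    for token in generated:
        counts[token] = counts.get(token, 0) + 1
    adjusted = {}
    for token, logit in logits.items():
        c = counts.get(token, 0)
        # Frequency penalty scales with the count; presence penalty is a flat one-time cost.
        adjusted[token] = logit - c * frequency_penalty - (1.0 if c > 0 else 0.0) * presence_penalty
    return adjusted

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; temperature must be > 0."""
    scaled = {token: logit / temperature for token, logit in logits.items()}
    largest = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - largest) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

def top_p_filter(probs, top_p=1.0):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalize the survivors

def sample_next_token(logits, generated, temperature=1.0, top_p=1.0,
                      frequency_penalty=0.0, presence_penalty=0.0):
    """Apply penalties, temperature, and top-p in turn, then sample one token."""
    adjusted = apply_penalties(logits, generated, frequency_penalty, presence_penalty)
    probs = softmax_with_temperature(adjusted, temperature)
    nucleus = top_p_filter(probs, top_p)
    return random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Hypothetical next-token scores, not taken from a real model.
toy_logits = {"Paris": 3.0, "Lyon": 1.5, "Rome": 1.0, "banana": -1.0}

cold = softmax_with_temperature(toy_logits, temperature=0.2)
hot = softmax_with_temperature(toy_logits, temperature=2.0)
print(round(cold["Paris"], 3), round(hot["Paris"], 3))
```

With the low temperature, almost all of the probability mass sits on the top token, while the high temperature spreads it across the alternatives. Lowering top_p then removes unlikely tokens such as "banana" from consideration entirely, which is one reason conservative settings tend to reduce implausible outputs.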
Leveraging External Documents with Retriever Architectures
Response accuracy can be improved by providing domain-specific knowledge to the LLM in the form of external documents. Augmenting the knowledge base with domain-specific information allows the model to ground its responses in the knowledge base. After a question from a user, we could retrieve documents relevant to the question (leveraging a module called a “retriever”) and use them in a prompt to produce the answer. This type of process is implemented in architectures typically called “retriever architectures”.
In these architectures:
- When a user poses a question, the system computes an embedding representation of it.
- The embedding of the question is then used for executing a semantic search in the database of documents (by comparing their embeddings and computing similarity scores).
- The top-ranked documents are used by the LLM as context to give the final answer. Usually, the LLM is asked to extract the answer from those context passages precisely and not to write anything that can’t be inferred from them.
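The three steps above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the embed function below is a toy word-count vector standing in for a learned embedding model, the document list stands in for a vector database, and all names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(question, documents, k=2):
    """Steps 1 and 2: embed the question, score every document, return the top k."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(question, context_docs):
    """Step 3: ask the LLM to answer strictly from the retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

documents = [
    "Dolly 2.0 is an open-source 12B parameter language model announced by Databricks.",
    "Higher temperature values make generated text more random.",
    "Constitutional AI is a framework developed by researchers at Anthropic.",
]
question = "Is Databricks Dolly open-source?"
top_docs = retrieve(question, documents, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The resulting prompt would then be sent to the LLM. Because the retrieved passage states what Dolly 2.0 actually is, the model no longer has to guess from what it knows about Databricks in general.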
Retrieval-augmented generation (RAG) is a technique that enhances language model capabilities by sourcing data from external resources and integrating it with the context provided in the model's prompt.
Providing access to external data sources during the prediction process enriches the model’s knowledge and grounding. By leveraging external knowledge, the model can generate more accurate, contextually appropriate responses and be less prone to hallucination.
Bias in LLMs
Large language models like GPT-3.5 and GPT-4 have raised serious privacy and ethical concerns. Research has shown that these models are prone to inherent bias, leading to the generation of prejudiced or hateful language, intensifying the concerns regarding their use and governance.
Biases in LLMs arise from various sources: the data, the annotation process, the input representations, the models, and the research design.
For instance, training data that don't represent the diversity of language can lead to demographic biases, resulting in a model's inability to understand and accurately represent certain user groups. Misrepresentation can vary from mild inconveniences to more covert, gradual declines in performance, which can unfairly impact certain demographic groups.
LLMs can unintentionally intensify harmful biases through their hallucinations, creating prejudiced and offensive content.
The data used to train LLMs frequently includes stereotypes, which the models may unknowingly reinforce. As a result, the models can generate prejudiced content that discriminates against underrepresented groups, potentially targeting them based on factors like race, gender, religion, and ethnicity.
This can be exemplified when an LLM produces content that presents women as inferior or portrays certain ethnicities as intrinsically violent or unreliable. Also, if a model is trained on data biased towards a younger, technologically savvy demographic, it may generate outputs that overlook older individuals or those from less technologically equipped regions. If the model is steeped in data from sources promoting hate speech or toxic content, it might produce damaging and prejudiced outputs, amplifying the diffusion of harmful stereotypes and biases.
These examples underscore the urgent need for constant monitoring and ethical management in the use of these models.
Constitutional AI
Constitutional AI is a conceptual framework crafted by researchers at Anthropic. It aims to align AI systems with human values, ensuring that they become beneficial, safe, and trustworthy.
In the beginning, the model is trained to self-review and modify its responses based on a set of predetermined principles and a small set of process examples. The next phase involves reinforcement learning training. At this point, the model leans on AI-generated feedback, grounded in the given principles, as opposed to human feedback, to choose the least harmful response.
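The control flow of these two phases can be sketched as follows. This is a heavily simplified illustration, not Anthropic's implementation: in the real method the critique, the revision, and the preference judgment are all produced by language models and are used as training signals, whereas here they are replaced by hypothetical keyword-based stand-ins so that the example runs on its own.

```python
def critique_and_revise(response, principles, critique, revise):
    """Phase 1 sketch: check the draft against each principle and revise it if needed."""
    for principle in principles:
        feedback = critique(response, principle)
        if feedback:  # an empty string means no violation was found
            response = revise(response, feedback)
    return response

def ai_preference(response_a, response_b, harm_score):
    """Phase 2 sketch: AI-generated feedback selects the less harmful response."""
    return response_a if harm_score(response_a) <= harm_score(response_b) else response_b

# Hypothetical stand-ins for what would be model calls in a real system.
BLOCKLIST = {"stupid"}

def toy_critique(response, principle):
    hits = [word for word in BLOCKLIST if word in response.lower()]
    return f"Violates '{principle}': contains {hits}" if hits else ""

def toy_revise(response, feedback):
    for word in BLOCKLIST:
        response = response.replace(word, "mistaken")
    return response

def toy_harm_score(response):
    return sum(response.lower().count(word) for word in BLOCKLIST)

principles = ["Avoid insulting language."]
draft = "That is a stupid question."
revised = critique_and_revise(draft, principles, toy_critique, toy_revise)
preferred = ai_preference(draft, revised, toy_harm_score)
print(revised)
```

In this toy run, the revised response is also the one the preference step selects. In the actual approach, such preferences serve as the feedback signal for reinforcement learning in place of human labels.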
Constitutional AI employs methodologies like self-supervision training. These techniques allow the AI to learn to conform to its constitution, without the need for explicit human labeling or supervision.
The approach also includes developing constrained optimization techniques. These ensure that the AI pursues helpfulness within the boundaries set by its constitution rather than pursuing unbounded optimization, potentially forgetting helpful knowledge.
Conclusion
Hallucinations and biases in LLMs pose significant obstacles to producing reliable and accurate outputs. Biases can further damage the accuracy and fairness of the outputs, perpetuating harmful stereotypes and misinformation.
It's imperative to formulate strategies to mitigate these risks. Such strategies should incorporate pre-processing and input control measures, model configuration adjustments, improvement mechanisms, and context and knowledge enhancement techniques.
Integrating ethical guidelines is essential to ensure that the models generate fair and trustworthy outputs, ultimately achieving responsible use of these powerful technologies.