AI models often generate confident yet false or skewed outputs, a problem rooted in the fundamental design, training, and operation of large language models.
Large language models (LLMs) are now central to information retrieval and decision-making. However, they are prone to generating false information, known as hallucinations 1, and to producing skewed outputs, referred to as bias 1. These problems arise from core technical aspects of LLM design, training, and operation.
AI hallucinations occur when LLMs create plausible-sounding responses that are factually incorrect or nonsensical. These outputs are not grounded in the input context or in verifiable external truth. Hallucinations can range from minor factual errors to completely fabricated content.
Models exhibit several kinds of hallucinations, summarized in the table below. These include factual inaccuracies and responses lacking logical sense, outputs that contradict themselves or deviate from source material, and issues such as irrelevant information, fabricated details, or invented narrative gap-filling.
| Type | Description |
|---|---|
| Factual Inaccuracies | Incorrect or misleading information that misrepresents established facts (e.g., misstating historical events, scientific facts, or biographical details) |
| Nonsensical Responses | Outputs that lack logical coherence or meaningful content, appearing as strings of unrelated phrases |
| Contradictions | Conflicting statements within the same output or across different interactions, including output, prompt, or factual contradictions |
| Faithfulness Hallucinations | Deviations from provided source material or instructions, such as instruction inconsistencies, context inconsistencies, or logical errors |
| Irrelevant Information | Responses unrelated to the prompt or containing arbitrary content |
| Intrinsic Hallucinations | Output contradicts the provided context |
| Extrinsic Hallucinations | The model invents unsupported information not present in the context |
| Fabrication | Explicitly inventing fake citations, numbers, or entities |
| Confabulation | Filling narrative gaps with invented details |
LLM architecture contributes to hallucinations. Transformer-based attention mechanisms help models focus on relevant parts of the input 1. Yet fixed attention windows can limit context retention in longer sequences 1, so earlier content gets "dropped," causing coherence breaks and increased hallucinations 1. Important details can also get "lost in the middle" of long context windows due to attention failures 2.
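The snippet below is a deliberately tiny sketch of that truncation effect. The window size, text, and tokenization are invented for illustration and do not reflect how any particular model manages its context.

```python
# Toy illustration of a fixed context window: once the running context exceeds
# the window, the earliest tokens are effectively invisible to the model.
CONTEXT_WINDOW = 8  # hypothetical window size; real models use thousands of tokens

def visible_context(tokens, window=CONTEXT_WINDOW):
    """Return only the most recent `window` tokens, mimicking how content
    beyond a fixed attention window gets dropped."""
    return tokens[-window:]

history = "Reference invoice 4471 in every reply . What invoice number applies ?".split()
print(visible_context(history))
# The key fact ("4471") falls outside the window, so any answer about the
# invoice number has to be guessed rather than recalled.
```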
LLMs generate responses token by token, with each new token conditioned on the tokens produced before it 1. This sequential generation limits real-time error correction 1: initial mistakes can escalate into confidently incorrect completions 1. Models store knowledge in distributed weights, not structured databases 2. This statistical compression can introduce errors when recalling specific facts, especially for rare entities or events after the training data cutoff.
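A minimal sketch of that decoding loop follows. `ToyLM` and its hand-written transition table are invented stand-ins for a real model's forward pass; the point is only that each step conditions on earlier output and nothing ever revises it.

```python
import numpy as np

# Minimal sketch of greedy autoregressive decoding. ToyLM stands in for a real
# language model; its transition table is invented purely for illustration.
class ToyLM:
    vocab = ["<s>", "The", "capital", "of", "Australia", "is", "Sydney", "Canberra", "."]

    def next_token_probs(self, tokens):
        """Return a distribution over the vocabulary given the tokens so far
        (a stand-in for a transformer forward pass)."""
        table = {
            "<s>": "The", "The": "capital", "capital": "of", "of": "Australia",
            "Australia": "is", "is": "Sydney",  # the toy model's confident (wrong) guess
            "Sydney": ".", "Canberra": ".",
        }
        probs = np.full(len(self.vocab), 0.01)
        last = tokens[-1]
        if last in table:
            probs[self.vocab.index(table[last])] = 1.0
        return probs / probs.sum()

def generate(model, prompt, max_new_tokens=8):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)
        tokens.append(model.vocab[int(np.argmax(probs))])  # greedy: take the top token
        if tokens[-1] == ".":
            break
    return tokens

print(" ".join(generate(ToyLM(), ["<s>", "The"])))
# -> "<s> The capital of Australia is Sydney ."
# Once "Sydney" is emitted, later steps can only continue from it, never revise it.
```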
Probabilistic output generation drives many hallucinations. LLMs are trained by maximum likelihood estimation: they predict the most probable next token based on learned patterns, and factual accuracy is not part of that objective. The model learns how reliable information sounds, not whether it is true 2.
Decoding choices also play a role. Greedy decoders select the most probable token at every step, producing confident-sounding outputs 2. Sampling at higher temperatures increases creativity but also raises hallucination risk 2. Top-k or top-p sampling truncates the low-probability tail, which can cut off hedged tokens like "I'm not sure" 2. This pushes the model toward fluent language, often at the expense of factuality 2.
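The sketch below contrasts those strategies on a single invented next-token distribution. The vocabulary, logits, and the simple top-p filter are illustrative only, not any library's implementation.

```python
import numpy as np

# Decoding strategies over one toy next-token distribution (values invented).
vocab  = ["Canberra", "Sydney", "Melbourne", "I'm not sure"]
logits = np.array([2.0, 1.8, 1.2, 0.5])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Greedy: always take the single most probable token.
print("greedy:", vocab[int(np.argmax(logits))])

# Temperature: dividing logits by T > 1 flattens the distribution, making
# lower-probability (more "creative", more error-prone) tokens likelier.
for T in (0.7, 1.5):
    print(f"T={T}:", dict(zip(vocab, softmax(logits / T).round(2))))

# Top-p (nucleus) filtering: keep only the most probable tokens whose cumulative
# probability stays within p, then renormalize. Hedged answers sitting in the
# tail (like "I'm not sure") are cut off entirely.
def top_p_filter(probs, p=0.8):
    order = np.argsort(probs)[::-1]
    kept = order[np.cumsum(probs[order]) <= p]
    kept = order[: max(len(kept), 1)]  # always keep at least one token
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()

nucleus = top_p_filter(softmax(logits), p=0.8)
print("top-p keeps:", [v for v, q in zip(vocab, nucleus) if q > 0])
```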
Models fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are rewarded for being helpful and confident. Users often dislike "I don't know" responses, which encourages models to give definitive answers and contributes to "confident lies" even under uncertainty. During inference, models also condition on their own prior outputs; this feedback loop amplifies small mistakes over time, creating "snowball hallucinations."
Training data greatly influences hallucinations. Poor data quality, including inaccurate or outdated information, produces flawed outputs 3; the principle "garbage in, garbage out" applies directly 3. Despite vast datasets, models may lack coverage of niche or underrepresented information. When queried on these areas, they tend to confabulate plausible responses instead of admitting ignorance. Models struggle with "long-tail knowledge" and show higher hallucination rates for infrequently appearing entities 3.
If training data includes misinformation, LLMs can generate convincing but false statements 3, known as "imitative falsehoods" 3. Overfitting on common information can limit generalization to out-of-scope inputs 1. Ambiguous or vague prompts also cause problems: LLMs try to "fill in the blanks" based on learned patterns 1, which leads to speculative and incorrect responses. Models may also predict words from surface-level statistical patterns rather than genuine understanding 3, producing factually incorrect answers 3.
AI bias refers to outputs that are unfair, skewed, or discriminatory. These issues arise from problems within the training data or the model's architecture.
LLMs can unintentionally generate biased or toxic content 4, perpetuating discrimination and stereotypes 4. Models amplify socio-cultural biases present in their training data 3; a 2024 Stanford study found popular LLMs surfaced racial biases 3. Because most internet data is in English 3, models tend to default to Western-centric perspectives 3. LLMs can also expose sensitive personal information 4; cases have shown leaks of social security numbers or medical records 4.
Training data characteristics are the most significant cause of AI bias. Biased, outdated, or factually incorrect data directly produces biased model outputs. Lack of diverse representation in training data is another major factor 3, including low-resource languages and specific demographic groups 3. Models then struggle with topics where data is sparse 3, leading to biased or unhelpful outputs for those groups 3.
Over-reliance on synthetic, AI-generated data can lead to "model collapse," in which models degrade over successive generations, amplifying existing biases or creating new ones. LLMs may also misinterpret nuanced contexts beyond the immediate input 4, leading to inaccuracies that manifest as biased outputs 4.
AI hallucinations and bias share common technical roots. Both are heavily influenced by the quality and representativeness of training data; gaps in knowledge, exposure to misinformation, and inherent societal biases in the data are primary drivers. The core objective of LLMs exacerbates both problems: models aim to produce fluent, helpful, and confident responses, not necessarily truthful or unbiased ones. When uncertain about facts, a model might hallucinate; when uncertain about fairness, it might default to statistically dominant patterns from its training data. Deliberate manipulation of AI inputs, such as data poisoning, can also induce factual errors and amplify bias.
AI models, particularly LLMs, have demonstrated concerning real-world failures, including generating false legal information and exhibiting political biases. These errors have led to tangible consequences. Understanding these incidents helps clarify AI's current limitations.
AI models often "hallucinate," creating plausible but false information, which is a significant issue in legal contexts 5. These errors stem from the AI's design: it predicts the next word rather than drawing on verified factual knowledge 5. Outdated, biased, or incomplete training data also contributes to these fabrications 5.
Mata v. Avianca (June 2023)
Lawyers used ChatGPT to prepare a legal brief in New York. The brief contained citations to six fictitious legal cases and included bogus quotes generated by the AI. When questioned, the lawyers used ChatGPT again to verify the citations 6, and the AI generated false decisions to support its earlier fabrications 6.
The court dismissed the client's case and sanctioned the two lawyers and their firm $5,000 for acting in bad faith. The case drew extensive public scrutiny and highlighted the dangers of unverified AI use in legal practice. It spurred widespread discussion among legal professionals 7 about the ethical use of AI and the verification of AI-generated content 7.
Michael Cohen's Case (December 2023)
Michael Cohen's attorney submitted a motion containing non-existent legal citations. Cohen, a disbarred lawyer, had provided these citations 8 after using Google Bard for research 8. Cohen believed Bard was a "super-charged search engine" 8 and was unaware that it could hallucinate citations 8.
The judge requested an explanation for the fabricated rulings 8, leading to public admission and scrutiny 8. Cohen blamed his lawyer for failing to verify the citations 8. This was the second instance of fake AI citations in that court 8, and the incident added to discussions about lawyers' ethical duties when using AI tools 9.
California Attorney Amir Mostafavi (July 2023)
Attorney Amir Mostafavi filed an appeal in July 2023 10. In it, 21 of 23 cited case quotes were fabricated 10. He stated he used ChatGPT to "improve" his appeal 10. Mostafavi did not realize it would invent information 10. He submitted it without verification 10.
Mostafavi was fined $10,000 by a California court 10. This was reportedly the largest fine for AI fabrications in California 10. The court issued a "blistering opinion" 10. It noted that unverified AI use wasted court time and taxpayer money 10. This incident prompted California authorities to consider regulating AI use in the judiciary 10. The state's Judicial Council issued guidelines for generative AI 10. The California Bar Association also strengthened its code of conduct 10.
Other Legal AI Hallucinations (2023-2025)
ChatGPT falsely accused Georgia radio host Mark Walters of embezzlement in June 2023, leading to OpenAI's first defamation lawsuit. In May 2024, a U.S. district court judge ordered two law firms to pay $31,100 10 for costs tied to "bogus AI-generated research" 10. Cases like Kohls v. Ellison show AI misinformation spreading to expert evidence 7: lawyers offered testimony, reports, and affidavits containing fake citations 7. The Minnesota Attorney General's office submitted an expert declaration with fake citations 7 and later apologized for the unintentional error 7.
A May 2024 Stanford study investigated legal AI tools 11. Commercial tools from LexisNexis and Thomson Reuters hallucinated 17-34% of the time 11. This was despite claims of being "hallucination-free" 11. General-purpose chatbots hallucinated 58-82% of the time on legal queries 11. Hallucinations include incorrect statements or "misgrounded" citations 11.
AI models, especially LLMs, often exhibit political biases. These biases typically reflect those present in their vast training datasets, which are scraped from the internet. Efforts to fine-tune away one bias can create others or lead to overcompensation.
Google Gemini Image Generation (February 2024)
In February 2024, Google Gemini generated historically inaccurate images. It depicted America's Founding Fathers as Black, showed the Pope as a woman, and rendered a Nazi-era German soldier with dark skin. It also generated non-White individuals for generic prompts like "Viking" 12, while requests for images of White people were reportedly thwarted 12.
Gemini's bias extended to political prompts 13. "Palestinian" prompts yielded images of children or men with guns 13. "Israeli boy" prompts showed children playing soccer 13. "Israel army" showed smiling soldiers without guns 13. Text responses also displayed bias 14. For example, it equated "libertarians" with "Stalin" in terms of harm 14.
These outputs sparked significant public outrage, with accusations of anti-White, anti-Israel, and left-leaning bias, and Google paused Gemini's image generation feature. India also raised concerns about Gemini's responses regarding Prime Minister Modi 15. Google CEO Sundar Pichai apologized for the "unacceptable" performance and pledged structural changes and updated product guidelines. Google explained that attempts to fine-tune for diversity had overcompensated, leading to errors in other contexts.
General LLM Political Leanings (2024-2025 Research)
Multiple research studies from 2024 and 2025 identify a left-leaning political bias in LLMs, especially in larger models 16 and on highly polarized topics. This bias can cause "stance-flipping quotations" 17, in which LLMs alter supporters' statements to express opposing views 17. These models can influence users' political opinions 18, even among those with opposing political views 18.
The prevalence of political bias in LLMs raises concerns that it could subtly manipulate public discourse, influence electoral outcomes, and generate biased news. This is especially critical during election years, such as 2024, when more than 60 nations held elections 19, because AI-generated misinformation can undermine electoral integrity 19. Researchers emphasize the urgent need for rigorous bias assessment, and responsible AI development and user education are vital. Policies for ethical AI use are being developed globally 9.
AI's increasing autonomy complicates legal liability, forcing a re-evaluation of responsibility among developers, deployers, and users in cases of error. Its unique characteristics challenge established legal frameworks, and discussions now focus on new regulatory approaches.
Determining who is responsible for AI errors is a complex task. AI systems blur traditional lines of responsibility, especially for autonomous or agentic AI. Developers, deployers, and users are all key actors in this evolving landscape.
High-profile incidents highlight serious consequences. A hallucination by Google's Bard chatbot cost Alphabet around $100 billion in market value. An airline's chatbot provided incorrect information, leading to legal action. New York City's municipal chatbot offered illegal advice, creating potential liability 20. ChatGPT also fabricated a bribery accusation against an Australian mayor, risking a lawsuit.
Several legal theories are being adapted to AI. Each presents unique challenges when applied to AI's dynamic nature.
Product liability claims can arise when an AI system's false or dangerous information causes harm. Plaintiffs are increasingly treating chatbots as consumer products and asserting claims such as defective design or failure to warn.
AI's nature as software complicates its classification as a "product," and its continuous learning makes pinpointing a "defect" difficult. Strict liability may apply if AI is deemed inherently dangerous 21. The proposed US "AI LEAD Act" (S.2937) aims to create a federal framework targeting design defects, failure to warn, and breach of warranty 22.
Companies can face negligent misrepresentation claims 20 if they fail to implement reasonable safeguards against AI hallucinations. Negligence claims require proving a duty of care and a breach of that duty; causation and foreseeable harm are also necessary elements.
Defining "reasonable care" for AI developers is complex, and the "black box" nature of AI makes proving developer negligence difficult for victims. Experts suggest a negligence-based approach to scrutinize human responsibility for AI systems 23.
If an AI acts as a business agent, the business likely bears responsibility for its outputs 20. However, AI lacks legal capacity, unlike human agents, and its outputs can be unpredictable. This challenges traditional agency law assumptions.
A Canadian tribunal ruled Air Canada liable for a promise made by its chatbot, rejecting the "separate legal entity" argument. A US District Court decision held that an AI screening tool provider could be an "agent," establishing potential direct liability for AI vendors.
Licensed professionals using AI tools must supervise and verify outputs 20, much as they would supervise junior employees. Failing to do so can constitute malpractice 20. AI-generated fabricated citations have led to sanctions for attorneys, and the AI Hallucination Cases Database logs over 200 such cases globally.
Consumer protection laws may also apply if AI systems provide false or misleading information. The FTC can investigate and bring enforcement actions 21 to protect consumers from deceptive AI practices 21.
Regulatory efforts are emerging globally. They often adopt risk-based approaches. These frameworks emphasize transparency, accountability, and human oversight.
The EU AI Act classifies AI systems by risk level 24 and enforces stringent accuracy requirements for high-risk applications, including legal and healthcare systems 24. It also prohibits unacceptable AI practices, such as real-time biometric surveillance. Key provisions of the Act become applicable in February 2025 25. A proposed EU AI Liability Directive suggests a "reversal of the burden of proof" to aid victims.
Recent UK case law shows judges sanctioning AI misuse in legal documents 26. Courts expect firms to have strong safeguards, including verification protocols and clear AI policies 26. Submitting AI-generated false information could lead to professional or even criminal liability 26. In India, the Bengaluru ITAT retracted a tax ruling because it relied on fictitious case law generated by an AI tool.
The proposed US "AI LEAD Act," introduced in September 2025, is a bipartisan bill that would create a federal product liability cause of action against AI developers and deployers. It aims to provide a safety net, while states can enact stronger protections consistent with the Act's principles 22.
California SB 1047 is another proposed law 21 that seeks to create guardrails for AI deployment and governance, focusing on consumer protection and transparency 21. The NIST AI Risk Management Framework provides guidance for responsible AI deployment, outlining best practices and risk assessment methods. The US AI Bill of Rights focuses on user verification and limits developer liability to transparency and safety measures 24. Creation of an AI Litigation Task Force was directed in December 2025 22. Colorado's AI Act, enacted in May 2024, targets high-risk AI in critical areas.
There is growing consensus on AI ethics and policy. Many experts advocate for a shared-responsibility model that distributes liability among developers, deployers, and users. Factors like control and foreseeability guide this distribution.
Developers should prioritize robust dataset validation and regular audits 24. They must ensure proper design, testing, and deployment 27. Deployers should implement mandatory disclosure and warning labels 24, along with human-in-the-loop verifications 24. Users are expected to verify AI outputs and avoid negligent reliance, especially in high-stakes situations.
Balancing innovation with regulation remains a key challenge. Overly stringent requirements could stifle development, particularly for startups. Human oversight can conflict with truly agentic AI systems; for these autonomous systems, meaningful control might require pre-defined boundaries and "kill switches."
Existing tort law faces significant uncertainty when applied to novel AI systems 28. New legal and regulatory paradigms may be necessary for autonomous AI. Effective AI governance demands a multidisciplinary approach, with collaboration among policymakers, technologists, lawyers, and ethicists, to ensure frameworks balance innovation with ethical responsibility.
A shared-responsibility model is crucial for effective AI governance. This approach distributes liability among developers, deployers, and users, considering control and foreseeability in AI failures.
Developers must prioritize robust dataset validation and perform regular audits 24. They should ensure proper design and adequate testing for all AI tools 27. Responsible deployment is a core part of their duties 27.
Deployers need mandatory disclosure and clear warning labels about AI limitations 24. They should also disclose error rates and implement human-in-the-loop verification 24. Robust AI governance and guardrails are essential for safe operation 29.
Users must exercise due diligence with AI outputs, critically evaluating information. Verifying AI-generated content is crucial, especially in high-stakes situations. Legal professionals, for instance, must verify AI-generated legal research.
Organizations should implement robust data quality control and continuous monitoring. Human-in-the-loop systems remain vital for oversight. AI transparency and explainability (XAI) help stakeholders understand model decisions. Integrating domain-specific knowledge improves reliability, and multi-API consensus can reduce hallucinations by cross-checking responses from several models 30, as sketched below.
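The following is a minimal sketch of such a consensus check. The model names, canned answers, and majority-vote heuristic are invented stand-ins; in practice each entry would be a call to a different provider's API.

```python
from collections import Counter

# Mock stand-ins for answers from different providers; in a real system each
# would come from a separate API call.
MOCK_RESPONSES = {
    "model_a": "Canberra",
    "model_b": "Canberra",
    "model_c": "Sydney",
}

def ask_model(name: str, question: str) -> str:
    """Stand-in for a real API call to provider `name`."""
    return MOCK_RESPONSES[name]

def consensus_answer(question: str, models=("model_a", "model_b", "model_c")):
    answers = [ask_model(m, question).strip().lower() for m in models]
    top, votes = Counter(answers).most_common(1)[0]
    # Require a simple majority across providers; otherwise flag for human review.
    return top if votes > len(models) // 2 else None

print(consensus_answer("What is the capital of Australia?"))  # -> "canberra"
```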
Balancing innovation with the protection of rights and well-being is a significant challenge. Overly stringent requirements could stifle AI development, especially for startups. Policymakers aim to foster growth while ensuring adequate safeguards.
Existing tort law, largely common law, faces uncertainty when applied to novel AI systems 28. New legal and regulatory paradigms might be necessary for truly autonomous AI. This requires adapting traditional concepts of responsibility and liability.
Human oversight can conflict with truly agentic AI systems. This creates a tension between regulatory needs for control and AI's autonomous operational value. Meaningful human control for agentic systems might require pre-defined boundaries and "kill switches."
Effective AI governance needs broad collaboration among experts. Policymakers, technologists, lawyers, and ethicists must work together. This ensures frameworks balance innovation with ethical responsibility and fundamental rights.
Addressing AI liability requires a holistic approach 24. This considers the full AI lifecycle, from development to deployment and usage 24. Continuous monitoring and improvements are key for future AI development.
AI hallucinations happen when models generate confident outputs 2 that are factually incorrect or nonsensical and are not grounded in the input context or verifiable truth 2.
Hallucinations relate to factual accuracy, producing false statements 2. Bias, conversely, relates to fairness, generating skewed or discriminatory outputs. Both can stem from problems in the training data.
Liability is complex and often shared among developers, deployers, and users. Legal theories like product liability, negligence, and agency law may apply. Emerging regulations aim to clarify these roles.
This is a significant concern for policymakers globally. The challenge is to balance necessary safeguards with fostering technological growth. Overly stringent rules might hinder the development of new AI solutions, especially for startups.
Human oversight is critical for mitigating AI risks and ensuring safety 24. It involves reviewing AI outputs and making final decisions, particularly in high-stakes applications 24. However, this requirement creates tension with truly autonomous AI systems.