AI models exhibit many of the same biases that we do (Gichoya et al.). This is because, to perform at the level they do, they are trained on billions of pieces of data generated by humans. Because of the sheer volume of training data, it is hard to filter it for misinformation, hate speech, or discriminatory text. As a result, large language models carry deeply embedded biases along lines of race, gender, religion, and other protected characteristics (Fazil et al.). AI also performs worse on natural language evaluation tasks when prompted in non-standard dialects, such as African-American Vernacular English (Gupta et al.).
Even for a seemingly simple task like speech-to-text, AI exhibits bias. Studies have revealed significant performance gaps between racial groups: speech recognition systems average a 35% error rate for Black speakers, compared with just 19% for white speakers.
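To make the comparison above concrete, here is a minimal sketch (not the cited study's actual methodology, and with made-up transcripts) of how a per-group word error rate could be measured: compare each speaker's reference transcript with the speech-to-text output, then average the error rates within each demographic group.

```python
# Illustrative sketch: per-group word error rate (WER) for a speech-to-text system.
# The transcripts and group labels below are hypothetical examples.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (Levenshtein) divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical (reference transcript, ASR output, speaker group) samples.
samples = [
    ("he is going to the store", "he going to the store", "group_a"),
    ("what did you say about it", "what did you say about it", "group_b"),
]

per_group: dict[str, list[float]] = {}
for reference, hypothesis, group in samples:
    per_group.setdefault(group, []).append(word_error_rate(reference, hypothesis))

for group, rates in per_group.items():
    print(f"{group}: average WER = {sum(rates) / len(rates):.0%}")
```

A systematic gap in these averages between demographic groups is exactly the kind of disparity the studies above report.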
To address this, stay aware of these biases and use LLMs cautiously. Avoid using LLMs for literary work, as they cannot capture natural language or your voice. And don't rely solely on LLMs for high-stakes decisions about hiring, college admissions, or health, since they can reproduce human-like discrimination.
For an example of this bias, take these excerpts from a forthcoming paper. They are among the snippets that three popular LLMs (ChatGPT, Gemini, and Llama) produced when asked to generate a conversation from an African-American speaker (Dunlap & McCoy, 2025):