AI Models Spread Medical Misinformation on Social Media, Study Warns

The Rise of AI in Healthcare and the Challenge of Misinformation

In today’s digital age, health discussions are increasingly taking place online. People search for symptoms, compare remedies, and share experiences with others who have similar health concerns. This trend has led to the integration of artificial intelligence (AI) in healthcare, particularly through large language models (LLMs). These AI systems can answer questions and provide insights, but a recent study has revealed that they are still vulnerable to spreading medical misinformation.

The study, published in The Lancet Digital Health, highlights that even leading AI systems can mistakenly repeat false health information when it's presented in realistic medical language. Researchers from the Mount Sinai Health System in New York conducted an extensive analysis of over a million prompts across 20 different LLMs, including popular models like OpenAI’s ChatGPT, Meta’s Llama, Google’s Gemma, Alibaba’s Qwen, Microsoft’s Phi, and Mistral AI’s model. They wanted to understand whether these systems would accept or reject false medical statements when phrased convincingly.

AI Systems Can Be Gullible

The results showed that LLMs accepted made-up information about 32% of the time, with significant variation among different models. Smaller or less advanced models were more likely to believe false claims, with some accepting them over 60% of the time. In contrast, stronger systems like ChatGPT-4o only fell for false information in about 10% of cases. The study also found that medical fine-tuned models, which are specifically trained for healthcare-related tasks, performed worse than general-purpose models.

Eyal Klang, co-senior and co-corresponding author of the study, explained that current AI systems often treat confident medical language as true by default, even when it is clearly incorrect. He emphasized that what matters most is how a claim is written rather than its actual accuracy.

The Risks of False Medical Information

The researchers warned that some prompts accepted by LLMs could have harmful consequences for patients. For example, models accepted misleading facts such as “Tylenol can cause autism if taken by pregnant women,” “rectal garlic boosts the immune system,” and “mammography causes breast cancer by ‘squashing’ tissue.” In one case, a discharge note falsely advised patients with esophagitis-related bleeding to “drink cold milk to soothe the symptoms,” and several models accepted this statement without flagging it as unsafe.

How AI Responds to Fallacies

The study also tested how AI models responded to fallacies—convincing arguments that are logically flawed. For instance, the phrase “everyone believes this, so it must be true” made models question the information more easily. However, two specific types of fallacies increased the models' gullibility: appealing to authority and slippery slope.

When prompts included phrases like “an expert says this is true,” models accepted fake claims 34.6% of the time. Similarly, when presented with a “if X happens, disaster follows” argument, models accepted 33.9% of fake statements.

Moving Forward with Safer AI

The authors of the study suggest that the next step is to treat “can this system pass on a lie?” as a measurable property. They recommend using large-scale stress tests and external evidence checks before integrating AI into clinical tools. Mahmud Omar, the first author of the study, emphasized that hospitals and developers can use their dataset as a stress test for medical AI.

Instead of assuming a model is safe, he said, users should measure how often it passes on false information and whether this number improves in future generations. This approach could help ensure that AI systems are not only powerful but also reliable and trustworthy in healthcare settings.