At our All-to-All meeting on May 29, 2024, Thorsten Hellert, a research scientist in ATAP’s Advanced Light Source Accelerator Physics Program, presented an insightful and often disconcerting talk on the use of Large Language Models (LLMs) in scientific research. ChatGPT, developed by San Francisco-based OpenAI, is possibly the best-known of the current tranche of LLMs. It has made recent breakthroughs in Artificial Intelligence (AI) accessible to everyone and has become the fastest-growing consumer application in history.
In his talk, Hellert noted that LLMs have tremendous potential for helping scientific researchers because of their capacity to synthesize a vast body of information and express it comprehensibly. They provide a powerful tool for analyzing and interpreting data and assisting with scientific writing. He also warned about the dangers of using these tools.
Hellert pointed out that while LLMs have a demonstrated ability to write specialized scientific texts, they are essentially conversational AI systems trained to sound convincing, without the ability to interpret or understand the content. Consequently, generated science manuscripts may be misleading, based on non-credible or completely made-up sources.
“The worst part,” he said, “is the ability to write text of such high quality that it might deceive reviewers and readers, with the final result being an accumulation of dangerous misinformation.”
To illustrate his point, Hellert cited two scientific papers recently accepted and published by well-known and reputable scientific publishers. Both papers were subsequently retracted after it was discovered that they had been written by LLMs and contained fabricated results and made-up references.
In addition to these severe threats to scientific integrity, Hellert also showed how LLMs have exhibited political bias, along with a disturbing example of an LLM exhibiting bias based on race and gender.
Hellert concluded that while LLMs can be valuable tools for researchers when analyzing and interpreting data, they are still in their infancy. As has been shown, they are prone to errors in comprehension, factual accuracy, specificity, and inference, and they have demonstrable gender, racial, and political biases that we must be aware of when using them.
For more information on ATAP News articles, contact caw@lbl.gov.