Over the last few months, experts and lawmakers have become increasingly concerned that advances in artificial intelligence could help bad actors develop biological threats. But so far there have been no reported biological misuse examples involving AI or the AI-driven chatbots that have recently filled news headlines. This lack of real-world wrongdoing prevents direct evaluation of the changing threat landscape at the intersection of AI and biology.
Nonetheless, researchers have conducted experiments that aim to evaluate sub-components of biological threats, such as the ability to develop a plan for or obtain information that could enable misuse. Two recent efforts, by RAND Corporation and OpenAI, to understand how artificial intelligence could lower barriers to the development of biological weapons concluded that access to a large language model chatbot did not give users an edge in developing plans to misuse biology. But those findings are just one part of the story and should not be considered conclusive.
In any experimental research, study design influences results. Even if technically executed to perfection, all studies have limitations, and both reports dutifully acknowledge theirs. But given the extent of the limitations in the two recent experiments, the reports on them should be seen less as definitive insights and more as opportunities to shape future research, so policymakers and regulators can apply it to help identify and reduce potential risks of AI-driven misuse of biology.
The limitations of recent studies. In the RAND Corporation report, researchers detailed the use of red teaming to understand the impact of chatbots on the ability to develop a plan of biological misuse. The RAND researchers recruited 15 groups of three people to act as red team bad guys. Each of these groups was asked to come up with a plan to achieve one of four nefarious outcomes (vignettes) using biology. All groups were allowed to access the internet. For each of the four vignettes, one red team was given access to an unspecified chatbot and another red team was given access to a different, also unspecified chatbot. When the authors published their final report and accompanying press release in January, they concluded that large language models do not increase the risk of a biological weapons attack by a non-state actor.
This conclusion may be an overstatement of their results, as their focus was specifically on the ability to generate a plan for biological misuse.
The other report was posted by OpenAI, the developer of ChatGPT. Instead of using small groups, OpenAI researchers had participants work individually to identify key pieces of information needed to carry out a specific defined scenario of biological misuse. The OpenAI team reached a conclusion similar to the RAND team's: GPT-4 provides at most a mild uplift in biological threat creation accuracy. As with the RAND report, this may also be an overstatement of the results, as the experiment evaluated the ability to access information, not the ability to actually create a biological threat.
The OpenAI report was met with mixed reactions, including skepticism and public critique regarding the statistical analysis performed. The core objection was the appropriateness of the use of a correction during analysis that re-defined what constituted a statistically significant result. Without the correction, the results would have been statistically significant; that is to say, the use of the chatbot would have been judged to be a potential aid to those interested in creating biological threats.
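To make the objection concrete, here is a minimal sketch of how a multiple-comparison correction can change which results count as significant. It assumes a Bonferroni-style correction, which divides the significance threshold by the number of comparisons; this article does not name the correction OpenAI used, and the metric names and p-values below are invented for illustration.

```python
def significant_metrics(p_values, alpha=0.05, corrected=False):
    """Return the names of metrics whose p-value falls below the threshold.

    With corrected=True, a Bonferroni-style threshold (alpha divided by the
    number of comparisons) is used. This is an assumed correction, chosen
    only as a common example.
    """
    threshold = alpha / len(p_values) if corrected else alpha
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical p-values for five outcome metrics (not taken from either report).
hypothetical_p_values = {
    "metric_a": 0.030,
    "metric_b": 0.200,
    "metric_c": 0.008,
    "metric_d": 0.450,
    "metric_e": 0.040,
}

print("Uncorrected:", significant_metrics(hypothetical_p_values))
print("Corrected:  ", significant_metrics(hypothetical_p_values, corrected=True))
```

In this toy example, the same data yield more "significant" findings without the correction than with it, which is why the choice of whether to correct can change a study's headline conclusion.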
Regardless of their limitations, the OpenAI and RAND experiments highlight larger questions which, if addressed head-on, would enable future experiments to provide more valuable and actionable results about AI-related biological threats.
Is there more than statistical significance? In both experiments, third-party evaluators assigned numeric scores to the text-based participant responses. The researchers then evaluated if there was a statistically significant difference between those who had access to chatbots and those who did not. Neither research team found one. But typically, the ability to determine if a statistically significant difference exists largely depends on the number of data points; more data points allow for a smaller difference to be considered statistically significant. Therefore, if the researchers had many more participants, the same differences in score could have been statistically significant.
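The dependence on sample size can be sketched with a simple normal approximation for a difference in group means. The score difference, standard deviation, and group sizes below are hypothetical and are not taken from either report; the function name is likewise illustrative.

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference between two group means.

    Uses a normal (z-test) approximation with equal group sizes and a shared
    standard deviation. This is a simplification for illustration only.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical: the same 0.5-point score difference, with a 2-point standard
# deviation, evaluated at several group sizes.
for n in (25, 50, 100, 400):
    p = two_sided_p_value(mean_diff=0.5, sd=2.0, n_per_group=n)
    print(f"n per group = {n:3d}  p = {p:.4f}  below 0.05: {p < 0.05}")
```

Under these assumed numbers, the observed difference never changes; only the number of participants does, and with it the verdict on significance.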
Reducing text to numbers can bring other challenges as well. In the RAND study, the teams, regardless of access to chatbots, did not generate any plans that were deemed likely to succeed. However, there may have been meaningful differences in why the plans were not likely to succeed, and systematically comparing the content of the responses could prove valuable in identifying mitigation measures.
In the OpenAI work, the goal of the participants was to identify a specific series of steps in a plan. However, if a participant were to miss an early step in the plan, all the remaining steps, even if correct, would not count towards their score. This meant that if someone made an error early on, but identified all the remaining information correctly, they would score similarly to someone who did not identify any correct information. Again, researchers may gain insight from identifying patterns in which steps participants failed, and why.
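The effect of such a scoring rule can be sketched as follows. This is only an illustration consistent with the description above, not OpenAI's actual rubric; the function names and the five-step example are hypothetical.

```python
def strict_score(step_results):
    """Count correct steps up to, but not including, the first missed step."""
    score = 0
    for is_correct in step_results:
        if not is_correct:
            break
        score += 1
    return score


def lenient_score(step_results):
    """Count every correct step, wherever it appears in the sequence."""
    return sum(1 for is_correct in step_results if is_correct)


# Hypothetical participants on a five-step task.
early_miss = [False, True, True, True, True]      # misses only the first step
nothing_right = [False, False, False, False, False]

for label, results in (("early miss", early_miss), ("nothing right", nothing_right)):
    print(f"{label}: strict = {strict_score(results)}, lenient = {lenient_score(results)}")
```

Under the strict rule, the two hypothetical participants are indistinguishable, while a rule that credits every correct step separates them. Reporting both kinds of score is one way the information lost by the strict rule could be recovered.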
Are the results generalizable? To inform an understanding of the threat landscape, conclusions must be generalizable across scenarios and chatbots. Future evaluators should be clear on which large language models are used (the RAND researchers were not). It would be helpful to understand whether researchers achieve a similar answer with different models or different answers with the same model. Knowing the specifics would also enable comparisons of results based on the characteristics of the chatbot used, enabling policymakers to understand whether models with certain characteristics have unique capabilities and impact.
The OpenAI experiment used just one threat scenario. There is not much reason to believe that this one scenario is representative of all threat scenarios; the results may or may not generalize. There is a tradeoff in using one specific scenario: generalizability is limited, but it becomes tenable for one or two people to evaluate 100 responses. The RAND work, on the other hand, was much more open-ended, as participant teams were given flexibility in how they decided to achieve their intended goal. This makes the results more generalizable, but it required a more extensive evaluation procedure that involved many experts to sufficiently examine 15 diverse scenarios.
Are the results impacted by something else? Part way through their experiment, the RAND researchers enrolled a "black cell," a group with significant experience with large language models. The RAND researchers made this decision because they noticed that some of their study's red teams were struggling to bypass safety features of the chatbots. In the end, the black cell received an average score almost double that of the corresponding red teams. The black cell participants didn't need to rely only on their expertise with large language models; they were also adept at interpreting the academic literature about those models. This provided a valuable insight to the RAND researchers, which is that "[t]he relative outperformance of the black cell illustrates that a greater source of variability appears to be red team composition, as opposed to LLM access." Simply put, it probably matters more who is on the team than whether the team has access to a large language model.
Moving forward. Despite their limitations, red teaming and benchmarking efforts remain valuable tools for understanding the impact of artificial intelligence on the deliberate biological threat landscape. Indeed, the National Institute of Standards and Technology's Artificial Intelligence Safety Institute Consortium, a part of the US Department of Commerce, currently has working groups focused on developing standards and guidelines for this type of research.
Outside of technical design and execution of the experiments, challenges remain. The work comes with meaningful financial costs, including compensation for participants' time (OpenAI pays $100 per hour to experts); for the individuals who recruit participants, design and administer the experiments, and analyze the data; and for the biosecurity experts who evaluate the responses. Therefore, it is important to consider who will fund this type of work in the future. Should artificial intelligence companies fund their own studies, a perceived conflict of interest will linger if the results are intended to be used to inform governance or public perception of their models' risks. But at the same time, funding that is directed to nonprofits like RAND Corporation or to academia does not inherently give researchers access to unreleased or modified models, like the version used in the OpenAI experiment. Future work should learn from these two reports, and could benefit from considering the following:
The path toward more useful research on AI and biological threats is hardly free of obstacles. Employees at the National Institute of Standards and Technology have reportedly expressed outrage regarding the recent appointment of Paul Christiano, a former OpenAI researcher who has expressed concerns that AI could pose an existential threat to humanity, to a leadership role at the Artificial Intelligence Safety Institute. Employees are concerned that Christiano's personal beliefs about the catastrophic and existential risk posed by AI broadly will affect his ability to maintain the National Institute of Standards and Technology's commitment to objectivity.
This internal unrest comes on the heels of reporting that the physical buildings that house the institute are falling apart. As Christiano looks to expand his staff, he will also need to compete against the salaries paid by tech companies. OpenAI, for example, is hiring for safety-related roles with the low end of the base salary exceeding the high end of the General Schedule pay scale (federal salaries). It is unlikely that any relief will come from the 2024 federal budget, as lawmakers are expected to decrease the institute's budget from 2023 levels. But if the United States wants to remain a global leader in the development of artificial intelligence, it will need to make financial commitments to ensure that the work required to evaluate artificial intelligence is done right.