Over the last few months, experts and lawmakers have become increasingly concerned that advances in artificial intelligence could help bad actors develop biological threats. But so far there have been no reported biological misuse examples involving AI or the AI-driven chatbots that have recently filled news headlines. This lack of real-world wrongdoing prevents direct evaluation of the changing threat landscape at the intersection of AI and biology.
Nonetheless, researchers have conducted experiments that aim to evaluate sub-components of biological threats, such as the ability to develop a plan for or obtain information that could enable misuse. Two recent efforts, by RAND Corporation and OpenAI, to understand how artificial intelligence could lower barriers to the development of biological weapons concluded that access to a large language model chatbot did not give users an edge in developing plans to misuse biology. But those findings are just one part of the story and should not be considered conclusive.
In any experimental research, study design influences results. Even if technically executed to perfection, all studies have limitations, and both reports dutifully acknowledge theirs. But given the extent of the limitations in the two recent experiments, the reports on them should be seen less as definitive insights and more as opportunities to shape future research, so policymakers and regulators can apply it to help identify and reduce potential risks of AI-driven misuse of biology.
The limitations of recent studies. In the RAND Corporation report, researchers detailed the use of red teaming to understand the impact of chatbots on the ability to develop a plan of biological misuse. The RAND researchers recruited 15 groups of three people to act as red team bad guys. Each of these groups was asked to come up with a plan to achieve one of four nefarious outcomes (vignettes) using biology. All groups were allowed to access the internet. For each of the four vignettes, one red team was given access to an unspecified chatbot and another red team was given access to a different, also unspecified chatbot. When the authors published their final report and accompanying press release in January, they concluded that large language models do not increase the risk of a biological weapons attack by a non-state actor.
This conclusion may be an overstatement of their results, as their focus was specifically on the ability to generate a plan for biological misuse.
The other report was posted by the developers of ChatGPT, OpenAI. Instead of using small groups, OpenAI researchers had participants work individually to identify key pieces of information needed to carry out a specific defined scenario of biological misuse. The OpenAI team reached a conclusion similar to the RAND team's: GPT-4 provides at most a mild uplift in biological threat creation accuracy. As with the RAND conclusion, this may also be an overstatement of the results, as the experiment evaluated the ability to access information, not the ability to actually create a biological threat.
The OpenAI report was met with mixed reactions, including skepticism and public critique regarding the statistical analysis performed. The core objection was the appropriateness of the use of a correction during analysis that re-defined what constituted a statistically significant result. Without the correction, the results would have been statistically significant; that is to say, the use of the chatbot would have been judged to be a potential aid to those interested in creating biological threats.
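To make the effect of such a correction concrete, here is a minimal sketch that assumes a Bonferroni-style adjustment, one common way of tightening the significance threshold when several comparisons are tested at once. The p-value and the number of comparisons are hypothetical and are not taken from the OpenAI report, and the function name `is_significant` is made up for illustration.

```python
def is_significant(p_value: float, alpha: float = 0.05, comparisons: int = 1) -> bool:
    """Return True if p_value clears a Bonferroni-adjusted threshold.

    With comparisons=1 this is the ordinary uncorrected test; with more
    comparisons the threshold alpha is divided by their number.
    """
    return p_value < alpha / comparisons


# Hypothetical p-value for one outcome, NOT a figure from either report.
p = 0.02

uncorrected = is_significant(p)                 # threshold 0.05
corrected = is_significant(p, comparisons=10)   # threshold 0.05 / 10

print(f"uncorrected: {uncorrected}, corrected: {corrected}")
```

The same result can therefore land on either side of the line depending on whether, and how strictly, a correction is applied, which is why the choice drew scrutiny.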
Regardless of their limitations, the OpenAI and RAND experiments highlight larger questions which, if addressed head-on, would enable future experiments to provide more valuable and actionable results about AI-related biological threats.
Is there more than statistical significance? In both experiments, third-party evaluators assigned numeric scores to the text-based participant responses. The researchers then evaluated if there was a statistically significant difference between those who had access to chatbots and those who did not. Neither research team found one. But typically, the ability to determine if a statistically significant difference exists largely depends on the number of data points; more data points allow for a smaller difference to be considered statistically significant. Therefore, if the researchers had many more participants, the same differences in score could have been statistically significant.
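The dependence on sample size can be illustrated with a short sketch. It assumes two equally sized groups with the same score spread and uses a normal approximation for the two-sided p-value; the score difference, standard deviation, and group sizes are hypothetical, not values from either study.

```python
import math


def two_sided_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a difference in group means.

    Assumes both groups have n_per_group members and the same standard
    deviation sd, and uses a normal (z-test) approximation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # For a standard normal, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Same hypothetical score difference and spread, different group sizes.
p_small = two_sided_p(mean_diff=0.5, sd=2.0, n_per_group=25)
p_large = two_sided_p(mean_diff=0.5, sd=2.0, n_per_group=400)

print(f"25 per group:  p = {p_small:.3f}")
print(f"400 per group: p = {p_large:.4f}")
```

With these assumed numbers, the identical half-point difference falls short of the conventional 0.05 threshold with 25 participants per group and clears it comfortably with 400.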
Reducing text to numbers can bring other challenges as well. In the RAND study, the teams, regardless of access to chatbots, did not generate any plans that were deemed likely to succeed. However, there may have been meaningful differences in why the plans were not likely to succeed, and systematically comparing the content of the responses could prove valuable in identifying mitigation measures.
In the OpenAI work, the goal of the participants was to identify a specific series of steps in a plan. However, if a participant were to miss an early step in the plan, all the remaining steps, even if correct, would not count towards their score. This meant that if someone made an error early on, but identified all the remaining information correctly, they would score similarly to someone who did not identify any correct information. Again, researchers may gain insight from identifying patterns in which steps participants failed and why.
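The consequence of that scoring rule can be shown with a toy comparison. The sketch below is one interpretation of the rule as described here, not OpenAI's actual rubric: `sequential_score` stops counting at the first missed step, while `independent_score` credits every correct step. Both function names and the five-step responses are hypothetical.

```python
from typing import Sequence


def sequential_score(correct: Sequence[bool]) -> int:
    """Count correct steps until the first miss; later steps earn nothing."""
    score = 0
    for step_ok in correct:
        if not step_ok:
            break
        score += 1
    return score


def independent_score(correct: Sequence[bool]) -> int:
    """Count every correct step, regardless of order."""
    return sum(correct)


# Two hypothetical participants answering an abstract five-step task.
early_miss = [False, True, True, True, True]       # wrong only on step 1
nothing_right = [False, False, False, False, False]

for name, answers in [("early miss", early_miss), ("nothing right", nothing_right)]:
    print(name, sequential_score(answers), independent_score(answers))
```

Under the sequential rule the two participants are indistinguishable, while per-step scoring separates them, which is the kind of pattern a content-level comparison could surface.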
Are the results generalizable? To inform an understanding of the threat landscape, conclusions must be generalizable across scenarios and chatbots. Future evaluators should be clear on which large language models are used (the RAND researchers were not). It would be helpful to understand if researchers achieve a similar answer with different models or different answers with the same model. Knowing the specifics would also enable comparisons of results based on the characteristics of the chatbot used, enabling policymakers to understand if models with certain characteristics have unique capabilities and impact.
The OpenAI experiment used just one threat scenario. There is not much reason to believe that this one scenario is representative of all threat scenarios; the results may or may not generalize. There is a tradeoff in using one specific scenario: it becomes tenable for one or two people to evaluate 100 responses. On the other hand, the RAND work was much more open-ended, as participant teams were given flexibility in how they decided to achieve their intended goal. This made the results more generalizable, but required a more extensive evaluation procedure that involved many experts to sufficiently examine 15 diverse scenarios.
Are the results impacted by something else? Partway through their experiment, the RAND researchers enrolled a "black cell," a group with significant experience with large language models. The RAND researchers made this decision because they noticed that some of their study's red teams were struggling to bypass safety features of the chatbots. In the end, the black cell received an average score almost double that of the corresponding red teams. The black cell participants didn't need to rely only on their expertise with large language models; they were also adept at interpreting the academic literature about those models. This provided a valuable insight to the RAND researchers, which is that "[t]he relative outperformance of the black cell illustrates that a greater source of variability appears to be red team composition, as opposed to LLM access." Simply put, it probably matters more who is on the team than if the team has access to a large language model or not.
Moving forward. Despite their limitations, red teaming and benchmarking efforts remain valuable tools for understanding the impact of artificial intelligence on the deliberate biological threat landscape. Indeed, the National Institute of Standards and Technology's Artificial Intelligence Safety Institute Consortium, a part of the US Department of Commerce, currently has working groups focused on developing standards and guidelines for this type of research.
Outside of technical design and execution of the experiments, challenges remain. The work comes with meaningful financial costs, including compensation for participants' time (OpenAI pays $100 per hour to experts); for the individuals who recruit participants, design and administer the experiments, and analyze data; and for the biosecurity experts who evaluate the responses. Therefore, it is important to consider who will fund this type of work in the future. Should artificial intelligence companies fund their own studies, a perceived conflict of interest will linger if the results are intended to be used to inform governance or public perception of their models' risks. But at the same time, funding that is directed to nonprofits like RAND Corporation or to academia does not inherently give researchers access to unreleased or modified models, like the version used in the OpenAI experiment. Future work should learn from these two reports, and could benefit from considering the following:
The path toward more useful research on AI and biological threats is hardly free of obstacles. Employees at the National Institute of Standards and Technology have reportedly expressed outrage regarding the recent appointment of Paul Christiano, a former OpenAI researcher who has expressed concerns that AI could pose an existential threat to humanity, to a leadership role at the Artificial Intelligence Safety Institute. Employees are concerned that Christiano's personal beliefs about catastrophic and existential risk posed by AI broadly will affect his ability to maintain the National Institute of Standards and Technology's commitment to objectivity.
This internal unrest comes on the heels of reporting that the physical buildings that house the institute are falling apart. As Christiano looks to expand his staff, he will also need to compete against the salaries paid by tech companies. OpenAI, for example, is hiring for safety-related roles with the low end of the base salary exceeding the high end of the General Schedule pay scale (federal salaries). It is unlikely that any relief will come from the 2024 federal budget, as lawmakers are expected to decrease the institute's budget from 2023 levels. But if the United States wants to remain a global leader in the development of artificial intelligence, it will need to make financial commitments to ensure that the work required to evaluate artificial intelligence is done right.