Over the last few months, experts and lawmakers have become increasingly concerned that advances in artificial intelligence could help bad actors develop biological threats. But so far there have been no reported biological misuse examples involving AI or the AI-driven chatbots that have recently filled news headlines. This lack of real-world wrongdoing prevents direct evaluation of the changing threat landscape at the intersection of AI and biology.
Nonetheless, researchers have conducted experiments that aim to evaluate sub-components of biological threats, such as the ability to develop a plan for misuse or to obtain information that could enable it. Two recent efforts, by RAND Corporation and OpenAI, to understand how artificial intelligence could lower barriers to the development of biological weapons concluded that access to a large language model chatbot did not give users an edge in developing plans to misuse biology. But those findings are just one part of the story and should not be considered conclusive.
In any experimental research, study design influences results. Even if technically executed to perfection, all studies have limitations, and both reports dutifully acknowledge theirs. But given the extent of the limitations in the two recent experiments, the reports on them should be seen less as definitive insights and more as opportunities to shape future research, so policymakers and regulators can apply it to help identify and reduce potential risks of AI-driven misuse of biology.
The limitations of recent studies. In the RAND Corporation report, researchers detailed the use of red teaming to understand the impact of chatbots on the ability to develop a plan of biological misuse. The RAND researchers recruited 15 groups of three people to act as red team bad guys. Each of these groups was asked to come up with a plan to achieve one of four nefarious outcomes (vignettes) using biology. All groups were allowed to access the internet. For each of the four vignettes, one red team was given access to an unspecified chatbot and another red team was given access to a different, also unspecified chatbot. When the authors published their final report and accompanying press release in January, they concluded that large language models do not increase the risk of a biological weapons attack by a non-state actor.
This conclusion may be an overstatement of their results, as their focus was specifically on the ability to generate a plan for biological misuse.
The other report was posted by OpenAI, the developer of ChatGPT. Instead of using small groups, OpenAI researchers had participants work individually to identify key pieces of information needed to carry out a specific defined scenario of biological misuse. The OpenAI team reached a conclusion similar to the RAND team's: GPT-4 provides at most a mild uplift in biological threat creation accuracy. As with the RAND report, this may also be an overstatement of the results, as the experiment evaluated the ability to access information, not the ability to actually create a biological threat.
The OpenAI report was met with mixed reactions, including skepticism and public critique regarding the statistical analysis performed. The core objection was the appropriateness of the use of a correction during analysis that re-defined what constituted a statistically significant result. Without the correction, the results would have been statistically significant; that is to say, the use of the chatbot would have been judged to be a potential aid to those interested in creating biological threats.
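To make the dispute concrete, here is a minimal sketch of how a multiple-comparisons correction can flip a verdict. It assumes a Bonferroni correction, which is one common correction of this kind; the p-value and the number of tests are hypothetical and are not taken from the OpenAI report.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


def is_significant(p_value: float, alpha: float = 0.05, num_tests: int = 1) -> bool:
    """Judge one p-value against the (possibly corrected) threshold."""
    return p_value < bonferroni_threshold(alpha, num_tests)


# Hypothetical p-value for one outcome measure (an assumption for illustration).
p = 0.03
print("Uncorrected:", is_significant(p))                     # compared with 0.05
print("Corrected for 5 tests:", is_significant(p, num_tests=5))  # compared with 0.01
```

The same result counts as significant or not depending on whether the threshold is divided by the number of comparisons, which is why the choice of correction drew scrutiny.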
Regardless of their limitations, the OpenAI and RAND experiments highlight larger questions which, if addressed head-on, would enable future experiments to provide more valuable and actionable results about AI-related biological threats.
Is there more than statistical significance? In both experiments, third-party evaluators assigned numeric scores to the text-based participant responses. The researchers then evaluated if there was a statistically significant difference between those who had access to chatbots and those who did not. Neither research team found one. But typically, the ability to determine if a statistically significant difference exists largely depends on the number of data points; more data points allow for a smaller difference to be considered statistically significant. Therefore, if the researchers had many more participants, the same differences in score could have been statistically significant.
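The dependence on sample size can be illustrated with a short sketch. It assumes a simple two-sided z-test with equal group sizes and a known, shared standard deviation, which is a simplification and not necessarily the test either team used. The score difference, standard deviation, and group sizes are hypothetical.

```python
import math


def two_sample_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized
    groups, using a z-test with a known, shared standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical numbers: the same 0.5-point score gap at two sample sizes.
for n in (25, 500):
    p = two_sample_p_value(mean_diff=0.5, sd=2.0, n_per_group=n)
    print(f"n per group = {n}: p = {p:.4f}")
```

With these assumed numbers, the identical gap falls short of the conventional 0.05 threshold at 25 participants per group and clears it easily at 500.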
Reducing text to numbers can bring other challenges as well. In the RAND study, the teams, regardless of access to chatbots, did not generate any plans that were deemed likely to succeed. However, there may have been meaningful differences in why the plans were not likely to succeed, and systematically comparing the content of the responses could prove valuable in identifying mitigation measures.
In the OpenAI work, the goal of the participants was to identify a specific series of steps in a plan. However, if a participant were to miss an early step in the plan, all the remaining steps, even if correct, would not count toward their score. This meant that someone who made an error early on but identified all the remaining information correctly would score similarly to someone who did not identify any correct information. Again, researchers may gain insight from identifying patterns in which steps participants failed and why.
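The effect of that scoring rule can be sketched in a few lines. This is an illustration of the rule as described above, not OpenAI's actual rubric; the function names and the five-step example are hypothetical.

```python
from typing import Sequence


def sequential_score(step_results: Sequence[bool]) -> int:
    """Count correct steps only up to the first miss; later steps earn nothing."""
    score = 0
    for correct in step_results:
        if not correct:
            break
        score += 1
    return score


def independent_score(step_results: Sequence[bool]) -> int:
    """Count every correct step, regardless of order (shown for contrast)."""
    return sum(1 for correct in step_results if correct)


# Hypothetical participants answering a five-step task.
missed_first_only = [False, True, True, True, True]
missed_everything = [False, False, False, False, False]

for name, answers in [("missed first only", missed_first_only),
                      ("missed everything", missed_everything)]:
    print(name, sequential_score(answers), independent_score(answers))
```

Under the sequential rule both participants receive the same score, even though one of them identified four of the five steps. A per-step tally keeps that difference visible.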
Are the results generalizable? To inform an understanding of the threat landscape, conclusions must be generalizable across scenarios and chatbots. Future evaluators should be clear on which large language models are used (the RAND researchers were not). It would be helpful to understand if researchers achieve a similar answer with different models or different answers with the same model. Knowing the specifics would also enable comparisons of results based on the characteristics of the chatbot used, enabling policymakers to understand if models with certain characteristics have unique capabilities and impact.
The OpenAI experiment used just one threat scenario. There is not much reason to believe that this one scenario is representative of all threat scenarios; the results may or may not generalize. The tradeoff is that with one specific scenario, it becomes tenable for one or two people to evaluate 100 responses. The RAND work, on the other hand, was much more open-ended, as participant teams were given flexibility in how they decided to achieve their intended goal. This makes the results more generalizable, but it required a more extensive evaluation procedure that involved many experts to sufficiently examine 15 diverse scenarios.
Are the results impacted by something else? Partway through their experiment, the RAND researchers enrolled a "black cell," a group with significant experience with large language models. The RAND researchers made this decision because they noticed that some of their study's red teams were struggling to bypass safety features of the chatbots. In the end, the black cell received an average score almost double that of the corresponding red teams. The black cell participants didn't need to rely only on their expertise with large language models; they were also adept at interpreting the academic literature about those models. This provided a valuable insight to the RAND researchers: "[t]he relative outperformance of the black cell illustrates that a greater source of variability appears to be red team composition, as opposed to LLM access." Simply put, it probably matters more who is on the team than whether the team has access to a large language model.
Moving forward. Despite their limitations, red teaming and benchmarking efforts remain valuable tools for understanding the impact of artificial intelligence on the deliberate biological threat landscape. Indeed, the National Institute of Standards and Technology's Artificial Intelligence Safety Institute Consortium, a part of the US Department of Commerce, currently has working groups focused on developing standards and guidelines for this type of research.
Outside of technical design and execution of the experiments, challenges remain. The work comes with meaningful financial costs: compensation for participants' time (OpenAI pays $100 per hour to experts); pay for the individuals who recruit participants, design and administer the experiments, and analyze data; and pay for the biosecurity experts who evaluate the responses. Therefore, it is important to consider who will fund this type of work in the future. Should artificial intelligence companies fund their own studies, a perceived conflict of interest will linger if the results are intended to be used to inform governance or public perception of their models' risks. But at the same time, funding that is directed to nonprofits like RAND Corporation or to academia does not inherently give researchers access to unreleased or modified models, like the version used in the OpenAI experiment. Future work should learn from these two reports, and could benefit from considering the following:
The path toward more useful research on AI and biological threats is hardly free of obstacles. Employees at the National Institute of Standards and Technology have reportedly expressed outrage regarding the recent appointment of Paul Christiano, a former OpenAI researcher who has expressed concerns that AI could pose an existential threat to humanity, to a leadership role at the Artificial Intelligence Safety Institute. Employees are concerned that Christiano's personal beliefs about catastrophic and existential risk posed by AI broadly will affect his ability to maintain the National Institute of Standards and Technology's commitment to objectivity.
This internal unrest comes on the heels of reporting that the physical buildings that house the institute are falling apart. As Christiano looks to expand his staff, he will also need to compete against the salaries paid by tech companies. OpenAI, for example, is hiring for safety-related roles with the low end of the base salary exceeding the high end of the General Schedule pay scale (federal salaries). It is unlikely that any relief will come from the 2024 federal budget, as lawmakers are expected to decrease the institute's budget from 2023 levels. But if the United States wants to remain a global leader in the development of artificial intelligence, it will need to make financial commitments to ensure that the work required to evaluate artificial intelligence is done right.