Key Points
- FutureHouse, a U.S. AI start-up, has unveiled a system that synthesizes scientific knowledge; it produced accurate Wikipedia-style summaries for roughly 17,000 human genes.
- AI tools help with narrative reviews but fall short of the rigor needed for systematic reviews.
- Specialized AI tools aid in research synthesis but are largely limited to open-access sources; systems that attempt full-text searches face high computational costs.
- Hasty AI-driven reviews risk reducing rigor and degrading the quality of the literature, but non-profit-backed AI systems could balance quality with accessibility.
When neurobiologist Sam Rodriques was a graduate student, he observed a scientific limitation: even if all the knowledge needed to understand complex systems such as cells or the brain were available, humans might struggle to synthesize it comprehensively. Five years later, Rodriques and his team at FutureHouse, a U.S. AI start-up, may have a solution. In September, they announced an AI system capable of producing synthesized knowledge about scientific topics more accurately than traditional Wikipedia entries. Within minutes, it generated summaries for approximately 17,000 human genes, most of which previously lacked detailed entries.
Rodriques isn’t alone in turning to AI for scientific synthesis. Scholars have been working to streamline the synthesis of scientific literature for decades, often finding traditional review processes lengthy, labor-intensive, and outdated upon completion. The rise of large language models (LLMs) like ChatGPT is sparking fresh excitement for accelerating research review processes.
While AI-powered science search engines now offer substantial help with narrative literature reviews by finding, sorting, and summarizing publications, they fall short of fully automating systematic reviews, which demand strict procedural standards for rigor and reliability. Paul Glasziou, an expert in evidence-based reviews, believes AI can assist with parts of the process, but cautions that it may take decades before such tools can handle systematic reviews independently.
AI-powered systems like FutureHouse’s PaperQA2 go further, attempting full-text searches across academic databases. This approach has produced promising results: a panel of scientists found fewer reasoning errors in AI-generated gene summaries than in human-written Wikipedia entries. However, as Rodriques acknowledges, comprehensive searches are computationally demanding and expensive.
More specialized tools, such as Consensus and Elicit, help researchers conduct reviews by identifying relevant studies and extracting insights. However, these systems are constrained to open-access sources, leaving much of the paywalled scientific literature untouched. While they provide useful summaries, experts agree they lack the rigor required for systematic reviews.
AI’s expansion into scientific synthesis also raises concerns about accuracy and reliability. As James Thomas of University College London warns, hasty AI-driven reviews could jeopardize the quality of scientific literature. Still, supporters believe AI could enhance quality by flagging low-quality studies and speeding up rigorous reviews. Rodriques and colleagues are optimistic that non-profit-backed AI systems could democratize scientific knowledge while maintaining high standards, as exemplified by recent UK funding into AI-based synthesis tools.