October 26, 2023
In biomedical research, staying up to date is more crucial than ever. Every day, between 3,000 and 5,000 new papers are added to databases like PubMed, flooding the field with fresh findings and observations. Researchers and clinicians conducting a thorough literature review or answering specific questions must examine this massive influx of information to ensure their work reflects the latest and most accurate evidence.
AI, especially Large Language Models (LLMs), could in principle help researchers sift through this flood of new information. But there's a problem: LLMs often carry stale knowledge, because retraining them on new data is costly and infrequent. For example, the knowledge cutoff for OpenAI's GPT-4 is September 2021 [ref], and the knowledge cutoff for Google's PaLM 2 (the foundation model behind Med-PaLM 2) is "mid-2021" [ref]. In fast-moving areas like biomedicine, this means the information an LLM holds may not be the most recent.
System offers a unique solution to this need. Our large-scale extraction pipeline runs daily to capture findings from newly published studies, structures them in System's massive graph, and makes them available to an LLM for synthesis based on the user's query. A robust set of monitoring and alerting systems ensures that the data stays fresh and that our users always have the latest information for their work.
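To make this flow concrete, here is a minimal, hypothetical sketch of what a retrieval-then-synthesis step can look like: structured findings are filtered by relevance to the query, the freshest evidence is preferred, and the result is packed into a prompt for an LLM to synthesize from. The Finding structure and the retrieve_findings and build_prompt helpers are illustrative assumptions, not System's actual pipeline or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Finding:
    """A single structured finding extracted from a newly published paper."""
    paper_title: str
    published: date
    statement: str


def retrieve_findings(graph: list[Finding], query: str, top_k: int = 3) -> list[Finding]:
    """Naive keyword retrieval: score findings by term overlap with the query,
    preferring higher overlap and then more recently published work."""
    terms = set(query.lower().split())

    def overlap(finding: Finding) -> int:
        text = f"{finding.paper_title} {finding.statement}".lower()
        return sum(term in text for term in terms)

    relevant = [f for f in graph if overlap(f) > 0]
    relevant.sort(key=lambda f: (overlap(f), f.published), reverse=True)
    return relevant[:top_k]


def build_prompt(query: str, findings: list[Finding]) -> str:
    """Assemble a grounded prompt so the LLM synthesizes from dated, retrieved
    findings instead of relying on its frozen training data."""
    context = "\n".join(
        f"[{i}] ({f.published.isoformat()}) {f.paper_title}: {f.statement}"
        for i, f in enumerate(findings, start=1)
    )
    return (
        "Answer the question using only the numbered findings below, "
        "citing them by number.\n\n"
        f"Findings:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Hypothetical graph contents; the statement paraphrases the post's first
    # example, and the publication day is a placeholder.
    graph = [
        Finding(
            paper_title="Early and Empirical High-Dose Cryoprecipitate for "
                        "Hemorrhage After Traumatic Injury",
            published=date(2023, 10, 1),
            statement="Adding early high-dose cryoprecipitate to the standard "
                      "massive transfusion protocol did not produce the "
                      "expected improvement in outcomes.",
        ),
    ]
    query = "High-Dose Cryoprecipitate for Hemorrhage After Traumatic Injury"
    print(build_prompt(query, retrieve_findings(graph, query)))
```

The key design point this sketch illustrates is that freshness lives in the retrieval layer, not in the model weights: the model's answer can change the day a new finding enters the graph, without any retraining.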
To illustrate this problem and our solution, here are two examples.
A multicenter randomized controlled trial was conducted in the US and UK to compare a standard-of-care (SOC) massive transfusion protocol with the SOC protocol plus early use of cryoprecipitate (within 90 minutes of emergency department arrival). The phase 3 clinical trial was completed in November 2021, and the peer-reviewed paper reporting the findings, titled Early and Empirical High-Dose Cryoprecipitate for Hemorrhage After Traumatic Injury, was published in JAMA in October 2023.
Let's see what you can find about this intervention on System Pro and compare it with OpenAI's GPT-4.
Search query: High-Dose Cryoprecipitate for Hemorrhage After Traumatic Injury
Prompt: tell me about High-Dose Cryoprecipitate for Hemorrhage After Traumatic Injury
In this example, System, using the extracted findings from the published paper of the trial (reference No. 1), correctly tells you that the intervention did not have the expected outcome, while GPT-4, although it points out the gap in its knowledge, implies that high-dose cryoprecipitate therapy could be an effective treatment option.
A global, multicenter, open-label phase 3 trial was conducted to evaluate the efficacy and safety of eplontersen administered for 65 weeks to patients with hereditary transthyretin-mediated amyloid polyneuropathy (hATTR-PN), with the placebo arm of the earlier NEURO-TTR trial serving as an external comparator. The study started in December 2019 and concluded in April 2023. The subsequent paper outlining the results, titled Eplontersen for Hereditary Transthyretin Amyloidosis With Polyneuropathy, was published in September 2023 in JAMA. Let's compare System Pro's knowledge of eplontersen with GPT-4's.
Search: Eplontersen for Hereditary Transthyretin Amyloidosis
Prompt: tell me about Eplontersen for Hereditary Transthyretin Amyloidosis
GPT-4 knows that a trial on this topic is ongoing but has no knowledge of its outcome. System Pro's synthesis, on the other hand, outlines the details of the finding from the published paper and gives you the exact effect size of the intervention in its citation:
(Citation 1) Eplontersen treatment group [vs historical placebo group] is associated with a 70.5 unit decrease in the mean of percentage reduction in serum transthyretin concentration.
AI holds significant promise for keeping researchers and clinicians up to date with the latest research, but its efficacy hinges on the freshness of the knowledge it operates on. The System Graph augments LLMs by ensuring they are provided with the most current and relevant information. This approach mitigates the issues traditionally associated with outdated knowledge bases, enhancing the accuracy and reliability of the insights derived from these models.
The freshness of knowledge is not only critical for maintaining relevance but also important for reducing hallucination, or confabulation, in AI systems. Research has shown that LLMs exhibit more erroneous or fabricated outputs in domains where they have inadequate training data. In their survey of hallucination in Large Language Models, Zhang et al. point out that "When asked to answer questions or complete tasks, LLMs often exhibit hallucinations if they lack pertinent knowledge or have internalized false knowledge from the training corpora." [ref] Therefore, the answers that LLMs like ChatGPT give to biomedical questions about recent breakthroughs are not only often irrelevant; they are also more likely to be dangerously misleading or inaccurate.
By prioritizing up-to-date knowledge infusion, System's proprietary methodology addresses these inaccuracies, creating a more robust, accurate, reliable, and ultimately useful AI-driven research assistant. Read more about our methodology here.
Filed Under:
Tech