November 10, 2023
Today, we are publishing the findings of a comparative analysis of System and OpenAI's GPT-4, focused on the quality of the biomedical information each generates.
Our results show that System's synthesis, an experiment in generating research syntheses exclusively from the System Graph, surpasses GPT-4 in the accuracy and comprehensiveness of the information it delivers. GPT-4 currently offers greater clarity, which can aid quick comprehension; System's slight compromise in this area is a strategic trade-off that achieves the high level of accuracy and comprehensiveness our users require when making decisions in the health and life sciences. The two platforms performed equivalently on Relevance and Non-Harmfulness.
We previously demonstrated that System's synthesis is also uniquely architected to reflect the very latest research findings, unlike OpenAI's GPT, which has a knowledge cutoff of September 2021 [ref].
We conducted a single-blind randomized study of biomedical researchers and clinicians, recruited through User Interviews between October 15 and 29, 2023. Each subject-matter expert was assigned a set of tasks aligned with their expertise and was asked to evaluate two randomly selected syntheses: one generated by System and the other by GPT-4 using OpenAI's APIs.
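The blinding step can be sketched roughly as follows. This is an illustrative sketch only: the post does not describe how syntheses were drawn or presented, so the `Synthesis` type, the per-task pools, the `blinded_pair` helper, and the "A"/"B" labels are all hypothetical. The idea is to draw one synthesis from each source, shuffle their order, and show the rater only anonymous labels while the study team keeps the label-to-source key.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthesis:
    source: str  # hypothetical field: "System" or "GPT-4", hidden from the rater
    text: str


def blinded_pair(system_pool, gpt4_pool, rng):
    """Hypothetical helper: pick one synthesis per source and shuffle the order.

    Returns (shown, key): `shown` maps anonymous labels to text for the rater;
    `key` maps the same labels back to their source and stays with the study team.
    """
    pair = [rng.choice(system_pool), rng.choice(gpt4_pool)]
    rng.shuffle(pair)  # random presentation order, so position reveals nothing
    shown = {label: s.text for label, s in zip("AB", pair)}
    key = {label: s.source for label, s in zip("AB", pair)}
    return shown, key
```

Passing an explicit `random.Random` instance (e.g. `random.Random(42)`) keeps assignments reproducible for auditing without exposing the key to raters.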
For each assigned synthesis, participants rated several aspects on a scale of 1 to 10, where 1 means very poor and 10 means perfect. The scale for Harmfulness was reversed.
Before collecting data, we ran a statistical power analysis to estimate the number of survey responses required. The reported results are based on 207 responses from 68 unique participants, which yields a statistical power of 0.86.
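For readers unfamiliar with power analysis, here is a minimal sketch of the kind of calculation involved. The post does not state which test, effect size, or significance level was used, so everything below is an assumption: a two-sided paired comparison of per-response score differences, a standardized effect size (Cohen's d_z), alpha = 0.05, and the normal approximation rather than an exact t-based calculation. It also treats all responses as independent, ignoring that several responses come from the same participant. The effect size in the usage example is an arbitrary placeholder, so the numbers will not reproduce the reported power of 0.86.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def paired_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size: assumed Cohen's d_z, i.e. the mean of per-response score
    differences divided by their standard deviation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)
    # Probability the test statistic falls in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n reaching the target power under the same approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / effect_size) ** 2)


# Placeholder effect size, chosen for illustration only.
print(required_n(0.2))          # 197 responses for 80% power
print(paired_power(0.2, 207))   # roughly 0.82 under these assumptions
```

A real analysis of this design would more likely use an exact t-based or mixed-effects power calculation that accounts for repeated responses per participant; the normal approximation is used here only because it fits in a few lines of standard-library code.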