May 4, 2023
For seemingly the first time in the 25 years since Google’s founding, search is ready to be disrupted.
Several forces, all intersecting at this very moment, set the stage for reinvention: the increasing bankruptcy of the canonical “ten blue links” experience; the emergence of large language models; the critical factual failures of the new chatbots; the continued explosion of data and information; and finally, our awareness of the rising complexity and interdependence of the most important issues we face as individuals, organizations, and society.
In the past few months, we’ve seen an explosion of ideas around how to improve search. I’ll call the dominant search paradigm “The Google Way” and the emerging search paradigm “The ChatGPT Way”.
It’s exciting to be living in a moment where a fundamental building block of the modern information technology stack is ripe for reinvention and improvement, and already it feels inevitable that the new paradigm will be inextricably woven into the fabric of everyday life.
For our purposes, though, I’ll discuss search for professional use, from health to finance, where the accuracy, reliability, and provenance of information matters considerably, and where decision-making is usually expected to be rational and data-driven.
From that standpoint, it’s clear that search The Google Way, which is employed in professional search engines like Web of Science and Scopus, is no longer fit for purpose: helping us find the best, most trustworthy, most complete information to make the best possible decisions. But I would suggest that search The ChatGPT Way isn’t fit for that purpose either, precisely because of what it has in common with the paradigm it’s disrupting.
I’ll organize the problems I see with search into two classes: the depth problem and the context problem.
These two problems share something in common: they both stem from an overreliance on language in how search engines organize and retrieve or generate information. In their most basic form, The Google Way finds things that match keywords, and The ChatGPT Way predicts the best next word in a sentence. In other words: language is the basis for retrieval in The Google Way, while in The ChatGPT Way language is both the basis for retrieval and the form of the answer.
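To make the mechanical difference concrete, here is a deliberately toy sketch of the two paradigms in their most basic form. It is illustrative only; neither Google nor ChatGPT works this simply.

```python
from collections import Counter, defaultdict

# The Google Way, at its most basic: rank documents by keyword overlap with the query.
def keyword_search(query: str, documents: list[str]) -> list[str]:
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

# The ChatGPT Way, at its most basic: predict the likeliest next word from corpus statistics.
def train_bigram_model(corpus: list[str]) -> dict[str, Counter]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

def predict_next_word(model: dict[str, Counter], word: str) -> str | None:
    followers = model.get(word.lower())
    # Returns a fluent continuation, but nothing here represents whether it is true.
    return followers.most_common(1)[0][0] if followers else None
```

Notice that both functions manipulate only strings and counts; neither contains any representation of what the words refer to. Everything that follows is a consequence of that.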
But language, in and of itself, has no reliable conception of how the world works, no empirical sense of cause and effect, no understanding of a system. Language is not the same thing as fact. As the computer scientist David Waltz put it to Esther Dyson: "Words are not in themselves carriers of meaning, but merely pointers to shared understandings."
Any search paradigm based primarily on language, then, runs the risk of misrepresenting the world it purports to document. (See Galactica, the LLM for science that Meta released and promptly shut down last year.) How we organize and discover information for professional use — and ultimately, how we make decisions — needs to change.
The volume of information that professionals need to be familiar with continues to grow superlinearly. By one measure, the volume of scientific information now doubles every 17 years, while the amount of medical research doubles every 5 years — but the tools available for navigating that information The Google Way haven’t changed in decades. The physicians we spoke to repeatedly referred to the current paradigm as the “hammer and chisel method” of finding papers; it’s slow and unwieldy, requiring an extraordinary amount of manual labor.
Search engines like Google Scholar and PubMed, then, are not just unable to solve the depth problem, but get worse at it every year as the corpus of research grows. Simply put, there are just too many links to dig through. And by reinforcing biases towards certain types of information sources through pagination and ranking factors, The Google Way discourages the kind of wide-ranging inquiry that is essential to progress.
The ChatGPT Way is equally unsuited to the task of solving for depth, albeit for very different reasons.
LLMs, as Ezra Klein recently wrote, “are built to persuade,” despite the fact that they’re not guided by fidelity to a model of the world as it actually is. “They’re not trained to predict facts,” A.I. ethics researcher Margaret Mitchell told Klein. “They’re essentially trained to make up things that look like facts.” As Bender et al. argue, this “ersatz fluency and coherence” heightens the risk of automation bias, which could entail significant real-world harm.
From the standpoint of search in a professional context, The ChatGPT Way should give us serious pause. LLMs hallucinate (or fabricate); they invent citations that don’t exist. And yet they proceed with the confidence and certainty of that one acquaintance who “does their own research,” blithely asserting what philosopher Harry Frankfurt would define as pure bullshit — content with no concern for the truth — as clear-cut statements of fact.
We are living in an era of unimaginable complexity. The climate, our bodies, our cities, the economy: all are systems made up of endlessly interacting, flowing, dependent parts. But while the biggest challenges we face — at both the individual and global scale — are systemic, our data and knowledge are still organized into silos: disciplines, departments, nation-states, organs...
As I wrote when we launched our public beta, this fundamental incongruity makes it nearly impossible to think, plan, and act systemically — to consider the way a system's individual parts interrelate, how systems change over time, and how they work within even larger systems. We can’t navigate our “whitewater world,” to borrow a term from John Seely Brown. We struggle with context and are stifled in our ability to reliably predict outcomes, make decisions, and mitigate risks.
“How do you look at the disturbances of the surface structure,” asked Brown in a 2017 talk at UC Irvine, “and understand something about the deep structure? How do you interpret those flows, or what lies beneath those little ripples? [...] Reading context isn’t that simple. It’s clear that our ability to read context is seriously impaired.”
Search as we know it reinforces silos. Limited by its strictly semantic backbone, it ossifies the fragmentation of knowledge that we have encoded over time in language, failing to help us consider context. Put another way: search is good at helping us find massive amounts of information about trees, but terrible at showing us the forest. It does nothing to help us see how the parts of systems relate, to help us understand the breadth and complexity of the world as it actually is — and to anticipate what might happen next.
Search The Google Way or The ChatGPT Way doesn’t naturally help us consider questions like: What else do I not know to search for? What else could be causing this? What could happen if this happens? What other factors should I take into account? Instead, it strictly limits the scope of inquiry by retrieving (and producing) language that corresponds to our starting point.
As a direct result, search today is structurally unable to help us make the best possible decisions. It may even compound our challenges.
We believe there’s a better way.
In 2018, we invented and patented a new architecture to organize information based primarily on statistical relationships rather than semantic ones.
You can read more here, but in brief: rather than indexing documents by the language they contain, System extracts the statistical results reported in research and uses them to map how topics relate to one another, producing a graph of relationships grounded in data rather than in shared keywords.
Today, just over a year since we launched System in public beta, we are excited to make two major announcements.
First, we have extracted the statistical results from all original studies on PubMed using AI. To the best of our knowledge, this is the first time statistical information retrieval has been achieved at this scale. This unique structured data is helping to relate tens of thousands of topics across health and biomedical research on System.
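To make “structured statistical data” concrete, here is a hypothetical sketch of what one extracted finding might look like. The schema, the prompt, and the `call_llm` stand-in are all assumptions for illustration; they are not System’s actual pipeline or data model.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class StatisticalFinding:
    exposure: str        # e.g. "vitamin D supplementation"
    outcome: str         # e.g. "fracture risk"
    effect_metric: str   # e.g. "odds ratio"
    effect_size: float
    ci_lower: float      # bounds of the 95% confidence interval
    ci_upper: float
    population: str      # who was studied
    source_id: str       # pointer back to the original paper, e.g. a PubMed ID

EXTRACTION_PROMPT = """\
From the abstract below, list every reported statistical result as a JSON array
of objects with these fields: exposure, outcome, effect_metric, effect_size,
ci_lower, ci_upper, population. Include only results stated in the text.

Abstract:
{abstract}
"""

def extract_findings(abstract: str, source_id: str,
                     call_llm: Callable[[str], str]) -> list[StatisticalFinding]:
    # `call_llm` is any text-completion function the caller supplies.
    rows = json.loads(call_llm(EXTRACTION_PROMPT.format(abstract=abstract)))
    return [StatisticalFinding(source_id=source_id, **row) for row in rows]
```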
Second, we are announcing the first application built on top of System: System Pro.
Powered by AI, System Pro reinvents search for research. Unlike other search engines and chatbots, System Pro is purpose-built with explainability and trustworthiness at its core, combining LLMs with unique structured data and a patented graph architecture. It’s designed to help break down disciplinary silos and support systems-based research and solutions.
It’s the fastest and most reliable way of finding, synthesizing, and contextualizing scientific literature — starting in health and life sciences and then expanding to all domains of knowledge. We built it for professionals who depend on accurate, trustworthy, and up-to-date information to make decisions. You can sign up for a free trial starting today.
As is our practice, we have also published the Release Risks associated with today’s release.
What if you could get reliable, up-to-date, and transparently sourced answers to your searches? That could result in massive time savings.
And what if it were just as easy to contextualize something as it is to find content about it? That would meaningfully improve decision-making, making it more rational and systemic, and therefore more accurate and reliable. It would reduce unintended consequences, lead to more efficient interventions, and empower decision-makers to place bigger bets with greater confidence.
Here’s how System Pro is bringing this vision to life.
Based on your search, System Pro assembles all the relevant statistical findings we’ve extracted with AI from peer-reviewed studies. We cluster those statistical findings to group findings about the same or similar variables, and convert them into natural language following strict rules. Finally, we prompt an LLM to synthesize those clustered findings, using and citing the findings alone. The end result is an accurate, trustworthy, and up-to-date overview of the relevant research. You can also quickly pinpoint where there is agreement and disagreement.
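In rough pseudocode, that pipeline might look something like the sketch below, which reuses the illustrative `StatisticalFinding` records from the earlier extraction sketch. The clustering key and the prompt are simplified assumptions, not System’s actual method.

```python
from collections import defaultdict

def cluster_findings(findings):
    # Group findings about the same pair of variables. (A real system would
    # also need to normalize synonymous variable names before grouping.)
    clusters = defaultdict(list)
    for f in findings:
        clusters[(f.exposure, f.outcome)].append(f)
    return clusters

def finding_to_sentence(f) -> str:
    # Convert a structured finding to natural language by strict template, so
    # every number and citation comes from the data rather than the model.
    return (f"{f.exposure} was associated with {f.outcome} "
            f"({f.effect_metric} {f.effect_size}, 95% CI {f.ci_lower}-{f.ci_upper}) "
            f"in {f.population} [{f.source_id}].")

def synthesize(findings, call_llm) -> str:
    sentences = [finding_to_sentence(f)
                 for cluster in cluster_findings(findings).values()
                 for f in cluster]
    prompt = ("Write a short overview of the findings below. Use and cite ONLY "
              "these findings, referring to them by their bracketed IDs, and "
              "note where they agree and where they disagree.\n\n"
              + "\n".join(sentences))
    return call_llm(prompt)
```

The design choice this illustrates: the model’s job is narrowed from recalling facts to rephrasing them, with each fact carrying its own citation.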
For the first time, all sources used to generate the synthesis are clearly cited and linked back to the original source, and you can filter the synthesis to make it more relevant to your needs.
All of this is only possible because of System’s patented architecture. We combine LLMs with unique structured data, aligning our synthesis with a model of the world as it actually is.
Syntheses are based on all the information on System, which includes the results of all original research articles on PubMed. We are continuing to expand our source material, first within health and life sciences and then beyond to all domains of knowledge. We’re deliberately taking our time to get each knowledge space right before we move on. We think accuracy is just too important to our users and their applications.
Based on your search, System Pro recommends statistically related topics that you may want to consider to contextualize your search — both upstream and downstream from where you started. This is a brand new kind of recommendation that’s only possible because of System’s statistical architecture — we’re excited to hear your feedback and develop it further.
System Pro also assembles a systems map of your search that shows all the other topics that are statistically related (if you’ve tried System’s public beta, this view will be familiar to you). You’ll discover new connections to help you strengthen research plans and clinical decisions. A systemic view, not a siloed one.
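For intuition, here is a toy version of that upstream/downstream traversal, assuming the extracted relationships form a directed graph. The topics and edges are invented for illustration.

```python
import networkx as nx

# Each edge points from a variable to an outcome it statistically affects.
g = nx.DiGraph()
g.add_edges_from([
    ("air pollution", "inflammation"),
    ("exercise", "inflammation"),
    ("inflammation", "cardiovascular disease"),
    ("cardiovascular disease", "mortality"),
])

def contextualize(topic: str) -> tuple[list[str], list[str]]:
    upstream = sorted(nx.ancestors(g, topic))      # factors that influence the topic
    downstream = sorted(nx.descendants(g, topic))  # outcomes the topic influences
    return upstream, downstream

print(contextualize("inflammation"))
# (['air pollution', 'exercise'], ['cardiovascular disease', 'mortality'])
```

Ancestors stand in for the upstream factors you might not have known to search for; descendants, for the downstream consequences worth anticipating.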
System Pro marks a new approach to search for professional use: easier than The Google Way; more transparent, reliable, and explainable than The ChatGPT Way; built for discovery, not just search — for the complexity and interconnectedness of the world as it actually is.
A few weeks ago, the authors of the “Stochastic Parrots” paper (cited above) — Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell — issued a statement in response to a widely read Future of Life Institute letter calling for an immediate pause on “giant AI experiments.”
“We should be building machines that work for us, instead of ‘adapting’ society to be machine readable and writable.”
Machines that work for us. A vision of technological progress that centers human agency. One that resonates with our own vision, here at System, of what search can and should be.
The ChatGPT Way reduces scientific endeavor to the stuff of “unfathomable training data” for LLMs, abstracting away both the existence of the people who did the research that an answer is based on and the context (and possible bias) of that research. On System Pro, you see the authors and the populations they studied. It’s not just about transparency of sources for the sake of trust, which I think is critical, but a recognition of how knowledge advances: bit by bit, standing on the shoulders of the people who came before.
System Pro was built to amplify human work. Instead of using AI as an end in itself, we’re using it as a tool to link together, for the first time at this scale and breadth, the work of researchers the world over. To show it all in context and make it more than the sum of its parts.