The Search Problem
On Vectors and Publishing
If you want to know what’s broken in academic research, you don’t have to look very far. Just try searching for something. Despite all the advances in digital knowledge and the hype around “smarter” search engines, finding what you need often feels like panning for gold in a river of sand. You type in keywords, maybe try some Boolean logic for good measure, and what you get is either a flood of junk or nothing at all.
Why is it still so hard? The root problem is that humans think in concepts, but search engines deal in words. When you search for “adolescent wellbeing,” you might care about studies on cyberbullying, or papers on social media and anxiety—but unless those exact words appear, the engine shrugs. It’s not that the information isn’t out there; it’s that the tools don’t know how to look for it the way you would.
People sometimes imagine this is a new problem, but it’s not. Before digital search, you had card catalogs, indexes, maybe a lucky break while browsing a shelf. Most of my own research fell into the latter category at Alderman Library. These were slow and uneven discovery systems, but at least they were built by people who understood the connections between topics. The first digital search engines threw that away for brute force: just match the strings and see what comes up. This scaled, but it lost all the nuance and most of the context.
Semantic search is supposed to fix this. The idea is to build models that understand meaning, not just words. Instead of treating “social media” and “teen mental health” as separate, it tries to see the connections: online interactions, adolescent psychology, digital wellbeing. In theory, this lets you find papers that matter even if they don’t use your keywords. The dream is a kind of mechanical serendipity—helping you stumble on things you didn’t know you needed.
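As a toy sketch of the idea, here is what "matching on meaning" can look like in miniature. The word vectors below are hand-made, hypothetical values standing in for what a trained embedding model would produce; the point is only that a document can rank first without sharing a single word with the query.

```python
import math

# Toy "embeddings": each word maps to a small concept vector.
# Real systems learn these from data; these values are invented
# purely to illustrate the mechanism.
WORD_VECS = {
    "adolescent":    [1.0, 0.1, 0.0],
    "teen":          [1.0, 0.1, 0.1],
    "wellbeing":     [0.1, 1.0, 0.0],
    "anxiety":       [0.1, 0.9, 0.0],
    "cyberbullying": [0.6, 0.7, 0.9],
    "social":        [0.2, 0.1, 0.8],
    "media":         [0.0, 0.0, 0.9],
}

def embed(text):
    # Average the vectors of known words (a crude sentence embedding).
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs):
    # Rank documents by similarity of meaning, not by shared keywords.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = [
    "cyberbullying and teen anxiety",
    "social media platforms",
]
print(search("adolescent wellbeing", docs)[0])
# The top hit shares no words with the query, yet it is the one
# a human researcher would actually want.
```

A keyword engine would return nothing for this query against these documents; the vector version surfaces the cyberbullying paper because its concepts sit near "adolescent wellbeing" in the embedding space.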
But there’s a catch. Meaning is slippery. “Digital wellbeing” means one thing in adolescent psychology, something else in workplace studies. Even within a field, people argue over definitions. When you train an algorithm on a giant pile of text, it tends to average out the edges—or worse, absorb the biases in the data. And no matter how fancy your model is, you’re still representing ideas as vectors or nodes. Some detail is always lost, just like Borges’ map that’s the size of the territory but still not the territory.
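The lossiness is easy to demonstrate. One common shortcut, averaging word vectors into a sentence vector, throws away word order entirely, so two sentences with different meanings can collapse to the same point. The vectors here are again hypothetical, chosen only to make the collapse visible:

```python
# Averaging word vectors discards word order: "dog bites man" and
# "man bites dog" become the exact same point in the space.
WORD_VECS = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}

def embed(text):
    vecs = [WORD_VECS[w] for w in text.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

print(embed("dog bites man") == embed("man bites dog"))  # True
```

Fancier models blur less, but the underlying trade stays the same: the map compresses the territory, and something always falls out in the compression.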
Even so, better search tools could change a lot. They could help people outside the usual circles find insights they’d otherwise miss. They might connect ideas across disciplines, or uncover outliers that don’t fit the standard narrative. But as algorithms get better at deciding what’s “relevant,” they also get better at filtering what you see—which means they can guide your curiosity as much as serve it. There’s a risk of serendipity turning into predictability, of the machine always leading you down the same well-lit paths.
The truth is, search will never be perfect, because ideas aren’t. The world is messy, and any map we make of it will be an approximation. But that’s fine—maps don’t have to be perfect to be useful. The goal isn’t to build an oracle that knows everything, but to make tools that help us ask better questions, and sometimes, to find the unexpected. If the next generation of search engines can do that—even a little—it will be an improvement. And maybe, next time you go looking for something, you’ll spend less time lost in the haystack.
My hope is that search in the AI era not only solves needle-in-the-haystack problems but also makes the best showing yet on "needle in the needlestack" ones. What do I mean by that? Imagine standing in Times Square at rush hour, trying to pick out one whisper that hops between passing cabs. You'd never manage it by memorizing every voice you've ever heard, or even by understanding what loud traffic and city sounds are; with all that background noise, you'd be hard pressed to notice the whisper at all. Now imagine your job was to anticipate and transcribe that whisper, and to act on it before its message could reach another passenger. That's the level of search granularity, and the overall operational capability, we should be aiming for with these tools.