Bits of Brilliance
Making Newspaper Images Searchable

Dylan Goldblatt
February 21, 2025

For a long time, the problem with historical archives wasn’t that information was missing. It was that most of it was right there, but you couldn’t get at it. Old newspapers were digitized by the millions, but only the text was really accessible. Optical character recognition (OCR) made it possible to search for words, but everything visual (photographs, cartoons, ads) was just a blur in a scanned image. If you wanted to study how politicians were depicted, or how products were advertised, you had to scroll through page after page, hoping to spot something useful. It was like trying to do astronomy with the naked eye: technically possible, but mostly futile.

This is what makes projects like the Library of Congress’s Newspaper Navigator so interesting. They don’t just make it easier to find images in old newspapers. They turn an entire hidden world into data. By using computer vision to extract and tag millions of images, they make the visual past searchable in the same way text has been for decades. Suddenly, you can ask questions that would have been impossible before. How did the portrayal of women in ads change from the 1920s to the 1960s? What did political cartoons focus on during economic downturns? You don’t have to guess or settle for anecdotes. You can actually look.
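To make "searchable" a little more concrete, here is a minimal sketch in Python of the kind of question tagged images let you ask. Everything in it is an assumption for illustration: the `ExtractedImage` record, its fields, the `count_by_decade` helper, and the sample rows are invented here and are not Newspaper Navigator's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedImage:
    """Hypothetical record for one image cut out of a scanned page."""
    year: int
    category: str  # e.g. "advertisement", "cartoon", "photograph"
    caption: str   # OCR text found near the image


def count_by_decade(images, category, keyword):
    """Count images of one category whose caption mentions a keyword, per decade."""
    counts = Counter()
    for image in images:
        if image.category == category and keyword.lower() in image.caption.lower():
            counts[image.year // 10 * 10] += 1
    return dict(sorted(counts.items()))


# Invented sample rows, standing in for millions of real records.
sample = [
    ExtractedImage(1923, "advertisement", "Soap for the modern housewife"),
    ExtractedImage(1927, "advertisement", "A housewife's best friend"),
    ExtractedImage(1931, "cartoon", "The banker and the housewife"),
    ExtractedImage(1958, "advertisement", "Every housewife deserves a new range"),
    ExtractedImage(1961, "advertisement", "New tires for the family car"),
]

print(count_by_decade(sample, "advertisement", "housewife"))
# → {1920: 2, 1950: 1}
```

The counts are not an answer in themselves. They tell you which decades and which pages are worth opening and looking at with your own eyes.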

It’s tempting to see this as just another tool for historians. But it’s more than that. It changes the kinds of questions you can ask. Before, you could only study what you could find. Now, you can see patterns across decades and thousands of publications. You can notice things no one would have thought to look for. In the process, the definition of historical research expands. It’s not just faster. It’s qualitatively different.

Text mining has gotten most of the attention in the digital humanities, but images might be more revealing. Text tells you what people said, or at least what they wrote. Images show you what they noticed, what they valued, what they feared. The margins of old newspapers are full of things that never made it into the editorials: the way a politician is caricatured, the products that advertisers thought would appeal to the public, the faces in crowd shots. These are the details that make the past feel real, and until now they were mostly inaccessible at scale.

Some people worry that machines will never understand the nuances in these images, such as the irony in a cartoon or the subtle codes in an ad. They may be right, at least for now. But that’s not the point. The power of these tools is that they can scan everything and tell you where to look closer. They don’t replace human judgment; they focus it. It’s like having a map of a new country. You still have to explore, but now you know where the mountains and rivers are.

The bigger shift is in how we think about history itself. Because text was easier to search, we’ve acted as if the past were mostly made of words. But much of human experience is visual. By making images searchable, we’re unlocking a record that was always there, just inaccessible. And because images cross language and literacy barriers, they make history available to more people. You don’t need to read an editorial to know how a community saw itself; sometimes a photo or a cartoon tells you more.

Of course, there are new risks. It will be tempting to focus only on what’s easy to count or tag. We might miss things that don’t fit neatly into categories, or misinterpret what we find. And there are ethical questions about whose images get surfaced and how they’re used. But these are problems of abundance, not scarcity. They’re better than not knowing what’s there at all.

The real lesson is that archives aren’t just boxes of facts waiting to be read. They’re layers of culture waiting to be seen. By making images as accessible as text, we get a richer, messier, and more honest view of the past. Instead of just reading history, we can finally see it.
