GPUs Run the World

More cooks in the AI kitchen, please

It’s odd when you think about it: the chips that make video games look good are now powering artificial intelligence. You’d expect the future of “thinking machines” to depend on some mysterious, brain-like hardware. Instead, it’s the same silicon that draws explosions and dragon scales that’s running chatbots and generating images. Why?

The answer is parallelism. But to see why, you have to understand the difference between CPUs and GPUs. Most people think CPUs are just “faster” than GPUs, or vice versa, but that’s not it. The difference is more like the difference between a chef and a hundred line cooks. A CPU is a chef—good at complicated recipes, but only able to make a few things at once. A GPU is a kitchen full of line cooks: each one can only chop or fry, but together they can crank out a thousand appetizers at the same time.

You might think artificial intelligence would be a chef’s job—subtle, intricate, one-of-a-kind. But modern AI, at least the kind that works, is more like bruschetta than soufflé. It’s mostly lots of the same thing, repeated over and over. The line-cook recipe can be broken into pieces that all run side by side; the chef’s recipe is a sequence of transformations that has to happen in a fixed order and can’t easily be reversed or scaled up or down.

Why? Because most of what’s called “deep learning” is just multiplying giant grids of numbers (i.e., linear algebra). And multiplying matrices is the kind of job where you can give each cook a little piece and let them all work at once. This is exactly what GPUs are good at. They were designed to crunch through huge piles of simple math, because that’s what graphics rendering is. It just happens that this is also what neural networks need.
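
To make the line-cook picture concrete, here is a minimal sketch in NumPy (the shapes and variable names are my own, chosen only for illustration): each row of the output depends on just one row of the input plus the shared weight matrix, so the rows are independent jobs that could all be done at once.

```python
# A minimal sketch of why matrix multiplication parallelizes so well.
# Each output row depends only on one input row and the (shared) weight
# matrix, so every row could be handed to a different "line cook".
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((512, 1024))   # 512 examples, 1024 features
weights = rng.standard_normal((1024, 256))  # one layer's weight matrix

# The "whole kitchen" version: one call that the underlying math library
# can parallelize internally across cores (a GPU does the same thing at
# a much larger scale).
full = inputs @ weights

# The "one cook per row" version: each output row is an independent job.
rows = np.stack([inputs[i] @ weights for i in range(inputs.shape[0])])

assert np.allclose(full, rows)  # same answer, just different scheduling
```

A GPU’s whole job is to keep thousands of these small, identical pieces of work in flight at the same time.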

For a long time, no one saw this coming. AI used to mean rules and logic—things a CPU chef could whip up in small batches. It was only when people started training huge neural networks on mountains of data that GPUs became useful for something besides games. The alignment was accidental, but it changed everything.

So maybe generative AI isn’t as complicated as we thought. At least, the kind we have now isn’t. The recent breakthroughs came less from someone inventing a new algorithm than from hardware that finally let us throw enough brute force at the problem. Progress came from scale, not cleverness.

Of course, the story isn’t over. Now that generative AI is big business, people are designing chips specifically for it. These new “AI accelerators” are even better at matrix math than gaming GPUs. Maybe someday, someone will invent a new kind of AI that needs a different kind of hardware—one that’s good at reasoning, not just pattern-matching. But for now, parallelism wins.

If you’re working with AI in the enterprise, the lesson is simple: know your workload. For small experiments, a CPU is fine. For anything big, you’ll want a GPU, or a lot of them. And all this is to say that my new preferred term for the AI tinkerer’s garage is the “AI kitchen”: a place where people are free to make a mess and learn how to go from chef-like artisanal task execution to catering-style line-cook execution.
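
For what “know your workload” looks like in practice, here is a minimal sketch, assuming PyTorch is installed; the matrix sizes are arbitrary and only for illustration. The same code runs on the CPU for small experiments and moves to a GPU when one is available, with no other changes.

```python
# A minimal sketch: pick whichever device is available and run there.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4096, 4096, device=device)  # a reasonably "big" workload
w = torch.randn(4096, 4096, device=device)

y = x @ w  # runs on whichever device the tensors live on
print(f"matmul ran on: {y.device}")
```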

Pondering a little more: network effects and scaling are strange. It is almost as if the sheer scale of networks and networked work is, by itself, enough to look like magic, and that despite the simplest of ingredients.
