DeepSeek, Soviet Software, and Human Nature
Perhaps a different explanation of why this happened, and what to take away from it
So you’ve heard that China’s DeepSeek developed a world-class AI on the cheap while using a fraction of the computing power that big U.S. companies rely on, and maybe you’re wondering:
How is that possible?
There are, of course, a bunch of technical explanations. But there’s also a more profound one: human nature.
This is something I learned while writing about Russian software startups in the late 1980s, yet it’s a part of today’s story that U.S. tech leaders and politicians don’t seem to be talking about…but should.
As the Soviet Union broke up at the end of the 1980s, I traveled there often to write about the first capitalistic private businesses sprouting up. By then I’d been covering the U.S. tech industry for a few years, and was particularly interested in finding fledgling Russian software companies.
There were quite a few, and, surprisingly, some of them were solving problems that had been stumping U.S. companies, and doing it with minimal computing power. For example, a Moscow startup called ParaGraph built software that could recognize handwriting much better than anything ginned up in the U.S. When Apple built its Newton hand-held computer (introduced in 1993), it contracted with ParaGraph because it couldn’t find anything better here.
In my reporting back then, I asked a lot of Russians about this phenomenon. For decades, the West had sealed off the Soviet Union from acquiring the latest computer hardware. Russian programmers, who until the 1990s all worked for state enterprises or the military, told me they had to do their work on clunky, generations-old machines. Yet at the same time, Soviet leaders pressed them to keep up with Western software. Competition with the U.S. was intense, and the Soviets didn’t want to lose.
That’s where human nature came into play. What would you do if you were required to compete with someone who had vastly more resources? Well, you’d have two choices: give up, or get creative. A lot of Russian programmers got creative. They figured out how to do much more with a lot less because they had to.
At first, when the Soviet Union was intact, that creativity mostly just made the programmers’ bosses happy. But when the Soviet system crumbled, a lot of talented programmers left state-run outfits to found or join local software startups. Many quickly found out – sometimes to their astonishment – that they could compete with U.S. software companies by offering good software that required much less computing power.
One such success was ParaGraph. Another was ABBYY FineReader, one of the first useful optical character recognition software products. It came out in 1993 and became a global leader in text recognition and document management. Another was the worldwide hit game Tetris. Yet another debuted a little later, in 1997: Kaspersky Antivirus, which for a while was one of the top global antivirus software tools.
Fast-forward to today, and DeepSeek is kind of a sequel to that movie.
We’ve tried to keep advanced computers and chips from China for a long time. In 2022, the U.S. government established export controls to keep chips like Nvidia’s H100 out of Chinese hands. The export controls have been effective. When DeepSeek started working on its AI, it had about 10,000 older Nvidia chips it had managed to pull together. By contrast, OpenAI runs on at least 10 times as many processors, and those chips are the most advanced from Nvidia. (At least one source says OpenAI runs on 720,000 Nvidia chips.)
The pressure to compete when you’re a massive underdog fires up creativity. DeepSeek’s programmers were more clever than U.S. programmers because they had to be. With a wealth of computing power, U.S. coders could afford to take more wasteful routes to building their AIs. The Chinese coders didn’t have that luxury.
As for technical explanations, I’m not much help. I don’t know squat about developing an AI. But Wired has a bit about how DeepSeek did it:
DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”
Is there anything to be learned from the story of Russian software in the 1990s and Chinese AI in the 2020s?
Well, for one, overconfidence is dangerous. That’s a big reason underdogs win, whether we’re talking AI or football.
I read one story about DeepSeek quoting Anthropic CEO Dario Amodei, who has been a proponent of strong export controls to keep the best chips out of China. The story, on TechCrunch, says: “If Trump strengthens export rules and prevents China from obtaining what Amodei describes as ‘millions of chips’ for AI development, the U.S. and its allies could potentially establish a ‘commanding and long-lasting lead,’ Amodei claims.”
But that kind of reminds me of IBM, circa 1980, believing personal computers could never be a threat to its hulking mainframes. Technology that is good enough, much cheaper and much more easily deployed often kicks the ass of technology that is the very best, very expensive and a pain to deploy. (Check out The Innovator’s Dilemma for more on that.)
And also, endless resources are not always a good thing. That’s another aspect of human nature. It too easily leads to waste, laziness, bureaucracy and the classic “too many cooks.”
I’m not saying that’s true of all the U.S. AI companies, because I don’t know. And it wouldn’t matter anyway if there were no hungry and lean competitors targeting them. But OpenAI has raised around $18 billion. Anthropic, about $7 billion. U.S. AI companies are assembling historically massive computing power. The resources being thrown at AI here are almost beyond imagining.
And it’s just possible that’s not helping.
–
I wrote quite a few stories for USA Today in the early 1990s about emerging Russian software companies, and they often included something about Russian programmers’ creative use of computing power. This is one short story that focused on that.