The Chinese AI company DeepSeek, founded only in 2023, has caused a stir in the technology industry with its cost-effective and efficient AI technology.
According to the company, its open-source models achieve performance comparable to that of leading systems such as ChatGPT – while using significantly fewer resources. Renowned venture capitalist Marc Andreessen described DeepSeek as “one of the most amazing and impressive breakthroughs I’ve ever seen” and called it a “Sputnik moment in AI”.

This assessment seems to be borne out by the user numbers: following its release in early January, the company’s mobile app shot straight to the top of the iPhone download charts in several countries, including the US, the UK and China. By January 25, it had already recorded 1.6 million downloads. The app works much like ChatGPT and other alternatives, and its design is clearly reminiscent of the AI pioneer: the chat history sits on the left-hand side, with the input field in the middle of the screen.
Efficiency instead of gigantism
DeepSeek’s approach could challenge the prevailing view that progress in AI development must inevitably come with ever-increasing energy and resource consumption. The company’s R1 model performs on par with or better than competing products on key benchmarks such as AIME 2024 (mathematical reasoning) and MMLU (general knowledge) – apparently with significantly less training effort.
The model’s technical specifications are particularly noteworthy: with a total of 671 billion parameters, it is one of the largest language models in the world. Its distinguishing feature is the Mixture-of-Experts (MoE) architecture, which activates only around 37 billion parameters per token. This selective activation makes much more efficient use of resources without sacrificing accuracy or reasoning ability. The model was trained on 14.8 trillion tokens and required 2.664 million GPU hours on H800 graphics processors. With a context length of up to 128,000 tokens, it also outperforms many established models.
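To illustrate the principle behind this selective activation, here is a minimal sketch of top-k Mixture-of-Experts routing. All sizes (8 experts, 2 active per token, 16-dimensional vectors) are toy values chosen for readability, not DeepSeek’s actual configuration, and the random weights stand in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total number of experts (toy value, not DeepSeek's)
TOP_K = 2       # experts activated per token (toy value)
D_MODEL = 16    # hidden dimension (toy value)

# Each "expert" is modeled here as a single weight matrix; a real MoE
# layer uses small feed-forward networks instead.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS))  # router (gating) weights

def moe_forward(x):
    """Route one token vector x through only TOP_K of N_EXPERTS experts."""
    logits = x @ gate_w                      # router scores, one per expert
    top = np.argsort(logits)[-TOP_K:]        # indices of the chosen experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the selected experts' parameters are touched for this token --
    # the other experts contribute no computation at all.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

token = rng.standard_normal(D_MODEL)
out, chosen = moe_forward(token)
print(f"activated experts {sorted(chosen.tolist())} out of {N_EXPERTS}")
```

The key point the sketch shows: although the layer owns all eight expert matrices (analogous to the 671 billion total parameters), each token only pays the compute cost of two of them (analogous to the roughly 37 billion active parameters).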
Challenge for US export controls
DeepSeek’s success also raises questions about the effectiveness of US export restrictions on high-end semiconductors to China. The company seems to have found a way to develop efficient AI models despite limited resources. Company founder Liang Wenfeng emphasizes that it is not the amount of investment but innovative approaches that are decisive.
However, the system also has its limits: like all Chinese AI models, DeepSeek applies self-censorship on politically sensitive topics. In addition, the company’s cloud infrastructure has yet to prove itself against the rapidly growing number of users.