Chinese AI firm DeepSeek has unveiled what is shaping up to be one of the most advanced “open” AI models to date. DeepSeek V3 was officially released on Wednesday under a permissive license that allows developers to freely download, modify, and use it for a wide range of applications, including commercial ones.
A Versatile and Powerful Tool
DeepSeek V3 is designed to handle a variety of text-based tasks, including coding, translating, and generating essays or emails from prompts. What sets it apart, however, is its benchmark performance. According to DeepSeek’s internal testing, the model outperforms both publicly available open-source models and some proprietary “closed” systems that are accessible only through APIs.
On Codeforces, a competitive programming platform, DeepSeek V3 outperforms notable rivals such as Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B. The model also excels on Aider Polyglot, a benchmark that evaluates a model’s ability to generate new code that integrates seamlessly with existing codebases.
An Engineering Feat
DeepSeek V3 was trained on a massive dataset containing 14.8 trillion tokens—equivalent to roughly 11 trillion words. This immense training set is matched by the model’s size: it boasts 671 billion parameters, or 685 billion when hosted on AI development platform Hugging Face. To put this in perspective, it’s about 1.6 times the size of Meta’s Llama 3.1 405B.
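For developers who want to experiment with the release, the weights on Hugging Face can be loaded through the standard transformers workflow. The snippet below is a minimal sketch, assuming the repository id deepseek-ai/DeepSeek-V3 and hardware with enough memory to hold the full model; treat it as illustrative rather than an official quickstart.

```python
# Minimal sketch: loading DeepSeek V3 via Hugging Face transformers.
# Assumptions: repo id "deepseek-ai/DeepSeek-V3", custom model code
# trusted, and enough GPU memory for the full set of weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the release ships custom modeling code
    device_map="auto",       # shard the weights across available GPUs
    torch_dtype="auto",      # keep the checkpoint's native precision
)

prompt = "Draft a short email declining a meeting invitation."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```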
While parameter count isn’t the sole determinant of performance, larger models tend to exhibit higher accuracy and greater versatility. However, running a model of this scale requires powerful hardware. An unoptimized version of DeepSeek V3 would demand a bank of high-performance GPUs to process queries efficiently.
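A quick back-of-the-envelope calculation shows why. Multiplying the parameter count by the bytes needed per parameter gives a lower bound on memory for the weights alone, before accounting for the KV cache and activations that inference also requires. The numbers below are illustrative estimates under assumed precisions, not measured figures.

```python
# Rough weight-memory estimate for a 671B-parameter model.
# Illustrative only: serving also needs memory for the KV cache
# and activations on top of the raw weights.
PARAMS = 671e9  # total parameter count
GB = 1024**3

for label, bytes_per_param in [("FP8", 1), ("BF16", 2)]:
    weight_gb = PARAMS * bytes_per_param / GB
    gpus_needed = weight_gb / 80  # assuming 80 GB cards (A100/H100 class)
    print(f"{label}: ~{weight_gb:,.0f} GB of weights -> ~{gpus_needed:.0f} x 80 GB GPUs")
```

At 8-bit precision the weights alone would fill roughly eight 80 GB accelerators; at 16-bit, about twice that.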
Despite these challenges, the development of DeepSeek V3 was remarkably efficient. The model was trained in roughly two months on a cluster of Nvidia H800 GPUs. This is particularly notable given that U.S. export restrictions have limited Chinese companies’ access to more advanced chips. DeepSeek also claims to have spent only about $5.5 million on the training run, a fraction of the reported cost of developing systems like OpenAI’s GPT-4.
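That figure can be sanity-checked with simple arithmetic. DeepSeek’s technical report cites roughly 2.788 million H800 GPU-hours for the training run, priced at an assumed rental rate of $2 per GPU-hour; the sketch below just multiplies those reported numbers and should be read as the company’s claim, not an independently verified cost.

```python
# Sanity check of DeepSeek's claimed ~$5.5M training cost, using the
# figures reported in its own technical report (not independently verified).
gpu_hours = 2.788e6   # reported H800 GPU-hours for the training run
rate_usd = 2.0        # assumed rental rate per GPU-hour

print(f"Implied cost: ${gpu_hours * rate_usd / 1e6:.2f}M")  # ~$5.58M

cluster_gpus = 2048   # cluster size cited in the report
days = gpu_hours / cluster_gpus / 24
print(f"Implied wall-clock time on {cluster_gpus} GPUs: ~{days:.0f} days")  # ~57 days
```

Both results line up with the headline claims: about $5.5 million and roughly two months of training.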
Limitations and Challenges
While DeepSeek V3 is a technical triumph, it has notable drawbacks. Its responses to politically sensitive topics are filtered to align with Chinese government regulations. Ask it about the Tiananmen Square crackdown, for example, and it declines to answer. This is not unusual for Chinese AI models, which must comply with government mandates to “embody core socialist values” and steer clear of contentious subjects.
DeepSeek’s Ambitions
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that integrates AI into its trading strategies. The company has invested heavily in infrastructure, including server clusters with up to 10,000 Nvidia A100 GPUs. High-Flyer’s ultimate goal, as articulated by its founder Liang Wenfeng, is to achieve “superintelligent” AI through DeepSeek’s efforts.
Liang views proprietary AI systems, like those developed by OpenAI, as a temporary advantage that won’t stop others from catching up. “Closed-source AI is only a momentary moat,” he remarked in a recent interview.
The Future of Open AI
With DeepSeek V3, the open-source AI community gains a formidable new tool, one that demonstrates innovation is possible even under tight regulatory and hardware constraints. As AI continues to evolve, models like DeepSeek V3 could play a crucial role in shaping the future of both open and commercial AI development.