Chinese artificial intelligence startup DeepSeek has updated its latest V4 AI model, raising inference speed by up to 85%. (File photo)

DeepSeek Releases V4 AI Model with Inference Speed Boost up to 85%

Published at Jun 28, 2026 10:03 am

Chinese artificial intelligence (AI) startup DeepSeek has updated its latest V4 AI model, raising inference speed by up to 85% while significantly reducing deployment costs.

According to reports from Wallstreetcn and IT Home, DeepSeek updated V4 on Saturday (June 27), launching the speculative decoding framework DSpark and open-sourcing the full-stack tool DeepSpec.

DeepSeek-V4-Pro-DSpark is not an entirely new model architecture; it adds a speculative decoding module on top of DeepSeek-V4-Pro. The focus of this update is therefore engineering implementation rather than improvements to the model's capabilities.

Speculative decoding is a technique that improves inference efficiency without changing the model's output. A lightweight model generates candidate content in advance, and the main model then verifies it, which speeds up inference for large language models (LLMs).
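The draft-then-verify idea can be illustrated with a toy sketch in Python. This is not DSpark or DeepSeek's implementation: the two "models" below (`target_next`, `draft_next`) are hypothetical deterministic functions standing in for the main model and the lightweight model, and the sketch covers only the simple greedy case, in which the result matches what the main model would have produced on its own.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def target_next(ctx: List[int]) -> int:
    """Hypothetical main model: a deterministic toy rule, not a real LLM."""
    return (sum(ctx) * 31 + len(ctx)) % 50

def draft_next(ctx: List[int]) -> int:
    """Hypothetical lightweight model: agrees with the main model most of the time."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 != 0 else (guess + 1) % 50

def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain decoding: one main-model step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the tokens and the number of verify rounds."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # Step 1: the draft model proposes up to k tokens in advance.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # Step 2: the main model checks each proposed position. A real system
        # would score all positions in one batched pass; this loop only mimics that.
        for tok in proposal:
            expected = target(out)
            out.append(expected)   # equals tok when the proposal is accepted
            if tok != expected:    # first mismatch: keep the correction, discard the rest
                break
    return out, rounds

prompt = [1, 2, 3]
spec, rounds = speculative_decode(target_next, draft_next, prompt, n_new=12)
print(spec == greedy_decode(target_next, prompt, 12), rounds)
```

In this toy setup the speculative output is identical to plain greedy decoding, while the number of verification rounds is smaller than the number of generated tokens. The round count is only a rough proxy for main-model passes; the actual gain depends on how often the draft is accepted and on the cost of the draft model.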

DeepSpec is an open-source, full-stack tool for training and evaluating speculative decoding draft models. It covers data preparation, model training, draft model implementation, and performance evaluation, so researchers can train speculative decoding models directly, which greatly lowers the barrier to deployment.

According to a paper jointly published by DeepSeek founder Liang Wenfeng and Peking University, deploying DSpark in DeepSeek-V4's online service system and running it under real user traffic can effectively reduce the computing waste caused by invalid verifications.

Compared with existing production baselines, DSpark can increase generation speed for individual users by 60% to 85% at the same throughput.
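To put the reported range in terms of waiting time, the short calculation below converts a speed gain into the time needed per generated token. It assumes "generation speed" means tokens per second for a single user, which the report does not spell out; the function name is illustrative only.

```python
def time_per_token_ratio(speed_gain_pct: float) -> float:
    """New time per token as a fraction of the old one, for a given speed gain in percent.

    Assumes speed is measured in tokens per second, so time per token is its inverse.
    """
    return 1.0 / (1.0 + speed_gain_pct / 100.0)

for gain in (60, 85):
    ratio = time_per_token_ratio(gain)
    print(f"+{gain}% speed -> {ratio:.1%} of the original time per token "
          f"({1 - ratio:.1%} less waiting)")
# +60% speed -> 62.5% of the original time per token (37.5% less waiting)
# +85% speed -> 54.1% of the original time per token (45.9% less waiting)
```

Under that assumption, a 60% to 85% speed gain corresponds to roughly 37.5% to 45.9% less time spent waiting for the same amount of output.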

DeepSeek has taken the lead in delivering this result in AI inference efficiency optimization, following its completion of a 50 billion RMB (about 9.53 billion SGD) financing round. It shows that the startup, in addition to improving model capabilities, is also striving to gain an edge in the race for computing efficiency.

AI models developed by Chinese companies are currently moving toward high performance, low cost, and lightweight design, a trend that is challenging the long-standing dominance of American companies.

According to Bloomberg, data from OpenRouter shows that as of June this year, the share of token requests sent to Google, OpenAI, and Anthropic models had dropped sharply from 72% a year earlier to 33%, while as of March the share of Chinese AI models had risen to over 60%.

One of the main drivers behind the surge in usage of Chinese AI models is precisely their significant cost-performance advantage over American models.

Author

联合日报 newsroom
