DeepSeek-V3 is a state-of-the-art open-source language model developed by DeepSeek, a Chinese AI firm based in Hangzhou. Released in December 2024, this model employs a Mixture-of-Experts (MoE) architecture, featuring 671 billion total parameters with 37 billion activated per token.
The model was trained on 14.8 trillion tokens and achieves performance comparable to leading closed-source models such as GPT-4o and Claude 3.5 Sonnet.
Scorecard
| Attribute | Details |
| --- | --- |
| ✅ Availability | Yes (open-weights release) |
| 🐙 Model Type | Large Language Model (LLM) |
| 🗓️ Release Date | December 2024 |
| 📅 Training Data Cut-off Date | N/A |
| 📏 Parameters (Size) | 671B total, 37B activated per token |
| 🔢 Context Window | 128k tokens |
| 🌎 Supported Languages | English, Chinese |
| 📈 MMLU Score | 88.5% |
| 🗝️ API Availability | Yes |
| 💰 Pricing (per 1M Tokens) | Input: $0.27, Output: $1.10 (standard rates) |
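Since the model is available through an API, here is a minimal sketch of what a request looks like. DeepSeek's API is OpenAI-compatible (base URL `https://api.deepseek.com`, model name `deepseek-chat` for V3); the `build_chat_payload` helper below is illustrative, not part of any SDK.

```python
# Minimal sketch of a DeepSeek-V3 chat request body. The endpoint and model
# name follow DeepSeek's public API docs; the helper itself is hypothetical.

def build_chat_payload(prompt, model="deepseek-chat", max_tokens=512):
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("Summarize Mixture-of-Experts routing briefly.")
# POST this JSON to https://api.deepseek.com/chat/completions with an
# "Authorization: Bearer <API key>" header, e.g. via requests.post(...).
```

Because the API mirrors OpenAI's schema, existing OpenAI client libraries work by overriding the base URL and API key.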
Architecture 🏗️
DeepSeek-V3 utilizes a Mixture-of-Experts (MoE) architecture, comprising 671 billion total parameters, with 37 billion activated per token. This design enables efficient inference and cost-effective training. The model incorporates Multi-head Latent Attention (MLA) and an auxiliary-loss-free strategy for load balancing, enhancing its performance across various tasks.
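The core MoE idea — a gate picks a few experts per token, so only a fraction of the parameters run — can be sketched in a few lines. This is a toy illustration: the expert count, top-k value, and dimensions are made up, and DeepSeek-V3's real design additionally uses shared experts, MLA attention, and its auxiliary-loss-free balancing.

```python
import numpy as np

# Toy top-k MoE routing: for each token, gate logits select k of n experts,
# and only those experts' parameters are used. Sizes are illustrative, not
# DeepSeek-V3's actual configuration.

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def moe_layer(x):
    scores = x @ gate_w                       # one gate logit per expert
    top = np.argsort(scores)[-k:]             # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_layer(rng.standard_normal(d))
print(y.shape)  # (16,) — only 2 of 8 experts were evaluated
```

The activated-to-total ratio here (2 of 8 experts) mirrors, in miniature, how 37B of 671B parameters are active per token.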
Performance 🏎️
In benchmark evaluations, DeepSeek-V3 demonstrates impressive results:
- MMLU (Massive Multitask Language Understanding): 88.5%
- MMLU-Redux: 89.1%
- MMLU-Pro: 75.9%
- HumanEval-Mul (Pass@1): 82.6%
- MATH-500 (Exact Match): 90.2%
These scores indicate that DeepSeek-V3 outperforms other open-source models and rivals leading closed-source models in various language understanding and reasoning tasks.
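For context on the "Pass@1" metric: it is the probability that a single sampled completion passes the problem's unit tests. The standard unbiased estimator (from Chen et al.'s HumanEval paper, widely used for such benchmarks) given n samples with c passing is sketched below; this is the general metric, not DeepSeek's specific evaluation harness.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of the chance that at least one of k draws,
    out of n samples with c correct, passes the tests."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(n=10, c=3, k=1), 2))  # 0.3 — for k=1 this is just c/n
```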
Pricing 💵
DeepSeek offers competitive pricing for API access to DeepSeek-V3:
Token Pricing
- Input Tokens (Cache Miss): $0.14 per 1 million tokens
- Input Tokens (Cache Hit): $0.014 per 1 million tokens
- Output Tokens: $0.28 per 1 million tokens
These are promotional rates, available until February 8, 2025. From that date, standard rates apply: input $0.27 per 1M tokens on a cache miss ($0.07 on a cache hit) and output $1.10 per 1M tokens.
Example Cost Calculation
For a session processing 500,000 input tokens (cache miss) and generating 200,000 output tokens:
- Input Cost: 500,000 tokens × ($0.14 / 1,000,000) = $0.07
- Output Cost: 200,000 tokens × ($0.28 / 1,000,000) = $0.056
- Total Cost: $0.07 + $0.056 = $0.126
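The worked calculation above generalizes to a small helper. The rates below are the promotional per-million-token prices quoted in this article; check DeepSeek's pricing page for current values.

```python
# Session cost from token counts, using the promotional rates quoted above
# (USD per 1M tokens). Rates change; treat these as example figures.
RATES = {"input_miss": 0.14, "input_hit": 0.014, "output": 0.28}

def session_cost(input_miss=0, input_hit=0, output=0):
    tokens = {"input_miss": input_miss, "input_hit": input_hit, "output": output}
    return sum(tokens[key] * RATES[key] / 1_000_000 for key in RATES)

print(round(session_cost(input_miss=500_000, output=200_000), 3))  # 0.126
```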
Use Cases 🗂️
DeepSeek-V3 is versatile and suitable for various applications, including:
- Natural Language Understanding: Text summarization, sentiment analysis, and question answering.
- Code Generation: Assisting in software development by generating code snippets.
- Mathematical Problem Solving: Tackling complex mathematical equations and problems.
- Extended Context Processing: Handling large documents or conversations with its 128k token context window.
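A quick way to sanity-check whether a document fits the 128k-token window is a character-count heuristic. The 4-characters-per-token ratio below is a common rule of thumb for English text, not DeepSeek's actual tokenizer; use the real tokenizer for accurate counts.

```python
# Rough fit check against a 128k-token context window. The chars/4 estimate
# is a crude heuristic, not DeepSeek-V3's tokenizer.
CONTEXT_WINDOW = 128_000

def fits_in_context(text, reserve_for_output=4_000):
    est_tokens = len(text) // 4  # approximate tokens from character count
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context("word " * 50_000))  # True (~62.5k estimated tokens)
```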
Customization
As an open-source model, DeepSeek-V3 allows for fine-tuning to cater to specific use cases, providing flexibility for developers and researchers to adapt the model to their unique requirements.