DeepSeek v3

DeepSeek-V3 offers a 128k-token context window (64k via the official API), with standard API pricing of $0.27 per 1 million input tokens and $1.10 per 1 million output tokens

DeepSeek-V3 is a state-of-the-art open-source language model developed by DeepSeek, a Chinese AI firm based in Hangzhou. Released in December 2024, this model employs a Mixture-of-Experts (MoE) architecture, featuring 671 billion total parameters with 37 billion activated per token.

It has been trained on 14.8 trillion tokens, achieving performance comparable to leading closed-source models like GPT-4o and Claude 3.5 Sonnet.

Scorecard

⛔️ Availability No, try Claude 3.5 Sonnet here instead
🐙 Model Type Large Language Model (LLM)
🗓️ Release Date December 2024
📅 Training Data Cut-off Date N/A
📏 Parameters (Size) 671B total, 37B activated per token
🔢 Context Window 128k tokens
🌎 Supported Languages English, Chinese
📈 MMLU Score 88.5%
🗝️ API Availability Yes
💰 Pricing (per 1M Tokens) Input: $0.27, Output: $1.10

Architecture 🏗️

DeepSeek-V3 utilizes a Mixture-of-Experts (MoE) architecture, comprising 671 billion total parameters, with 37 billion activated per token. This design enables efficient inference and cost-effective training. The model incorporates Multi-head Latent Attention (MLA) and an auxiliary-loss-free strategy for load balancing, enhancing its performance across various tasks.
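To make the MoE idea concrete, here is a minimal, illustrative top-k gating layer in PyTorch. The layer sizes, the top_k value, and the ToyMoELayer name are placeholders, not DeepSeek-V3's actual configuration (which uses finer-grained and shared experts plus the auxiliary-loss-free balancing strategy mentioned above); the sketch only shows how a router sends each token to a small subset of experts, so most parameters stay inactive per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative, not DeepSeek-V3's design)."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (n_tokens, d_model)
        scores = self.gate(x)                    # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts, so the vast
        # majority of expert parameters stay inactive for any given token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([10, 64])
```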

Performance 🏎️

In benchmark evaluations, DeepSeek-V3 demonstrates impressive results:

  • MMLU (Massive Multitask Language Understanding): 88.5%
  • MMLU-Redux: 89.1%
  • MMLU-Pro: 75.9%
  • HumanEval-Mul (Pass@1): 82.6%
  • MATH-500 (Exact Match): 90.2%

These scores indicate that DeepSeek-V3 outperforms other open-source models and rivals leading closed-source models in various language understanding and reasoning tasks.


Pricing 💵

DeepSeek offers competitive pricing for API access to DeepSeek-V3:

Token Pricing (Promotional)

  • Input Tokens (Cache Miss): $0.14 per 1 million tokens
  • Input Tokens (Cache Hit): $0.014 per 1 million tokens
  • Output Tokens: $0.28 per 1 million tokens

These promotional rates are available until February 8, 2025. After that date, standard pricing applies: $0.27 per 1M input tokens (cache miss), $0.07 per 1M input tokens (cache hit), and $1.10 per 1M output tokens.

Example Cost Calculation

For a session processing 500,000 input tokens (cache miss) and generating 200,000 output tokens:

  • Input Cost: 500,000 tokens × ($0.14 / 1,000,000) = $0.07
  • Output Cost: 200,000 tokens × ($0.28 / 1,000,000) = $0.056
  • Total Cost: $0.07 + $0.056 = $0.126
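The same arithmetic can be wrapped in a small helper for quick estimates. The sketch below uses the promotional cache-miss rates quoted above; swap in the standard rates for usage after February 8, 2025.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.14, output_rate: float = 0.28) -> float:
    """Estimate DeepSeek-V3 API cost in USD; rates are per 1 million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The session from the example: 500k input tokens (cache miss) + 200k output tokens.
print(f"${estimate_cost(500_000, 200_000):.3f}")  # $0.126
```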

Use Cases 🗂️

DeepSeek-V3 is versatile and suitable for various applications, including:

  • Natural Language Understanding: Text summarization, sentiment analysis, and question answering (see the API sketch after this list).
  • Code Generation: Assisting in software development by generating code snippets.
  • Mathematical Problem Solving: Tackling complex mathematical equations and problems.
  • Extended Context Processing: Handling large documents or conversations with its 128k-token context window (64k via the official API).
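As an illustration of the summarization use case, the snippet below sends a request through DeepSeek's OpenAI-compatible chat-completions API. The deepseek-chat model name and base URL follow DeepSeek's public API documentation; the prompt text and the DEEPSEEK_API_KEY environment variable are illustrative assumptions.

```python
import os
from openai import OpenAI  # DeepSeek's API is OpenAI-compatible

# Assumes your DeepSeek API key is stored in the DEEPSEEK_API_KEY environment variable.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # deepseek-chat is served by DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a concise summarization assistant."},
        {"role": "user", "content": "Summarize the following report in three bullet points: ..."},
    ],
    temperature=0.3,
)

print(response.choices[0].message.content)
```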

Customization

As an open-source model, DeepSeek-V3 allows for fine-tuning to cater to specific use cases, providing flexibility for developers and researchers to adapt the model to their unique requirements.

About the author
Yucel Faruk

Growth Hacker ✨ • I love building digital products and online tools using Tailwind and no-code tools.
