Many articles about “running DeepSeek R1 locally” blur together three very different things:
- the full DeepSeek-R1 model
- the officially released distilled models
- community wrappers such as desktop apps and local model launchers
If you want a clean mental model, start with the official DeepSeek R1 repository.
The Most Important Distinction
According to the official repo:
- DeepSeek-R1 and DeepSeek-R1-Zero are the flagship reasoning models
- DeepSeek also released several distilled models based on the Qwen2.5 and Llama families
The repo lists these distilled variants:
- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Llama-8B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Llama-70B
That distinction matters because “local setup” usually means one of the distill models, not the full 671B flagship.
The Model Family in One Table
| Layer | What it means in practice |
|---|---|
| DeepSeek-R1-Zero | RL-first reasoning model path |
| DeepSeek-R1 | Refined reasoning model with better readability/alignment |
| Distill models | Smaller practical variants based on Qwen or Llama backbones |
| Full flagship scale | Research/deployment target, not casual desktop default |
Source:
- DeepSeek R1 repository: https://github.com/deepseek-ai/DeepSeek-R1
What the Official Repo Says About the Model Family
The official README describes:
- DeepSeek-R1-Zero as a reasoning model trained through large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step
- DeepSeek-R1 as the follow-up pipeline that adds cold-start data before RL to improve readability and alignment
- the flagship R1 models as having 671B total parameters, 37B activated per token, and a 128K context length
It also says the distilled smaller models were fine-tuned from open-source Qwen and Llama bases using samples generated by DeepSeek-R1.
For practical local usage, this leads to a simple conclusion:
If you want something realistic to run on common hardware, you should usually evaluate a distill model, not assume the flagship model is the right target.
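If you decide to evaluate a distill model, the first concrete step is simply pulling its weights. The sketch below is a minimal illustration, assuming the checkpoints are hosted on Hugging Face under the deepseek-ai organization and that you have the huggingface_hub package installed; the 7B model ID is one example target, not a prescription.

```python
# Minimal sketch: download a distill checkpoint for local evaluation.
# Assumes the weights are hosted on Hugging Face under the deepseek-ai org
# and that `huggingface_hub` is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# The 7B Qwen-based distill is a reasonable first target on modest hardware.
local_dir = snapshot_download("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
print(f"Model weights downloaded to: {local_dir}")
```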
What the Official Repo Recommends for Local Inference
The official README does not primarily frame local setup around desktop GUIs. Instead, it points users to framework-based inference for the distilled models, especially:
- vLLM
- SGLang
The repo gives concrete examples such as:
- serving DeepSeek-R1-Distill-Qwen-32B with vLLM
- launching the same class of model with SGLang
This is the important operational insight:
DeepSeek's own docs are optimized for serious inference frameworks, not for “download a chat app and hope for the best.”
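As an illustration of what the framework-based path looks like, here is a minimal sketch using vLLM's offline Python API with one of the smaller distill models. The model ID and decoding settings are assumptions for demonstration; the official README documents the server-style vLLM and SGLang launch commands directly, and those remain the reference.

```python
# Minimal sketch of framework-based inference with vLLM's offline Python API.
# The model ID and settings below are illustrative assumptions; consult the
# official README for the serve-style commands it actually documents.
from vllm import LLM, SamplingParams

# Load a small distill model (a 7B variant is a realistic single-GPU target).
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# Decoding settings in line with the repo's guidance: temperature around 0.6.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

prompt = "Explain why the sum of two even numbers is always even."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```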
A Better Hardware Reality Check
When people say they are “running R1 locally,” they usually mean one of the distill models, typically a 1.5B, 7B, 8B, 14B, or 32B variant.
The official repo itself gives you the family breakdown, and that is the most reliable starting point for choosing a target.
In practical terms:
- 1.5B to 8B is the realistic entry point for modest local hardware
- 14B to 32B is where you need to think much more carefully about memory, throughput, and serving setup
- the 70B class is already closer to “serious workstation or server planning” than casual desktop experimentation
| Model range | Best use |
|---|---|
| 1.5B to 8B | First local experiments and lightweight testing |
| 14B to 32B | More serious local inference with stronger hardware planning |
| 70B | Workstation/server-grade experiments |
| Full R1 flagship | Infrastructure project, not convenience setup |
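To turn those ranges into a first-pass memory number, you can do the back-of-the-envelope arithmetic yourself: weight memory is roughly parameter count times bytes per parameter, before KV cache and runtime overhead. The sketch below is a rough heuristic, not an official sizing guide.

```python
# Back-of-the-envelope estimate of weight memory for a given model size.
# This is a rough heuristic only: it ignores KV cache, activations, and
# framework overhead, which can add several GB on top of the weights.
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (1.5, 7, 8, 14, 32, 70):
    fp16 = weight_memory_gb(size, 2.0)   # 16-bit weights
    q4 = weight_memory_gb(size, 0.5)     # roughly 4-bit quantization
    print(f"{size:>5}B: ~{fp16:6.1f} GB at fp16, ~{q4:5.1f} GB at ~4-bit")
```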
Official Usage Recommendations That Matter
The DeepSeek R1 README includes several usage recommendations that many secondary guides skip. The most important ones are:
- use a temperature in the 0.5 to 0.7 range, with 0.6 recommended
- avoid system prompts
- for math tasks, explicitly ask the model to reason step by step and to put the final answer within \boxed{}
The repo also notes that the R1 series may sometimes bypass a full thinking pattern, and suggests forcing the model to begin with <think>\n when you want stronger reasoning behavior.
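Put together, a request against a locally served distill model (for example through the OpenAI-compatible endpoint that vLLM and SGLang both expose) might look like the sketch below: no system message, temperature 0.6, and the reasoning instruction carried in the user prompt. The endpoint URL and model name are assumptions about your local setup.

```python
# Sketch of a request to a locally served distill model through an
# OpenAI-compatible endpoint (vLLM and SGLang both expose one).
# The base_url and model name are assumptions about your local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    # No system message, per the repo's recommendation; all instructions
    # go into the user prompt, including the step-by-step request.
    messages=[{
        "role": "user",
        "content": "Solve 17 * 24. Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,  # within the recommended 0.5 to 0.7 range
)
print(response.choices[0].message.content)
```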
That is a strong reminder that local quality depends on:
- prompt style
- serving stack
- model variant
- decoding configuration
and not just on whether the model weights exist on your disk.
Where Community Tools Fit In
Tools like Ollama, LM Studio, Open WebUI, and desktop chat wrappers can still be useful. But if you care about accuracy, reproducibility, and alignment with what the model authors documented, the right order is:
- understand the official distill model lineup
- understand the official usage recommendations
- choose your community wrapper only after that
That avoids a common mistake:
blaming the model for behavior that actually comes from a wrapper, a quantized conversion, or an unsupported serving path.
A Practical Decision Framework
Use this before you start:
Choose a model target
- want the easiest local experiment: start with a small distill model
- want stronger reasoning quality and can afford heavier infra: move up the distill stack
- want to study the flagship R1 family itself: treat it as an infrastructure project, not a casual desktop install
Choose a serving path
- want the closest path to official guidance: start with vLLM or SGLang
- want convenience over purity: a desktop/local wrapper may still be fine, but accept that it is not the official path
Tune expectations
- “runs locally” does not mean “runs well”
- model size and prompting strategy matter as much as the launch command
- if reasoning quality is your main goal, follow the repo's own decoding guidance instead of using default settings blindly
Quick Decision Matrix
| Goal | More realistic choice |
|---|---|
| Fast local trial | Small distill model |
| Stronger reasoning test | Mid or large distill model |
| Official framework-aligned serving | vLLM or SGLang |
| Casual GUI-first usage | Community wrapper, with lower expectations |
Bottom Line
The highest-quality way to think about local DeepSeek R1 usage is:
- the flagship R1 model family is a research and deployment reference point
- the distill models are the realistic local entry point for most users
- the official repo expects a framework-based serving workflow, especially through vLLM and SGLang
If you start from those three facts, you will make better choices about hardware, tooling, and prompt setup than you would by following generic “run it locally in five minutes” blog posts.
Sources
- DeepSeek R1 official repository: https://github.com/deepseek-ai/DeepSeek-R1
- DeepSeek V3 official repository: https://github.com/deepseek-ai/DeepSeek-V3