Running Deepseek LLM on Raspberry Pi 4
Introduction
This post walks through running DeepSeek inference on a Raspberry Pi 4 using llama.cpp.
Set up llama.cpp
llama.cpp is a lightweight and fast inference engine for running LLMs on CPUs. Its build documentation provides instructions for building the project in various configurations, such as adding CUDA support or building on Android. Here I will build the project for Raspberry Pi 4 in the most basic way.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Build llama.cpp using CMake:
cmake -B build
cmake --build build --config Release -j 8 # Build with 8 threads
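If the build finishes without errors, the llama-cli binary should end up under build/bin (assuming the default CMake layout used above). A quick sanity check:
# Verify the binary exists and prints its build info
ls build/bin/llama-cli
./build/bin/llama-cli --version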
Download the DeepSeek distill model
We will use the distill model with 8B parameters. On the model's Hugging Face page, click Files and versions near the top to download variants with different levels of quantization. I tested the F16 and Q2_K models: F16 is the full-precision model, while Q2_K is quantized to 2 bits. The F16 model takes 2-3 minutes to load, and token inference never starts before the llama process gets stuck; the Q2_K model loads much faster, but its inference speed is still only about 1 token/second.
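As a sketch, the GGUF file can also be fetched from the command line with huggingface-cli instead of the browser. The repository owner and file names below are placeholders, so substitute the actual ones shown on the Hugging Face page:
# Install the Hugging Face CLI (if not already available)
pip install -U "huggingface_hub[cli]"
# Download the 2-bit quantized GGUF into the models/ directory
# (repository owner is a placeholder; check the model page for the real one)
huggingface-cli download <repo-owner>/DeepSeek-R1-Distill-Llama-8B-GGUF \
  DeepSeek-R1-Distill-Llama-8B-Q2_K.gguf --local-dir models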
Run the model
In the top-level directory of the llama.cpp repository, run the following command to start the model:
./build/bin/llama-cli -m models/DeepSeek-R1-Distill-Llama-8B-Q2_K.gguf
The performance documentation provides tips on configuring multiple threads and GPUs to improve performance. More runtime option flags can be listed by passing --help to llama-cli.
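For example, here is a minimal sketch that pins the thread count to the Pi 4's four cores and limits the response length; the flag names come from llama-cli --help, and the prompt is just an illustration:
# Use 4 threads (one per Pi 4 core), limit output to 64 tokens
./build/bin/llama-cli -m models/DeepSeek-R1-Distill-Llama-8B-Q2_K.gguf \
  -t 4 -n 64 -p "Explain what a Raspberry Pi is in one sentence."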