Edge LLM: DeepSeek-R1 on RK3576

Run DeepSeek-R1 large language model locally on reComputer RK3576 with NPU acceleration. Choose from 5 model variants.

Beginner · 15 min · AI
Tags: rk3576, LLM, deepseek, edge-ai, npu

What It Does

Turn your reComputer RK3576 into a local AI chatbot. DeepSeek-R1 runs entirely on your device — no cloud, no API fees, no data leaving your network.

Core Value

  • Private by design — all conversations stay on your device and never leave your network
  • Multiple model sizes — choose between 1.5B (fast, lightweight) and 7B (stronger reasoning), each available in several quantization options
  • Standard API — OpenAI-compatible interface works with existing tools and libraries
  • NPU accelerated — Rockchip NPU handles inference efficiently on low-power hardware

Use Cases

Scenario | Description
Edge chatbot | Build customer-facing chat without cloud dependency
Local code assistant | Get coding help on air-gapped networks
Document Q&A | Process sensitive documents without uploading them to the cloud
IoT command parsing | Parse natural-language commands for device control (see the sketch below)
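
To make the last row concrete, here is a minimal sketch of command parsing against the chat API described below. The system prompt, JSON schema, and parse_command helper are illustrative; only the endpoint, port, and model name come from this guide.

import json
import openai

# Point the client at the on-device API; <device-ip> is a placeholder for your RK3576's address.
client = openai.OpenAI(base_url="http://<device-ip>:8001/v1", api_key="dummy")

SYSTEM = (
    "Translate smart-home commands into JSON with keys "
    "'device', 'action', and 'value'. Reply with JSON only."
)

def parse_command(text):
    resp = client.chat.completions.create(
        model="rkllm-model",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
        ],
        max_tokens=128,
    )
    content = resp.choices[0].message.content
    # R1-style models may prepend a <think>...</think> reasoning block; drop it before parsing.
    if "</think>" in content:
        content = content.split("</think>", 1)[1]
    return json.loads(content.strip())

print(parse_command("Dim the living room lights to 30 percent"))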

Good to Know

  • 1.5B models run comfortably on 4GB+ devices; 7B models need 8GB+
  • First startup takes 30-60 seconds for model loading
  • Inference speed depends on model size and quantization level
  • W4A16 quantization (4-bit weights, 16-bit activations) offers the best balance of speed and quality
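
To check which variant your device can hold, a quick sketch like the following (Linux-only; the thresholds are the 4GB/8GB figures above) reads total RAM and suggests a size:

import re

# Read total memory from /proc/meminfo (Linux-only; the value is in kB).
with open("/proc/meminfo") as f:
    mem_kb = int(re.search(r"MemTotal:\s+(\d+)", f.read()).group(1))

mem_gb = mem_kb / 1024 / 1024
# Thresholds from this guide: 1.5B runs on 4GB+, 7B needs 8GB+.
print("Suggested variant:", "7B" if mem_gb >= 8 else "1.5B")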

Interfaces

HTTP

OpenAI-compatible chat completion API (supports streaming)

/v1/chat/completions · Port: 8001 · Method: POST
{"model":"rkllm-model","messages":[{"role":"user","content":"Hello"}],"max_tokens":512,"stream":false}
HTTP

List available models

/v1/models · Port: 8001 · Method: GET
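
For example, you can query this endpoint from any machine on the same network (replace <device-ip> with your device's address):

curl http://<device-ip>:8001/v1/models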

Requirements

Network connection for Docker image pull

Deployment Configuration

Download and Installation

Preset: RK3576 LLM Inference {#rk3576_llm}

Deploy DeepSeek-R1 large language model to your reComputer RK3576 with one click.

Device | Purpose
reComputer RK3576 | Runs DeepSeek-R1 LLM with NPU acceleration

What you'll get:

  • OpenAI-compatible chat API running locally on your device
  • Choose from 5 model variants (1.5B/7B, different quantizations)
  • No cloud dependency — all inference runs on-device

Requirements: RK3576 device with SSH access + Docker installed

Step 1: Deploy DeepSeek-R1 {#deploy_llm type=docker_deploy required=true config=devices/rk3576.yaml}

Deploy the LLM container to your RK3576 device.

Target: Remote Deployment {#rk3576_remote type=remote config=devices/rk3576.yaml default=true}

Deploy to your RK3576 over SSH with one click.

Wiring

  1. Connect RK3576 to the same network as your computer
  2. Select the model variant you want to run
  3. Fill in device IP, SSH username, and password
  4. Click Deploy

Deployment Complete

  1. The LLM container is running on your RK3576
  2. Chat API is available at http://<device-ip>:8001/v1/chat/completions
  3. Use any OpenAI-compatible client to connect
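
If you want to confirm the deployment before trying chat, standard Docker commands on the device will do; the container name ai_lab_llm is the one referenced in the Step 2 troubleshooting table:

docker ps --filter name=ai_lab_llm   # the container should be listed as Up
docker logs -f ai_lab_llm            # watch model loading progress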

Troubleshooting

Issue | Solution
SSH connection failed | Verify the IP address, username, and password
NPU not detected | Ensure the device is an RK3576 with the RKNPU kernel module loaded
Out of memory (7B model) | 7B variants require 8GB+ RAM; try a 1.5B variant instead
Image pull slow | Check the network connection; the image is 1-4GB depending on the variant

Step 2: Try Chat {#verify_llm type=text_chat required=false config=devices/llm_chat.yaml}

Test the LLM by sending a message.

Troubleshooting

Issue | Solution
Connection refused | Wait 30-60 seconds for the model to load
Timeout | 7B models take longer; wait up to 2 minutes
Empty response | Check container logs: docker logs ai_lab_llm

Deployment Complete

DeepSeek-R1 is running on your RK3576 device.

Quick Start

curl http://<device-ip>:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "rkllm-model", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}'

Python Example

import openai

# The api_key is required by the client library but not validated by the local server.
client = openai.OpenAI(base_url="http://<device-ip>:8001/v1", api_key="dummy")
response = client.chat.completions.create(
    model="rkllm-model",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
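
Because the API supports streaming, the same client can also print tokens as they arrive; this is a sketch assuming the server emits standard OpenAI-style chunks:

stream = client.chat.completions.create(
    model="rkllm-model",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental token; content can be None on the final chunk.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()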