
GPT OSS 20B on Jetson

One-click SSH deployment of GPT OSS 20B on NVIDIA Jetson with a prebuilt Docker image.

Beginner · 10 min · LLM
Tags: Jetson, LLM, docker, edge-ai

What This Solution Does

Deploy GPT OSS 20B to an NVIDIA Jetson device with one click. The container starts llama-server and exposes an OpenAI-compatible HTTP API on port 8080.

Core Value

| Benefit | Details |
|---|---|
| Local inference | Run a 20B LLM entirely on edge hardware, no cloud dependency |
| OpenAI-compatible API | Use existing SDKs and tools without modification |
| One-click deploy | SSH-based remote deployment, no manual Docker commands |

Use Cases

| Scenario | How to Use |
|---|---|
| Chat bot backend | Connect as the AI engine for local chat applications |
| Voice assistant | Pair with a speech recognition frontend for offline voice AI |
| Multi-platform gateway | Use with OpenClaw to serve WeChat, Telegram, and other platforms |

Usage Notes

Hardware Requirements:

  • Jetson Orin NX 16GB or higher (20B model requires ~12-15GB VRAM)
  • reComputer J4012 is verified; for other Jetson Orin models, confirm sufficient VRAM before deploying

API Endpoint:

  • URL: http://<jetson-ip>:8080/v1/chat/completions
  • OpenAI-compatible format — works with existing SDKs
  • Python example (openai>=1.0): from openai import OpenAI; client = OpenAI(base_url="http://<jetson-ip>:8080/v1", api_key="not-needed")
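The endpoint can also be exercised without any SDK. A minimal standard-library sketch of building the request body and extracting the reply from the documented response shape (the base URL, model name, and helper names here are illustrative placeholders, not part of the deployment):

```python
import json

# Placeholder for your Jetson's address.
BASE_URL = "http://<jetson-ip>:8080/v1"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    # Standard OpenAI chat-completion body; llama-server accepts the
    # "model" field but serves the single loaded model regardless.
    return {
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def extract_reply(response_json: dict) -> str:
    # Matches the response shape {"choices":[{"message":{"content":...}}]}.
    return response_json["choices"][0]["message"]["content"]

# To send for real (requires a running server; `requests` shown for brevity):
# import requests
# r = requests.post(f"{BASE_URL}/chat/completions", json=build_request("Hello"))
# print(extract_reply(r.json()))
```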

First Request Latency:

  • Initial request may take 2-5 minutes (model warm-up)
  • Check readiness at http://<jetson-ip>:8080/v1/models
  • After warm-up, subsequent requests typically respond in 1-3 seconds
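The readiness check above can be automated by polling /v1/models until the server stops returning 503. A standard-library sketch (the function name and timing defaults are illustrative, not part of the deployment):

```python
import time
import urllib.request
import urllib.error

def wait_until_ready(base_url: str, timeout_s: int = 300, interval_s: int = 10) -> bool:
    """Poll <base_url>/v1/models until it returns 200 or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True  # model loaded, server ready
        except (urllib.error.URLError, OSError):
            pass  # 503 while the model is loading, or service not up yet
        time.sleep(interval_s)
    return False
```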

Token & Context:

  • Default context window ~2048 tokens; adjustable during deployment
  • Larger context (Llama Context parameter) uses more VRAM
  • Keep single requests under 1000 tokens to avoid VRAM overflow
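A rough character-based estimate can act as a guard before sending a request that might blow past the context window. A sketch (the ~4 characters-per-token ratio is a crude heuristic for English text, not the model's actual tokenizer):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Use a real tokenizer if you need accurate counts.
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_new_tokens: int, context_window: int = 2048) -> bool:
    # Prompt tokens plus requested completion tokens must fit the window.
    return approx_tokens(prompt) + max_new_tokens <= context_window
```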

Integration Interface

http

OpenAI-compatible chat completion API

/v1/chat/completions · Port: 8080 · Method: POST
{"choices":[{"message":{"content":"response text"}}]}

Deployment Configuration

edge_device

Download and Installation

Preset: Jetson GPT OSS 20B Service {#jetson_got_oss}

Deploy GPT OSS 20B to your Jetson device with one click from this platform.

| Device | Purpose |
|---|---|
| NVIDIA Jetson (reComputer) | Runs GPT OSS 20B in Docker |

Step 1: Deploy GPT OSS 20B Service {#deploy_got_oss type=docker_deploy required=true config=devices/jetson.yaml}

Deploy the containerized GPT OSS 20B runtime to your Jetson over SSH.

Target: Remote Deployment (Jetson) {#jetson_remote type=remote config=devices/jetson.yaml default=true}

Deploy to your Jetson over SSH with one click.

Wiring

  1. Connect Jetson and your computer to the same network.
  2. Fill in Jetson IP, SSH username, and password.
  3. Click Deploy.

Deployment Complete

  1. The GPT OSS 20B container is running on your Jetson.
  2. llama-server is started inside the container.
  3. The service endpoint is available at http://<jetson-ip>:8080.
  4. Readiness endpoint is available at http://<jetson-ip>:8080/v1/models.

Troubleshooting

| Issue | Solution |
|---|---|
| SSH connection failed | Verify Jetson IP, username, password, and SSH service status |
| Docker runtime check failed | Ensure Docker is installed and the NVIDIA runtime is available |
| Docker Compose unavailable | Ensure docker compose or docker-compose is installed |
| Service start failed | Inspect logs on the Jetson: docker compose logs --tail=200 |
| 503 {"message":"Loading model"} on /v1/models | Model is still warming up; the first run can take several minutes |
| Out-of-memory at startup | Reduce settings, for example set Llama NGL=16 and Llama Context=512 |
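For the "Docker runtime check failed" row, the NVIDIA runtime can be verified from the output of docker info. A sketch of the parsing step (run the command yourself on the Jetson; the function name is illustrative):

```python
def has_nvidia_runtime(docker_info_output: str) -> bool:
    # `docker info` prints a line such as "Runtimes: nvidia runc ...";
    # "nvidia" must appear there for GPU containers to start.
    for line in docker_info_output.splitlines():
        if line.strip().startswith("Runtimes:"):
            return "nvidia" in line
    return False

# On the Jetson itself:
# import subprocess
# info = subprocess.run(["docker", "info"], capture_output=True, text=True).stdout
# print("nvidia runtime available:", has_nvidia_runtime(info))
```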

Target: Local Deployment {#jetson_local type=local config=devices/jetson_local.yaml}

Deploy directly on the current machine (requires NVIDIA GPU with sufficient VRAM).

Wiring

  1. Ensure Docker and NVIDIA Container Toolkit are installed
  2. Click Deploy to start installation

Note: First startup may take 15-30 minutes for Docker image download and model loading. Requires at least 20GB free disk space.

Deployment Complete

  1. Open http://localhost:8080 in your browser
  2. You'll see the GPT OSS chat interface ready for interaction

Troubleshooting

| Issue | Solution |
|---|---|
| NVIDIA runtime not found | Install the NVIDIA Container Toolkit: sudo apt install nvidia-container-toolkit && sudo systemctl restart docker |
| Port 8080 already in use | Stop existing services on that port |
| Container keeps restarting | Check logs: docker compose logs --tail=200 |
| GPU out of memory | The 20B model requires significant GPU memory; try a smaller model variant |

Step 2: Open Service Link {#preview_service type=preview required=false config=devices/preview.yaml}

Use this step to open the Jetson service URL directly in a new browser tab.

Wiring

  1. Enter Jetson IP in this step.
  2. Click Connect.
  3. The platform opens http://<jetson-ip>:8080 in a new tab.

Deployment Complete

  1. The service page opens in your browser.
  2. You can return here and click Connect again to reopen it.

Troubleshooting

| Issue | Solution |
|---|---|
| Invalid host input | Enter a valid IP or hostname, for example 192.168.1.100 |
| New tab not opened | Allow pop-ups for this site and retry |
| Service page not reachable | Confirm the Jetson service is listening on 8080 and the network is reachable |
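The "Service page not reachable" row can be narrowed down with a plain TCP check before suspecting the application layer. A standard-library sketch (the function name is illustrative):

```python
import socket

def service_reachable(host: str, port: int = 8080, timeout_s: float = 3.0) -> bool:
    """TCP-level check that something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and timeouts.
        return False
```

If this returns False, check the network path and whether the container is running; if True but the page still fails, the issue is in the HTTP service itself.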

Deployment Complete

GPT OSS 20B runtime has been deployed successfully on your Jetson.

Validation Checklist

  1. Step 1 deployment status shows success.
  2. The GPT OSS 20B container stays in running state.
  3. Clicking Connect in Step 2 opens http://<jetson-ip>:8080.