
GPT OSS 20B on Jetson

One-click SSH deployment of GPT OSS 20B on NVIDIA Jetson with a prebuilt Docker image.

Beginner · 10 min · LLM
Tags: Jetson, LLM, docker, edge-ai

What This Solution Does

Deploy GPT OSS 20B to an NVIDIA Jetson device with one click. The container starts llama-server and exposes an OpenAI-compatible HTTP API on port 8080.

Core Value

| Benefit | Details |
|---|---|
| Local inference | Run a 20B LLM entirely on edge hardware, no cloud dependency |
| OpenAI-compatible API | Use existing SDKs and tools without modification |
| One-click deploy | SSH-based remote deployment, no manual Docker commands |

Use Cases

| Scenario | How to Use |
|---|---|
| Chat bot backend | Connect as the AI engine for local chat applications |
| Voice assistant | Pair with a speech recognition frontend for offline voice AI |
| Multi-platform gateway | Use with OpenClaw to serve WeChat, Telegram, and other platforms |

Usage Notes

Hardware Requirements:

  • Jetson Orin NX 16GB or higher (20B model requires ~12-15GB VRAM)
  • reComputer J4012 is verified; for other Jetson Orin models, confirm sufficient VRAM before deploying

API Endpoint:

  • URL: http://<jetson-ip>:8080/v1/chat/completions
  • OpenAI-compatible format — works with existing SDKs
  • Python example (openai>=1.0): from openai import OpenAI; client = OpenAI(base_url="http://<jetson-ip>:8080/v1", api_key="not-needed")
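The endpoint can also be exercised without any SDK. A minimal standard-library sketch of building the request body and extracting the reply from the documented response shape (the base URL, model name, and helper names here are illustrative placeholders, not part of the deployment):

```python
import json

# Placeholder for your Jetson's address.
BASE_URL = "http://<jetson-ip>:8080/v1"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    # Standard OpenAI chat-completion body; llama-server accepts the
    # "model" field but serves the single loaded model regardless.
    return {
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def extract_reply(response_json: dict) -> str:
    # Matches the response shape {"choices":[{"message":{"content":...}}]}.
    return response_json["choices"][0]["message"]["content"]

# To send for real (requires a running server; `requests` shown for brevity):
# import requests
# r = requests.post(f"{BASE_URL}/chat/completions", json=build_request("Hello"))
# print(extract_reply(r.json()))
```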

First Request Latency:

  • Initial request may take 2-5 minutes (model warm-up)
  • Check readiness at http://<jetson-ip>:8080/v1/models
  • After warm-up, subsequent requests typically respond in 1-3 seconds
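The readiness check above can be automated by polling /v1/models until the server stops returning 503. A standard-library sketch (the function name and timing defaults are illustrative, not part of the deployment):

```python
import time
import urllib.request
import urllib.error

def wait_until_ready(base_url: str, timeout_s: int = 300, interval_s: int = 10) -> bool:
    """Poll <base_url>/v1/models until it returns 200 or the timeout expires."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True  # model loaded, server ready
        except (urllib.error.URLError, OSError):
            pass  # 503 while the model is loading, or service not up yet
        time.sleep(interval_s)
    return False
```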

Token & Context:

  • Default context window ~2048 tokens; adjustable during deployment
  • Larger context (Llama Context parameter) uses more VRAM
  • Keep single requests under 1000 tokens to avoid VRAM overflow
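A rough character-based estimate can act as a guard before sending a request that might blow past the context window. A sketch (the ~4 characters-per-token ratio is a crude heuristic for English text, not the model's actual tokenizer):

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Use a real tokenizer if you need accurate counts.
    return max(1, len(text) // 4)

def fits_context(prompt: str, max_new_tokens: int, context_window: int = 2048) -> bool:
    # Prompt tokens plus requested completion tokens must fit the window.
    return approx_tokens(prompt) + max_new_tokens <= context_window
```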

Integration Interface

http

OpenAI-compatible chat completion API

/v1/chat/completions · Port: 8080 · Method: POST
{"choices":[{"message":{"content":"response text"}}]}

Deployment Configuration

edge_device

Download and Installation

Preset: Jetson GPT OSS 20B Service {#jetson_got_oss}

Deploy GPT OSS 20B to your Jetson device with one click from this platform.

| Device | Purpose |
|---|---|
| NVIDIA Jetson (reComputer) | Runs GPT OSS 20B in Docker |

Step 1: Deploy GPT OSS 20B Service {#deploy_got_oss type=docker_deploy required=true config=devices/jetson.yaml}

Deploy the containerized GPT OSS 20B runtime to your Jetson over SSH.

Target: Remote Deployment (Jetson) {#jetson_remote type=remote config=devices/jetson.yaml default=true}

Deploy to your Jetson over SSH with one click.

Wiring

  1. Connect Jetson and your computer to the same network.
  2. Fill in Jetson IP, SSH username, and password.
  3. Click Deploy.

Deployment Complete

  1. The GPT OSS 20B container is running on your Jetson.
  2. llama-server is started inside the container.
  3. The service endpoint is available at http://<jetson-ip>:8080.
  4. Readiness endpoint is available at http://<jetson-ip>:8080/v1/models.

Troubleshooting

| Issue | Solution |
|---|---|
| SSH connection failed | Verify Jetson IP, username, password, and SSH service status |
| Docker runtime check failed | Ensure Docker is installed and the NVIDIA runtime is available |
| Docker Compose unavailable | Ensure docker compose or docker-compose is installed |
| Service start failed | Inspect logs on the Jetson: docker compose logs --tail=200 |
| 503 {"message":"Loading model"} on /v1/models | Model is still warming up; the first run can take several minutes |
| Out-of-memory at startup | Reduce settings, for example set Llama NGL=16 and Llama Context=512 |
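For the "Docker runtime check failed" row, the NVIDIA runtime can be verified from the output of docker info. A sketch of the parsing step (run the command yourself on the Jetson; the function name is illustrative):

```python
def has_nvidia_runtime(docker_info_output: str) -> bool:
    # `docker info` prints a line such as "Runtimes: nvidia runc ...";
    # "nvidia" must appear there for GPU containers to start.
    for line in docker_info_output.splitlines():
        if line.strip().startswith("Runtimes:"):
            return "nvidia" in line
    return False

# On the Jetson itself:
# import subprocess
# info = subprocess.run(["docker", "info"], capture_output=True, text=True).stdout
# print("nvidia runtime available:", has_nvidia_runtime(info))
```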

Target: Local Deployment {#jetson_local type=local config=devices/jetson_local.yaml}

Deploy directly on the current machine (requires NVIDIA GPU with sufficient VRAM).

Wiring

  1. Ensure Docker and NVIDIA Container Toolkit are installed
  2. Click Deploy to start installation

Note: First startup may take 15-30 minutes for Docker image download and model loading. Requires at least 20GB free disk space.

Deployment Complete

  1. Open http://localhost:8080 in your browser
  2. You'll see the GPT OSS chat interface ready for interaction

Troubleshooting

| Issue | Solution |
|---|---|
| NVIDIA runtime not found | Install the NVIDIA Container Toolkit: sudo apt install nvidia-container-toolkit && sudo systemctl restart docker |
| Port 8080 already in use | Stop existing services on that port |
| Container keeps restarting | Check logs: docker compose logs --tail=200 |
| GPU out of memory | The 20B model requires significant GPU memory; try a smaller model variant |

Step 2: Open Service Link {#preview_service type=preview required=false config=devices/preview.yaml}

Use this step to open the Jetson service URL directly in a new browser tab.

Wiring

  1. Enter Jetson IP in this step.
  2. Click Connect.
  3. The platform opens http://<jetson-ip>:8080 in a new tab.

Deployment Complete

  1. The service page opens in your browser.
  2. You can return here and click Connect again to reopen it.

Troubleshooting

| Issue | Solution |
|---|---|
| Invalid host input | Enter a valid IP or hostname, for example 192.168.1.100 |
| New tab not opened | Allow pop-ups for this site and retry |
| Service page not reachable | Confirm the Jetson service is listening on 8080 and the network is reachable |
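The "Service page not reachable" row can be narrowed down with a plain TCP check before suspecting the application layer. A standard-library sketch (the function name is illustrative):

```python
import socket

def service_reachable(host: str, port: int = 8080, timeout_s: float = 3.0) -> bool:
    """TCP-level check that something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and timeouts.
        return False
```

If this returns False, check the network path and whether the container is running; if True but the page still fails, the issue is in the HTTP service itself.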

Deployment Complete

GPT OSS 20B runtime has been deployed successfully on your Jetson.

Validation Checklist

  1. Step 1 deployment status shows success.
  2. The GPT OSS 20B container stays in running state.
  3. Clicking Connect in Step 2 opens http://<jetson-ip>:8080.