Dataset collection
Record an SO-101 teleop dataset with both arms + cameras and push it to the Hugging Face Hub.
This guide walks through recording a LeRobot dataset on Haller's SO-101 arms and pushing it to the Hugging Face Hub. It's the prerequisite for training or fine-tuning a policy on your own task.
It assumes the LeRobot environment and the SO-101 arm bring-up are already done, and that you can teleoperate via the HMI.
The collection path documented here is Phase 1: recording happens out-of-process via scripts/record_dataset.sh, which wraps lerobot-record. The HMI exposes a Recording panel that builds the command for you, but launching recording from inside the HMI process is separate work (Phase 2) — see the roadmap at the end.
Why a custom dataset
You can run policy inference against any SO-101 dataset on the Hub (e.g. lerobot/svla_so101_pickplace). That's a useful smoke test of the inference stack.
You cannot, however, expect a published policy to succeed at your task on your hardware without finetuning. Different cameras, different lighting, different calibration midpoints, different objects — generalist VLAs like π0.5 and GR00T N1.7 narrow the gap considerably but don't close it. So: record demonstrations on your arm, of your task, then either train ACT on ~50 episodes or LoRA-finetune a generalist VLA on ~20–50 episodes.
The dataset format is LeRobotDataset (parquet shards + per-camera MP4s), the same shape that every policy in lerobot-train consumes.
Prerequisites
Hardware
- Follower arm at /dev/haller_arm_follower, leader arm at /dev/haller_arm_leader (udev symlinks from scripts/99-haller-devices.rules).
- Both arms calibrated (see SO-101 arms). The HMI's calibration bootstrap auto-copies the leader's teleoperator calibration into the follower directory so both can be driven as follower-style robots.
- At least one camera plugged in. A 2-camera setup (wrist + base) is the norm in published SO-101 datasets and the recommended target — most generalist VLAs are trained with multiple camera streams. A single base camera is fine for ACT smoke testing but suboptimal for VLA finetuning.
Software
- HMI venv installed (~/venvs/haller-hmi/) with lerobot[feetech]>=0.5,<0.6 and opencv-python>=4.x, both pulled in by hmi/backend/pyproject.toml.
- Hugging Face CLI authenticated:
  hf auth login --token "$HUGGINGFACE_TOKEN"
- (Optional, recommended) write-access HF repo where the dataset will land. The script defaults to <your-hf-user>/so101_<slug> and creates the repo on first push.
1. Wire your cameras
Cameras are declared in hmi/backend/config.yaml. The same (index_or_path, width, height, fps) tuple feeds both the HMI's live view and lerobot-record, so configuring it once is enough.
Find your camera devices:
# Each USB webcam typically registers 2 nodes (capture + metadata). The
# capture node is the smaller-numbered one in each pair.
for d in /dev/video*; do
  echo "=== $d ==="
  udevadm info --query=property --name="$d" | grep -E "ID_(MODEL|VENDOR|V4L_PRODUCT)="
done
# Or use the v4l2-utils package if installed:
# v4l2-ctl --list-devices

Test a candidate device with ffplay:

ffplay -f v4l2 -framerate 30 -video_size 640x480 -i /dev/video2
# Ctrl-C to close.

Then edit hmi/backend/config.yaml:
cameras:
  - id: wrist_right
    role: wrist
    arm_id: right  # binds this camera to the "right" arm card in the HMI
    source: opencv
    index_or_path: /dev/video2
    width: 640
    height: 480
    fps: 30
  - id: base_front
    role: base
    source: opencv
    index_or_path: /dev/video0
    width: 640
    height: 480
    fps: 30

Restart the HMI to pick up the new config:

# Dev laptop: stop scripts/run_hmi.sh and restart it
# Jetson: sudo systemctl restart haller-hmi.service

Open the dashboard. You should see live thumbnails in the Cameras strip and a real wrist-camera feed inside each arm card.
A camera marked source: placeholder still appears in the HMI but renders the dashed-border "no feed" placeholder. That's the right state for a slot reserved for hardware that hasn't arrived yet.
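Because the same camera tuple feeds both the HMI and lerobot-record, the recorder's camera argument can in principle be derived from the config entries. The following is a minimal sketch of that mapping, not the actual logic of scripts/record_dataset.sh: the function name cameras_arg is hypothetical, keying each stream by its role is an assumption, and the entries are written as a YAML loader would plausibly return them.

```python
def cameras_arg(cameras):
    """Build a --robot.cameras style dict string from config.yaml camera entries.

    Assumption: each stream is keyed by its `role`, and slots whose source is
    not `opencv` (for example `placeholder`) are skipped.
    """
    parts = []
    for cam in cameras:
        if cam.get("source") != "opencv":
            continue  # placeholder slots have no device to record from
        fields = (
            f"type: opencv, index_or_path: {cam['index_or_path']}, "
            f"width: {cam['width']}, height: {cam['height']}, fps: {cam['fps']}"
        )
        parts.append(f"{cam['role']}: {{{fields}}}")
    return "{ " + ", ".join(parts) + "}"


# Entries mirroring the config.yaml example, plus one reserved placeholder slot.
config_cameras = [
    {"id": "wrist_right", "role": "wrist", "arm_id": "right", "source": "opencv",
     "index_or_path": "/dev/video2", "width": 640, "height": 480, "fps": 30},
    {"id": "base_front", "role": "base", "source": "opencv",
     "index_or_path": "/dev/video0", "width": 640, "height": 480, "fps": 30},
    {"id": "wrist_left", "role": "wrist_left", "source": "placeholder"},
]

print(cameras_arg(config_cameras))
```

Keying by role only works while roles are unique; a setup with two wrist cameras would need a different key, such as the camera id.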
2. Smoke-test the camera pipeline
From the HMI dashboard:
- The Cameras strip should show N/N live for every configured opencv camera.
- Each tile should show a real image with the configured resolution + FPS printed in the lower-right corner.
From the command line:
# Snapshot a single JPEG from each camera
curl -o /tmp/base.jpg http://localhost:8000/cameras/base_front/snapshot
# Watch the live stream in a browser
xdg-open http://localhost:8000/cameras/base_front/stream

If the snapshot endpoint returns a 503 for a camera, check the HMI logs. The most common cause is "device busy": another process (cheese, a previous scripts/run_hmi.sh that didn't shut down cleanly, etc.) is holding the device open.
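The command-line check can also be scripted so that every configured camera is probed in one pass. This is a rough sketch using only the standard library; the helper names are hypothetical, and the only thing taken from this guide is the /cameras/<id>/snapshot URL shape and the fact that it serves a JPEG.

```python
import urllib.request

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the payload starts and ends with the JPEG markers."""
    return len(data) > 4 and data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)


def check_snapshot(camera_id: str, base_url: str = "http://localhost:8000") -> bool:
    """Fetch one snapshot and report whether it is a plausible JPEG."""
    url = f"{base_url}/cameras/{camera_id}/snapshot"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return looks_like_jpeg(resp.read())
    except OSError as exc:  # URLError and HTTPError (e.g. 503) subclass OSError
        print(f"{camera_id}: {exc}")
        return False


if __name__ == "__main__":
    # Camera ids taken from the config.yaml example in this guide.
    for cam in ("base_front", "wrist_right"):
        print(cam, "ok" if check_snapshot(cam) else "FAILED")
```

A marker check only proves that something JPEG-shaped came back; it does not replace looking at the image.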
3. Stop the HMI before recording
lerobot-record needs exclusive control of:
- the leader serial port (/dev/haller_arm_leader), to read the operator-driven joint positions,
- the follower serial port (/dev/haller_arm_follower), to send actions,
- every configured camera device, to capture frames in lockstep with the control loop.
The HMI holds all of these while it's running, so you must stop it first.
# Dev laptop: Ctrl-C the scripts/run_hmi.sh process
# Jetson: sudo systemctl stop haller-hmi.service

scripts/record_dataset.sh refuses to start if anything else has those device nodes open, and reports which PID is holding them.
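To see what such a pre-flight check involves, here is an illustrative sketch that scans /proc for open file descriptors on a device node. It is not the script's actual mechanism, which this guide does not describe (it may well use fuser or lsof); pids_holding is a hypothetical helper, and it is Linux-only.

```python
import os


def pids_holding(path: str) -> list[int]:
    """Return PIDs with an open file descriptor on `path` (Linux /proc only)."""
    if not os.path.isdir("/proc"):
        return []
    target = os.path.realpath(path)  # follow udev symlinks to the real node
    holders = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                holders.append(int(entry))
                break
    return sorted(holders)


if __name__ == "__main__":
    for dev in ("/dev/haller_arm_leader", "/dev/haller_arm_follower", "/dev/video0"):
        holders = pids_holding(dev)
        if holders:
            print(f"{dev}: busy, held by PID(s) {holders}")
        else:
            print(f"{dev}: free")
```

Without root, processes owned by other users cannot be inspected, so an empty result means "no holder visible to this user" rather than a guarantee that the device is free.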
4. Record
The script lives at scripts/record_dataset.sh and wraps lerobot-record with sensible defaults.
# Required arg: the task description (1 sentence — used as the language
# instruction and slugified for the dataset name).
# Optional arg: number of episodes (default 50).
scripts/record_dataset.sh "Grab the red cube and place it in the box" 20

What happens:
- Activates the HMI venv (~/venvs/haller-hmi/).
- Resolves your HF username from hf auth whoami (or HF_USER=... env override).
- Refuses to start if any required device is in use; prints a confirmation banner with the resolved task, dataset name, cameras, and ports.
- Runs lerobot-record with --display_data=true so rerun opens a live viewer of joint positions + camera feeds.
- For each episode: leader → follower teleop for EPISODE_TIME_SEC (default 30 s), then a RESET_TIME_SEC (default 5 s) pause to reset the scene.
- On the final episode, encodes videos, writes parquet shards, and pushes the dataset to <your-hf-user>/so101_<slug>.
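The task sentence does double duty as language instruction and as the source of the dataset name. As a hedged illustration of how a sentence might become the <slug> part, the sketch below lowercases the task and collapses everything that is not a letter or digit into underscores. The exact rule used by scripts/record_dataset.sh is not documented here, so both function names and the underscore separator are assumptions.

```python
import re


def slugify(task: str) -> str:
    """Lowercase the task and collapse each non-alphanumeric run into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", task.lower()).strip("_")


def default_repo(hf_user: str, task: str) -> str:
    """Compose the <your-hf-user>/so101_<slug> repo id described in this guide."""
    return f"{hf_user}/so101_{slugify(task)}"


print(default_repo("myteam", "Grab the red cube and place it in the box"))
```

Long task sentences produce long repo names under a rule like this, which is one reason to set DATASET_REPO explicitly for anything you intend to keep.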
Tunables (env vars)
| Variable | Default | Notes |
|---|---|---|
| HF_USER | from hf auth whoami | Override if your HF org differs from your username. |
| DATASET_REPO | ${HF_USER}/so101_<slug> | Set explicitly to pick a custom repo. |
| FPS | 30 | Capture + control rate. Matches most public SO-101 datasets. |
| EPISODE_TIME_SEC | 30 | Max time per episode. |
| RESET_TIME_SEC | 5 | Pause between episodes for scene reset. |
| CAMERAS_JSON | base camera on /dev/video0 | Full lerobot --robot.cameras dict. Override when you have multiple cameras. |
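These defaults determine how long a session takes and how much video it produces. The arithmetic is simple enough to sketch; session_budget is a hypothetical helper, and the result is an upper bound because episodes can be ended early from the keyboard.

```python
def session_budget(episodes: int, episode_s: float, reset_s: float, fps: int) -> dict:
    """Upper-bound wall-clock time and per-camera frame count for one run."""
    total_s = episodes * (episode_s + reset_s)
    return {
        "max_minutes": round(total_s / 60, 1),
        "max_frames_per_camera": int(episodes * episode_s * fps),
    }


# 20 episodes with the documented defaults: 30 s episodes, 5 s resets, 30 FPS.
print(session_budget(episodes=20, episode_s=30, reset_s=5, fps=30))
```

With the defaults, 20 episodes take at most 20 × 35 s = 700 s, a little under 12 minutes, and yield at most 18,000 frames per camera.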
Example: two cameras, 60 episodes of a fruit-sorting task into a custom repo:
HF_USER=myteam \
DATASET_REPO=myteam/so101_fruit_sort_v1 \
CAMERAS_JSON='{ wrist: {type: opencv, index_or_path: /dev/video2, width: 640, height: 480, fps: 30}, base: {type: opencv, index_or_path: /dev/video0, width: 640, height: 480, fps: 30}}' \
scripts/record_dataset.sh "Sort the fruit by color into the matching bowl" 60

Keyboard controls during recording
lerobot-record watches the keyboard:
| Key | Effect |
|---|---|
| → (right) | end current episode early, save, move on |
| ← (left) | end current episode early, discard, re-record |
| ESC | stop the run; videos written so far are encoded and pushed |
5. Sanity-check the dataset
Visualize the dataset locally before training:
hf download <your-hf-user>/so101_<slug> --repo-type=dataset --local-dir ~/lerobot_data/so101_<slug>
lerobot-visualize-dataset \
  --repo-id="<your-hf-user>/so101_<slug>" \
  --episode-index=0

Or open the auto-generated dataset card on the Hub. For v2.1+ datasets, Hugging Face renders an interactive episode viewer right on the page.
Things to check:
- Episode count matches what you intended.
- Camera videos play without dropped frames.
- Joint state and action arrays are non-zero and varied — flat traces usually mean the leader wasn't being driven, or the follower was stuck in STOP mode.
- The language instruction is what you intended (parquet column task).
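The flat-trace check in the list above is easy to automate once the joint values of an episode are in memory. The sketch below works on plain Python lists, so it assumes you have already read the state or action rows out of the parquet shards. The helper name, the joint names in the example, and the min_range threshold are illustrative; a sensible threshold depends on the units your calibration produces.

```python
def flat_joints(frames, names, min_range=1e-3):
    """Return names of joints whose value barely changes across an episode.

    frames: list of per-timestep joint vectors (e.g. rows of the action column).
    names:  one label per vector element, in the same order.
    """
    if not frames:
        return list(names)  # no data at all: treat every joint as suspect
    flat = []
    for j, name in enumerate(names):
        column = [row[j] for row in frames]
        if max(column) - min(column) < min_range:
            flat.append(name)
    return flat


# Toy episode: the first joint never moves, the other two do.
episode = [
    [0.0, 10.0, -3.0],
    [0.0, 12.5, -2.0],
    [0.0, 9.0, 4.5],
]
print(flat_joints(episode, ["shoulder_pan", "elbow_flex", "gripper"]))
```

A single flat joint can be legitimate for a task that never uses it; every joint coming back flat is the signature of an undriven leader or a stopped follower.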
6. What to do with the dataset
| Goal | Path |
|---|---|
| Train ACT from scratch (single task, ≥50 episodes) | lerobot-train --policy.type=act --dataset.repo_id=... — fits on a laptop GPU. |
| LoRA-finetune SmolVLA-base | --policy.path=lerobot/smolvla_base --policy.peft_config.use_peft=true on a 16 GB+ cloud GPU. |
| LoRA-finetune π0.5 (recommended VLA path) | See RunPod inference — scripts/runpod/finetune_pi05_lora.sh <your-dataset> is the one-liner. |
| Replay-eval an existing policy on your data | See RunPod inference — scripts/runpod/replay_eval.py runs π0.5 / pi0 against your dataset and dumps per-joint error + plots. |
| Finetune NVIDIA GR00T N1.7 | Follow Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm — official guide, ports cleanly to N1.7. |
Live closed-loop evaluation on the real arm is out of scope of this guide; see the LeRobot imitation-learning tutorial once you have a trained policy.
Roadmap
Phase 2 — HMI-integrated recorder. Recording from inside the HMI without having to stop it. Requires the HMI to multiplex serial-port access (currently each SO101Follower owns its port exclusively) and to own the camera capture loop (which it already does). Tracked as future work; nothing in config.yaml needs to change when it lands.
Phase 3 — Closed-loop policy evaluation in the HMI. A policy path field, a "deploy" button, the existing E-STOP wired into the policy loop, and a small reward-labeling UI for SARM-style stage-aware reward modeling. See the LeRobot v0.5.0 release notes for what's available upstream.