
Grove Admin Guide

> Build e6c937427256 · Updated 2026-06-07

Fleet Overview

| Cell            | Host            | Role              | Notes                                      |
|-----------------|-----------------|-------------------|--------------------------------------------|
| mac             | localhost       | Dev/soak          | M4 16 GB; default deploy target            |
| familynook (FN) | 100.126.143.83  | Prod, WAN gateway | Xeon; grove.nook.website; nginx TLS        |
| nookman         | 100.96.243.69   | Prod              | Pi 8 GB; relay jump host for cell1/cell2   |
| cell1           | 100.104.249.123 | Dev/soak          | Pi 8 GB; uses ~/.grove/.venv/bin/python    |
| cell2           | 100.121.6.45    | Dev/soak          | Pi 4 GB; Tailscale SSH                     |
| funbook         | 100.81.15.45    | Dev/soak          | Linux laptop; relay-connected              |

Grove listens on port 5678 on every cell. Port 5679 serves the watchdog panic page, and port 5680 carries the relay WebSocket.
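
As a quick reachability check for the three ports above, a minimal Python sketch follows. The helper names `port_open` and `probe_cell` are illustrative and not part of Grove; the sketch only attempts a TCP connect and says nothing about whether the service behind the port is healthy.

```python
import socket

# Port roles as listed in this guide.
GROVE_PORTS = {
    5678: "dashboard / API",
    5679: "watchdog panic page",
    5680: "relay WebSocket",
}

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def probe_cell(host: str) -> dict:
    """Map each known Grove port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in GROVE_PORTS}

# Example: print(probe_cell("localhost"))
```

On a cell where only the watchdog is up, one would expect 5679 to report open while 5678 reports closed, which matches the panic-page scenario described later in this guide.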


Deployment

Quick Start (new cell via invite)


curl -s https://grove.nook.website/invite/<token> | bash

Manual Install


mkdir -p ~/.grove
pip3 install cryptography flask flask-socketio requests pynacl websocket-client websockets zeroconf
# Copy grove.py, web.py, acme.py, acme_jws.py, watchdog.py, relay.py + assets/ to ~/.grove/
cd ~/.grove && nohup python3 web.py </dev/null >/tmp/grove-web.log 2>&1 &
# Dashboard at http://localhost:5678

Use </dev/null on nohup invocations (especially over SSH) to detach stdin from the session and prevent SIGHUP.

deploy.sh

The canonical fleet deploy script. Ships 5 Python files + relay.py + assets/ dir.


# Deploy to mac (dev, default)
bash deploy.sh mac

# Deploy to all nodes
bash deploy.sh

# Skip test suite
bash deploy.sh --skip-tests

# Roll back to the previous build on a node
bash deploy.sh --rollback mac

# Promote the soaked build to production (FN → nookman, canary order)
bash deploy.sh --promote

--promote requires a clean working tree, a full gauntlet pass on mac, and refuses --skip-tests. It deploys familynook first (x86 canary), waits (default 30 s, override with CANARY_SOAK_SEC=N), then nookman.

Build hash = SHA-256 of grove.py + web.py + acme.py + acme_jws.py + watchdog.py + 9 frontend assets (assets/), first 12 hex chars. Both deploy.sh and web.py compute the same hash; mismatches surface in /api/version.
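
The build-hash rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in deploy.sh or web.py: the guide does not say in which order the files are fed to the hash or name the nine assets, so this sketch hashes the five Python files in the order listed and then the assets sorted by name, and the caller supplies the asset names.

```python
import hashlib
from pathlib import Path

# The five Python files named in this guide, in the order it lists them.
PY_FILES = ["grove.py", "web.py", "acme.py", "acme_jws.py", "watchdog.py"]

def build_hash(root: Path, asset_names: list[str]) -> str:
    """SHA-256 over the Python files plus assets, truncated to 12 hex chars.

    Assumption: file contents are concatenated in a fixed order (Python
    files as listed, then assets sorted by name). The real ordering used
    by Grove is not documented here.
    """
    digest = hashlib.sha256()
    for name in PY_FILES:
        digest.update((root / name).read_bytes())
    for name in sorted(asset_names):
        digest.update((root / "assets" / name).read_bytes())
    return digest.hexdigest()[:12]
```

Whatever the real ordering is, the property that matters is that both sides use the same one: a single changed byte in any input yields a different 12-character prefix, which is what makes a mismatch in /api/version meaningful.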

familynook special case

FN runs under a dedicated grove system account. deploy.sh scp's files to ~/ (nook's home), then calls bash deploy-grove.sh which uses sudo to copy them into /home/grove/.grove/. This requires a sudoers line on FN:


nook ALL=(grove) NOPASSWD: /bin/cp, /bin/bash

(See fn-grant-nopasswd.sh in the repo root for the exact grant.)


systemd Services

These unit files live in systemd/ and are installed via sudo bash systemd/install.sh.


sudo bash systemd/install.sh            # grove + watchdog
sudo bash systemd/install.sh --with-relay  # also installs grove-relay

grove.service: main daemon


[Unit]
Description=Grove Distributed Storage
After=network-online.target tailscaled.service

[Service]
Type=simple
User=grove
WorkingDirectory=/home/grove/.grove
ExecStart=/usr/bin/python3 /home/grove/.grove/web.py
Restart=always
RestartSec=5
MemoryMax=512M
ReadWritePaths=/home/grove/.grove /home/grove/GroveHome /tmp

grove-watchdog.service: crash recovery + panic page


[Service]
Type=simple
ExecStart=/usr/bin/python3 /home/grove/.grove/watchdog.py --no-redirect
Restart=always
RestartSec=3

Watchdog panic page at :5679. When Grove (5678) is down this page lets you restart it from a browser.

grove-ai.service: llama-server (AI hosts only)

Managed via the web UI or:


python3 ~/.grove/web.py ai-service install
python3 ~/.grove/web.py ai-service status
python3 ~/.grove/web.py ai-service uninstall

Type=forking; spawns the llama-server and exits. The watchdog defers to this unit when it is active. Do NOT enable on a Metal Mac: a full-offload server wires unevictable memory.

grove-relay.service: WebSocket relay (optional)

Only needed on a dedicated relay host. Most cells auto-connect to the fleet relay; this unit is not required for normal operation.

Pi keepalive cron (no grove.service)

Pis run grove-watchdog.service instead of grove.service. A */5 cron provides the reboot bootstrap and proactive memory management:


# crontab -e (as the grove user)
*/5 * * * * ~/.grove/grove-health-local.sh >> ~/.grove/health.log 2>&1

deploy.sh ships grove-health-local.sh to each Pi automatically.


# Install the watchdog service on a Pi
python3 ~/.grove/web.py watchdog-service install
sudo systemctl enable --now grove-watchdog

Reverse Proxy / Gateway / TLS

Any cell can act as a WAN gateway. Two options:

Option A: Grove's built-in ACME (Let's Encrypt HTTP-01)

Set public_url in ~/.grove/config.json and ensure port 80 is reachable. acme.py handles cert issuance and renewal automatically.

Option B: External nginx (familynook's setup)

nginx-grove.conf in the repo root is the reference config for grove.nook.website.

SECURITY-CRITICAL: on a public gateway, nginx must inject X-Grove-Public-Edge: 1 on every request proxied to :5678, overwriting any client-supplied value (since :5678 isn't directly reachable from the internet, this header is a trustworthy public-vs-tailnet signal). Grove's check_dashboard_auth uses this header to block the owner dashboard session and admin /api/* routes over the public edge; they remain tailnet-only. Without this tag, a valid owner session cookie obtained over Tailscale could be replayed over the public internet to unlock the full control panel.


# Required addition to every proxy_pass block that reaches :5678 from the public edge:
proxy_set_header X-Grove-Public-Edge "1";

The public edge serves: /portal, /site/, /invite/, /relay, /.well-known/, /login, /api/version, /api/health, and peer-sync APIs. The owner dashboard (/dashboard, /files, admin /api/*) is tailnet-only.
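
To make the public-versus-tailnet split concrete, here is a minimal Python sketch of a path gate keyed on the X-Grove-Public-Edge header. It is not Grove's check_dashboard_auth; the function names are hypothetical, the prefix list is copied from the paragraph above, and the peer-sync API paths are left out because this guide does not name them.

```python
# Path prefixes this guide lists as served on the public edge.
# Peer-sync API paths are not named in the guide, so they are omitted.
PUBLIC_PREFIXES = (
    "/portal", "/site/", "/invite/", "/relay", "/.well-known/",
    "/login", "/api/version", "/api/health",
)

def is_public_edge(headers: dict) -> bool:
    """True when the proxy tagged the request as arriving from the public edge."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-grove-public-edge") == "1"

def path_allowed(path: str, headers: dict) -> bool:
    """Hypothetical gate: public-edge requests may only reach public prefixes."""
    if not is_public_edge(headers):
        return True  # tailnet request; normal dashboard auth still applies
    return path.startswith(PUBLIC_PREFIXES)
```

The gate is only as trustworthy as the header, which is why nginx must overwrite any client-supplied value rather than pass it through.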

Relay WebSocket passes through to :5680 (relay.py); no X-Grove-Public-Edge needed there.


# Relay: no grove auth, separate port
location /relay {
    proxy_pass http://127.0.0.1:5680;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400;
}

SSL with Certbot (nginx path)


sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d grove.nook.website
# Renewal is automatic via certbot.timer

Configuration

~/.grove/config.json key fields:


{
    "self_name": "my-cell",
    "web_port": 5678,
    "web_host": "0.0.0.0",
    "public_url": "https://grove.nook.website",
    "desired_factor": 2,
    "relay_url": "ws://100.96.243.69:5680",
    "relay_host": true,
    "storage_cap_gb": 50,
    "peers": [],
    "ai_enabled": true,
    "peer_model_policy": "friends",
    "disable_auto_update": false
}

peer_model_policy controls which peers may use this cell's local AI: "friends" (default), "all", "public", "active-only", or "none".
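
Before restarting a cell with an edited config.json, a small sanity check can catch typos. The sketch below is a hypothetical helper, not something shipped with Grove. It checks only what this guide states (the allowed peer_model_policy values and the default port 5678), plus one assumption: that relay_url should use a ws:// or wss:// scheme.

```python
import json
from pathlib import Path

# Policy values listed in this guide.
ALLOWED_POLICIES = {"friends", "all", "public", "active-only", "none"}

def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means nothing was flagged."""
    problems = []
    policy = cfg.get("peer_model_policy", "friends")
    if policy not in ALLOWED_POLICIES:
        problems.append(f"peer_model_policy {policy!r} is not a known value")
    port = cfg.get("web_port", 5678)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"web_port {port!r} is not a valid TCP port")
    relay = cfg.get("relay_url")
    # Assumption: relay URLs are WebSocket URLs.
    if relay is not None and not str(relay).startswith(("ws://", "wss://")):
        problems.append(f"relay_url {relay!r} does not look like a WebSocket URL")
    return problems

def check_config_file(path: Path) -> list[str]:
    """Load a config.json file and run the checks above."""
    return check_config(json.loads(path.read_text()))
```

For example, `check_config_file(Path.home() / ".grove" / "config.json")` returns an empty list when none of these checks trip.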


Dev / Prod Tracks + Rollback

Production = familynook + nookman. Everything else is dev/soak.

Normal workflow:

1. Deploy to mac: bash deploy.sh mac

2. Let it soak; run gauntlet: python3 gauntlet.py

3. Promote to prod: bash deploy.sh --promote

If a build passes health checks but misbehaves:


bash deploy.sh --rollback familynook nookman

This restores the .bak snapshot that deploy.sh captured before the deploy.


AI Hosting

Grove uses llama.cpp (not ollama). Models live in ~/.grove/.

  • Set up via the AI tab in the dashboard (UI-driven download, build, enable)
  • grove-ai.service supervises the llama-server on Linux/systemd hosts
  • The */5 keepalive cron also starts the AI server if it's configured but not running (ensure_ai in grove-health-local.sh)
  • Do NOT auto-start the llama-server at boot on a Mac with Metal; use the AI tab to start on demand

Manage via CLI:


python3 ~/.grove/web.py ai-service install    # install grove-ai.service
python3 ~/.grove/web.py ai-service status

Monitoring

  • GET /api/version: build hash, version, uptime
  • GET /api/health: disk, chunk count, peer count
  • GET /api/replication-status: per-peer replication health
  • GET /api/route-speeds: route performance data
  • Watchdog panic page at :5679, visible when Grove is down

Logs:

  • /tmp/grove-web.log: main daemon stdout/stderr
  • /tmp/grove-watchdog.log: watchdog log
  • journalctl -u grove -f: on systemd boxes
  • ~/.grove/health.log: keepalive cron log (Pis)

Backup & Recovery

Back up identity keys (critical)


# Via dashboard: Settings → Backup Keys (downloads ZIP)
# Via API:
curl -sf http://localhost:5678/api/backup-keys > grove-keys.zip

If you lose ~/.grove/.key, your encrypted files are unrecoverable.

Restore keys


# Via dashboard: Settings → Restore Keys (upload ZIP)
# Via API:
curl -sf -X POST http://localhost:5678/api/restore-keys -F "file=@grove-keys.zip"

Forgot dashboard password

Three paths, in order of friction:

1. Logged in elsewhere? Settings → Recovery → Generate recovery link. Single-use, valid 24 h, resets only this cell.

2. Locked out everywhere? From a terminal on the device:


   grove auth set-password

3. No grove CLI? Visit /setup from localhost to set a new password.

Leaving Grove

  • This device only (data stays on the network): Settings → Leave Grove → "Leave from this device."
  • Permanent network departure (peers delete your chunks):

      grove apoptosis

Type APOPTOSIS to confirm. Signed tombstone propagates to peers. Irreversible.


Operational Safety

Never run network-severing commands over SSH on a remote node. If you sever your own connection, the Pi nodes may be unreachable for days.

Dangerous over SSH (requires a recovery timer or on-box scheduled job):

  • nmcli con down/modify, systemctl restart NetworkManager
  • iptables -F or any firewall flush
  • tailscale down, tailscale logout
  • ip link set … down, reboots without a recovery at job

pkill -f web.py over SSH can kill your own session: with -f, pkill matches full command lines, and the remote shell running the command contains the same pattern. Restart by PID instead:


# Kill only the PID recorded in the pidfile
kill "$(cat ~/.grove/grove-web.pid)"
# Restart
cd ~/.grove && nohup python3 web.py </dev/null >/tmp/grove-web.log 2>&1 &

Before restarting Grove, check for queued image-gen jobs:


python3 -c "import json; jobs=json.load(open('$HOME/.grove/ai_image_jobs.json')); print([j for j in jobs if j.get('status') in ('queued','running')])"

A blind restart kills the in-process worker; queued/running jobs become errors and must be re-submitted.


Security Hardening Checklist

  • [ ] Private key files are chmod 600
  • [ ] ~/.grove directory is chmod 700
  • [ ] Dashboard password is set and strong (grove auth set-password)
  • [ ] Public-facing cells behind nginx with HTTPS
  • [ ] nginx injects X-Grove-Public-Edge: 1 on every request proxied to :5678
  • [ ] Session cookies have HttpOnly + SameSite flags (automatic)
  • [ ] Peer secret comparison is timing-safe (automatic)
  • [ ] Owner dashboard is NOT reachable from the public internet (verify: curl -I https://grove.nook.website/dashboard should redirect to login, not render the dashboard)
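
The first two checklist items can be verified mechanically. The following Python sketch is a hypothetical helper, not part of Grove. It assumes a POSIX host and checks only the ~/.grove directory and the .key file mentioned in Backup & Recovery; other private key file names are not listed in this guide, so pass them in via `key_names` if needed.

```python
import stat
from pathlib import Path

def mode_of(path: Path) -> int:
    """Return only the permission bits of a path (e.g. 0o600)."""
    return stat.S_IMODE(path.stat().st_mode)

def check_permissions(grove_dir: Path, key_names=(".key",)) -> list[str]:
    """Report deviations from chmod 700 (directory) and chmod 600 (key files)."""
    problems = []
    if mode_of(grove_dir) != 0o700:
        problems.append(f"{grove_dir} is {oct(mode_of(grove_dir))}, expected 0o700")
    for name in key_names:
        key_path = grove_dir / name
        # Missing files are skipped; this sketch only audits what exists.
        if key_path.exists() and mode_of(key_path) != 0o600:
            problems.append(f"{key_path} is {oct(mode_of(key_path))}, expected 0o600")
    return problems

# Example: print(check_permissions(Path.home() / ".grove"))
```

An empty list means both items pass; the remaining items (nginx header injection, cookie flags, public reachability) need the network-level checks described in the list itself.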