โ† Dashboard ยท Docs ยทChat Auth Issue

> STATUS: RESOLVED (as of build e6c937427256, 2026-06-07). All three bugs documented below (before_request rejecting per-peer secrets, send_chat using global headers, _retry_undelivered_chat passing a string instead of a dict) were fixed in build ebfe97bfc566 on 2026-04-06 and the fixes remain intact in the current codebase: check_dashboard_auth (web.py ~line 1418) now derives the expected per-peer secret from X-Grove-Sender and accepts it; _retry_undelivered_chat passes peer_info (the dict) to peer_headers_for(); and the portal send path uses peer_headers_for(_portal_peer_info) throughout. This file is retained as a historical incident record.

Chat Auth Issue: Mac ↔ Familynook

Symptom

Sending chat messages between Mac and familynook fails in both directions.

Root Cause

Mac and familynook have different global peer_secret files and have never completed a shared secret exchange. All auth paths fail:

1. Per-peer derived secret: uses global secret as HMAC key → different globals = different derivations

2. Shared secret: never exchanged because POST /api/exchange-secret itself requires auth (chicken-and-egg)

3. Global secret fallback: different values on each node
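The first failure mode can be illustrated with a minimal sketch. It assumes the per-peer secret is an HMAC-SHA256 keyed by the node's global secret over both pubkeys. The actual derivation is not shown in this record, so derive_peer_secret_sketch, its message layout, and the sample keys are all hypothetical.

```python
import hashlib
import hmac


def derive_peer_secret_sketch(global_secret: bytes, pk_a: str, pk_b: str) -> str:
    """Hypothetical derivation: HMAC-SHA256 keyed by the node's global secret."""
    # Sort the pubkeys so both sides hash the same message in either direction.
    message = "|".join(sorted([pk_a, pk_b])).encode()
    return hmac.new(global_secret, message, hashlib.sha256).hexdigest()


# Each node derives with its own global secret (placeholder values).
mac_view = derive_peer_secret_sketch(b"mac-global-secret", "mac-pk", "familynook-pk")
nook_view = derive_peer_secret_sketch(b"familynook-global-secret", "familynook-pk", "mac-pk")

# Only when both nodes hold the same global secret do the derivations agree.
same_key_view = derive_peer_secret_sketch(b"mac-global-secret", "familynook-pk", "mac-pk")
```

Under this assumption, mac_view and nook_view never match, so a derived secret sent by one node cannot be verified by the other.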

Evidence

Fixes Attempted

1. Allow exchange-secret from known peers without auth

Changed /api/exchange-secret to accept requests from any pubkey already in config, bypassing verify_peer_request(). Deployed to both nodes (build cf7cf23e72ee).

Result: Still failing. The exchange may not be triggering, or there's a timing issue: the exchange happens during opp sync peer probing, which may not have run yet, or the exchange request itself may still be failing for another reason.

2. Per-peer auth headers in chat send

Changed send_message to use peer_headers_for(peer_info) instead of peer_headers() (global secret).

Result: Was crashing with 'str' object has no attribute 'get' because we passed the pubkey string instead of the peer info dict. Fixed that, but the underlying auth still fails.

3. Chat message queue + retry

Added _retry_undelivered_chat() to opp sync loop. Messages saved with delivered: false are retried when the peer becomes reachable.

Result: Queue works correctly: messages save locally and show the ○ indicator. But retry also fails because the auth issue persists.
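The queue behaviour described above can be sketched roughly as follows. This is not the actual _retry_undelivered_chat() code: retry_undelivered_sketch and the send callable are hypothetical, and only the delivered flag comes from this record.

```python
from typing import Callable


def retry_undelivered_sketch(messages: list[dict], send: Callable[[dict], bool]) -> int:
    """Retry messages still marked delivered: false; return how many got through."""
    delivered_now = 0
    for msg in messages:
        if msg.get("delivered"):
            continue  # already delivered, nothing to retry
        try:
            ok = send(msg)  # hypothetical transport; False models a 403 from the peer
        except Exception:
            ok = False  # leave the message queued for the next sync cycle
        if ok:
            msg["delivered"] = True
            delivered_now += 1
    return delivered_now


queue = [
    {"text": "hi", "delivered": True},
    {"text": "are you there?", "delivered": False},
]
failed_pass = retry_undelivered_sketch(queue, lambda m: False)   # auth still failing
success_pass = retry_undelivered_sketch(queue, lambda m: True)   # auth fixed
```

While the peer keeps rejecting the request, each pass leaves the message queued, which matches the observed behaviour: the queue itself was fine and the auth failure was the blocker.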

4. Encryption pubkey caching for offline chat

Added meta.json per chat dir, caching encryption pubkey for offline message encryption. Also cache during opp sync peer probing and in send_message on successful fetch.

Result: Works for encryption, but irrelevant to the auth problem.

Likely Causes to Investigate

1. Exchange-secret not triggering: The fix allows known pubkeys through, but the opp sync may not have run a cycle yet, or _exchange_shared_secret() may be failing before reaching the endpoint (e.g., wrong host/port).

2. Exchange-secret sends global secret in header: at line 3353, _exchange_shared_secret() still sends get_peer_secret() in the header even though we relaxed the endpoint auth. The endpoint should now accept it regardless, but verify the request is actually reaching the endpoint.

3. Different peer_secret files: The fundamental bootstrap problem. If nodes are installed independently, they generate random peer secrets. Need a mechanism for initial trust establishment that doesn't depend on a pre-shared secret. Options:

4. Config peer entry mismatch: Familynook may have Mac's pubkey stored differently or not at all, causing the "known peer" check to fail even with the relaxed auth.

5. Port/host issues: _exchange_shared_secret uses port 5678, but familynook may only accept on a different interface. The pubkey fetch works (200) but the POST may be routing differently.

Quick Debug Steps for Next Session


# Check if shared_secret got exchanged after a sync cycle
python3 -c "import json,os; c=json.loads(open(os.path.expanduser('~/.grove/config.json')).read()); print([(p['user'],bool(p.get('shared_secret'))) for p in c['peers']])"

# Check familynook's peer list for Mac's pubkey
ssh familynook "sudo -u grove python3 -c \"import json; c=json.loads(open('/home/grove/.grove/config.json').read()); print([(p.get('user','?'),p.get('pubkey','')[:20]) for p in c['peers']])\""

# Force a secret exchange manually
curl -X POST http://100.126.143.83:5678/api/exchange-secret \
  -H "Content-Type: application/json" \
  -d "{\"pubkey\":\"$(python3 -c 'from grove import get_node_pubkey_b64; print(get_node_pubkey_b64())')\",\"proposed_secret\":\"$(openssl rand -hex 32)\"}"

# Watch logs during chat attempt
tail -f /tmp/grove-web.log | grep -a "exchange\|secret\|chat\|403"

Resolution (2026-04-06, build ebfe97bfc566)

Root cause: THREE bugs working together:

Bug A: before_request middleware only accepts global secret

The @app.before_request auth gate (line ~653) checked X-Grove-Secret against get_peer_secret() (the global secret) only. Per-peer derived secrets, which are the ONLY secrets that work between nodes with different global secrets, were rejected before verify_peer_request() in the endpoint ever ran. This was the PRIMARY blocker.

Fix: Added per-peer derived secret check to before_request: if X-Grove-Sender header is present, derive the expected secret using _derive_peer_secret(my_pk, sender_pk) and accept if it matches.
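A minimal sketch of the widened gate, assuming the check is a plain comparison of header values. The header names X-Grove-Secret and X-Grove-Sender come from this record; request_is_authorized_sketch, the injected derive callable (standing in for _derive_peer_secret), and the use of hmac.compare_digest are assumptions, not the real middleware.

```python
import hmac
from typing import Callable, Mapping


def request_is_authorized_sketch(
    headers: Mapping[str, str],
    global_secret: str,
    my_pk: str,
    derive: Callable[[str, str], str],
) -> bool:
    """Accept the global secret, or the per-peer secret derived for the sender."""
    presented = headers.get("X-Grove-Secret", "")
    if not presented:
        return False
    # Original behaviour: only the global secret was accepted.
    if hmac.compare_digest(presented.encode(), global_secret.encode()):
        return True
    # Added behaviour: if the sender identifies itself, accept its derived secret.
    sender_pk = headers.get("X-Grove-Sender")
    if sender_pk:
        expected = derive(my_pk, sender_pk)
        return hmac.compare_digest(presented.encode(), expected.encode())
    return False


def fake_derive(my_pk: str, sender_pk: str) -> str:
    """Placeholder derivation, used only to exercise the gate."""
    return f"derived:{my_pk}:{sender_pk}"


peer_request = {"X-Grove-Secret": "derived:nook-pk:mac-pk", "X-Grove-Sender": "mac-pk"}
accepted = request_is_authorized_sketch(peer_request, "nook-global", "nook-pk", fake_derive)
```

The constant-time comparison is a design choice of the sketch; the record does not say how the real gate compares secrets.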

Bug B: send_chat used global headers

The initial chat send (send_chat) used peer_headers() (global secret) instead of peer_headers_for(peer_info) (per-peer derived secret). Even with Bug A fixed, this would still have sent the wrong secret.

Fix: Build a {"pubkey": peer_key} dict from the peer's signing pubkey and pass to peer_headers_for().

Bug C: _retry_undelivered_chat passed string to peer_headers_for

The retry function passed pk (a pubkey string) to peer_headers_for() which expects a dict. This crashed with AttributeError: 'str' object has no attribute 'get' on every retry attempt.

Fix: Changed to pass peer_info (the dict from the outer loop) instead of pk.
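The crash in Bug C and the dict-based calls used in the fixes for Bugs B and C can be reproduced with a small stand-in. peer_headers_for_sketch is hypothetical: it only assumes that the real peer_headers_for() reads the pubkey with .get(), as the error message implies, and its return value is a placeholder.

```python
def peer_headers_for_sketch(peer_info: dict) -> dict:
    """Hypothetical stand-in for peer_headers_for(): expects a peer info dict."""
    # Calling .get() on a plain string is what raised the AttributeError.
    return {"for_pubkey": peer_info.get("pubkey", "")}


peer_info = {"user": "familynook", "pubkey": "familynook-pk"}
pk = peer_info["pubkey"]

# Bug C: passing the pubkey string instead of the dict.
try:
    peer_headers_for_sketch(pk)
    error = None
except AttributeError as exc:
    error = str(exc)

# Fix for Bug C: pass the dict from the outer loop.
fixed_retry = peer_headers_for_sketch(peer_info)

# Fix for Bug B: wrap the signing pubkey in a dict before the call.
fixed_send = peer_headers_for_sketch({"pubkey": pk})
```

Both fixed calls produce the same result, since the function only needs the pubkey entry of the dict.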

Bonus: Duplicate pubkey fetch

send_chat fetched /api/pubkey twice (once for encryption key, once for signing key). Collapsed to a single fetch.

Verified

Other Issues Found During This Work