WireGUI — TODO
WireGuard Metrics Collector
Overview
Separate Python process dedicated to high-frequency WireGuard stats collection, with optional VictoriaMetrics time-series storage. Replaces the current 60s in-process polling with a 5s external collector.
Current state
- tasks/stats.py: polls wg show dump every 60s inside the web process asyncio loop
- UI timers: 30s refresh on device pages
- Worst-case latency: ~90s before a stat change is visible
Target state
- Collector process: polls every 5s, writes to DB + VictoriaMetrics
- UI timers: 10s refresh
- Worst-case latency: ~15s
Phase 1: Configuration ✅
- Add settings to config.py:
  - WG_METRICS_ENABLED: bool = False
  - WG_METRICS_POLL_INTERVAL: int = 5 (seconds)
  - WG_VICTORIAMETRICS_URL: str | None = None (e.g. http://localhost:8428)
- When WG_METRICS_ENABLED=false, keep existing stats_loop as fallback
- When WG_METRICS_ENABLED=true, skip registering stats_loop in main.py
Phase 2: Collector process ✅
- Create wiregui/collector.py, a standalone entry point (python -m wiregui.collector)
- No NiceGUI dependency: only asyncio + asyncpg + httpx
- Poll wg show <iface> dump every WG_METRICS_POLL_INTERVAL seconds
- Update Device rows in PostgreSQL (same fields as current stats_loop)
- Push metrics to VictoriaMetrics via /api/v1/import/prometheus (if URL configured)
- Graceful shutdown on SIGTERM/SIGINT
- Web app spawns collector as subprocess when WG_METRICS_ENABLED=true
- Web app terminates collector on shutdown
Phase 3: VictoriaMetrics metrics ✅
All metrics implemented in collector.py and verified by integration tests:
- wiregui_peer_rx_bytes{public_key, user_email, device_name}: counter
- wiregui_peer_tx_bytes{public_key, user_email, device_name}: counter
- wiregui_peer_latest_handshake_seconds{public_key, user_email, device_name}: gauge
- wiregui_peer_connected{public_key, user_email, device_name}: 1 if handshake < 180s, else 0
- wiregui_peers_total: gauge, count of active peers
Phase 4: UI improvements
- Reduce UI timer from 30s to 5s on all device pages (devices.py, admin/devices.py, detail page)
- Add connection status indicator (green/yellow/red dot) based on handshake age
  - Green: handshake < 2 min
  - Yellow: handshake < 5 min
  - Red: no recent handshake or never connected
- Status column in both user and admin device tables
- Status badge on device detail page (live-updating)
- Add traffic rate display (RX/s, TX/s computed from delta between 5s polls)
- Device detail page: live ECharts traffic rate chart (RX/s + TX/s area lines, 60-point rolling window, auto-scaled axis with human-readable byte formatting)
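The status thresholds and the rate calculation above can be sketched as three small pure functions. The names handshake_status, byte_rate and format_rate are hypothetical, and the use of binary units (KiB, MiB) for the human-readable formatting is an assumption.

```python
def handshake_status(latest_handshake: int, now: float) -> str:
    """Map handshake age to the dot colour: green < 2 min, yellow < 5 min, else red."""
    if latest_handshake <= 0:
        return "red"  # never connected
    age = now - latest_handshake
    if age < 120:
        return "green"
    if age < 300:
        return "yellow"
    return "red"


def byte_rate(prev_bytes: int, curr_bytes: int, elapsed_seconds: float) -> float:
    """Bytes per second between two polls of a cumulative counter."""
    if elapsed_seconds <= 0 or curr_bytes < prev_bytes:
        # A smaller counter means the interface or peer was reset;
        # report 0 for this interval instead of a negative rate.
        return 0.0
    return (curr_bytes - prev_bytes) / elapsed_seconds


def format_rate(rate: float) -> str:
    """Human-readable rate using binary units (assumption)."""
    units = ["B/s", "KiB/s", "MiB/s", "GiB/s"]
    value = float(rate)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"
```

For example, a counter that moves from 1000 to 6000 bytes across a 5s poll gives byte_rate(1000, 6000, 5) == 1000.0, which format_rate renders as "1000.0 B/s".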
Phase 5: Infrastructure ✅
- Create compose.test.yml: full integration stack with real WG
- Add VictoriaMetrics (single-node, port 8428, 7d retention)
- Add 3 mock WG client containers (alpine + wireguard-tools)
- Clients generate traffic by pinging each other through the tunnel every 3s
- Setup script (docker/mock-clients/setup.py) generates keypairs and configs
- Collector runs as subprocess inside the WireGUI container (shares network namespace)
- Add VictoriaMetrics to dev compose.yml (optional, for local testing)
Design notes
- Why a separate process? The wg show subprocess call and DB writes at 5s intervals shouldn't share the asyncio loop with the web app. A separate process ensures UI responsiveness isn't affected by stats collection.
- Why not run.cpu_bound? That uses ProcessPoolExecutor for one-shot CPU tasks inside request handling, which is not suitable for a long-running daemon. A separate entry point is cleaner.
- VictoriaMetrics push model: The collector pushes metrics in Prometheus text exposition format to /api/v1/import/prometheus (see Phase 2). No scrape config needed. VictoriaMetrics is optional; the collector works fine with just PostgreSQL.
- Backward compatible: When WG_METRICS_ENABLED=false (default), everything works exactly as it does today.
CI/Testing
- Fix E2E tests in CI: tests pass locally but fail in the Forgejo Actions container environment (stale DB reads between app subprocess and test process; Playwright can't resolve Docker service hostnames for SAML redirect). Currently disabled in .forgejo/workflows/dev.yml.
UI
- SAML provider management in Authentication tab (admin settings)
- SSO Providers on account page: add Status column, "Disconnect" action
- Admin pages (users, devices, rules): apply same card-based styling as account/settings/diagnostics
Features
- First-run CLI setup command