wiregui/TODO.md

93 lines
4.6 KiB
Markdown
Raw Permalink Normal View History

# WireGUI — TODO
feat: initial WireGUI implementation — full VPN management platform Complete Python/NiceGUI rewrite of the Wirezone (Elixir/Phoenix) VPN management platform. All 10 implementation phases delivered. Core stack: - NiceGUI reactive UI with SQLModel ORM on PostgreSQL (asyncpg) - Alembic migrations, Valkey/Redis cache, pydantic-settings config - WireGuard management via subprocess (wg/ip/nft CLIs) - 164 tests passing, 35% code coverage Features: - User/device/rule CRUD with admin and unprivileged roles - Full device config form with per-device WG overrides - WireGuard client config generation with QR codes - REST API (v0) with Bearer token auth for all resources - TOTP MFA with QR registration and challenge flow - OIDC SSO with authlib (provider registry, auto-create users) - Magic link passwordless sign-in via email - SAML SP-initiated SSO with IdP metadata parsing - WebAuthn/FIDO2 security key registration - nftables firewall with per-user chains and masquerade - Background tasks: WG stats polling, VPN session expiry, OIDC token refresh, WAN connectivity checks - Startup reconciliation (DB ↔ WireGuard state sync) - In-memory notification system with header badge - Admin UI: users, devices, rules, settings (3 tabs), diagnostics - Loguru logging with optional timestamped file output Deployment: - Multi-stage Dockerfile (python:3.13-slim) - Docker Compose prod stack (bridge networking, NET_ADMIN, nftables) - Forgejo CI: tests → semantic versioning → Docker registry push - Health endpoint at /api/health
2026-03-30 16:53:46 -05:00
---
## WireGuard Metrics Collector
### Overview
feat: initial WireGUI implementation — full VPN management platform Complete Python/NiceGUI rewrite of the Wirezone (Elixir/Phoenix) VPN management platform. All 10 implementation phases delivered. Core stack: - NiceGUI reactive UI with SQLModel ORM on PostgreSQL (asyncpg) - Alembic migrations, Valkey/Redis cache, pydantic-settings config - WireGuard management via subprocess (wg/ip/nft CLIs) - 164 tests passing, 35% code coverage Features: - User/device/rule CRUD with admin and unprivileged roles - Full device config form with per-device WG overrides - WireGuard client config generation with QR codes - REST API (v0) with Bearer token auth for all resources - TOTP MFA with QR registration and challenge flow - OIDC SSO with authlib (provider registry, auto-create users) - Magic link passwordless sign-in via email - SAML SP-initiated SSO with IdP metadata parsing - WebAuthn/FIDO2 security key registration - nftables firewall with per-user chains and masquerade - Background tasks: WG stats polling, VPN session expiry, OIDC token refresh, WAN connectivity checks - Startup reconciliation (DB ↔ WireGuard state sync) - In-memory notification system with header badge - Admin UI: users, devices, rules, settings (3 tabs), diagnostics - Loguru logging with optional timestamped file output Deployment: - Multi-stage Dockerfile (python:3.13-slim) - Docker Compose prod stack (bridge networking, NET_ADMIN, nftables) - Forgejo CI: tests → semantic versioning → Docker registry push - Health endpoint at /api/health
2026-03-30 16:53:46 -05:00
Separate Python process dedicated to high-frequency WireGuard stats collection, with optional VictoriaMetrics time-series storage. Replaces the current 60s in-process polling with a 5s external collector.
feat: initial WireGUI implementation — full VPN management platform Complete Python/NiceGUI rewrite of the Wirezone (Elixir/Phoenix) VPN management platform. All 10 implementation phases delivered. Core stack: - NiceGUI reactive UI with SQLModel ORM on PostgreSQL (asyncpg) - Alembic migrations, Valkey/Redis cache, pydantic-settings config - WireGuard management via subprocess (wg/ip/nft CLIs) - 164 tests passing, 35% code coverage Features: - User/device/rule CRUD with admin and unprivileged roles - Full device config form with per-device WG overrides - WireGuard client config generation with QR codes - REST API (v0) with Bearer token auth for all resources - TOTP MFA with QR registration and challenge flow - OIDC SSO with authlib (provider registry, auto-create users) - Magic link passwordless sign-in via email - SAML SP-initiated SSO with IdP metadata parsing - WebAuthn/FIDO2 security key registration - nftables firewall with per-user chains and masquerade - Background tasks: WG stats polling, VPN session expiry, OIDC token refresh, WAN connectivity checks - Startup reconciliation (DB ↔ WireGuard state sync) - In-memory notification system with header badge - Admin UI: users, devices, rules, settings (3 tabs), diagnostics - Loguru logging with optional timestamped file output Deployment: - Multi-stage Dockerfile (python:3.13-slim) - Docker Compose prod stack (bridge networking, NET_ADMIN, nftables) - Forgejo CI: tests → semantic versioning → Docker registry push - Health endpoint at /api/health
2026-03-30 16:53:46 -05:00
### Current state
- `tasks/stats.py`: polls `wg show dump` every 60s inside the web process asyncio loop
- UI timers: 30s refresh on device pages
- Worst-case latency: ~90s before a stat change is visible
feat: initial WireGUI implementation — full VPN management platform Complete Python/NiceGUI rewrite of the Wirezone (Elixir/Phoenix) VPN management platform. All 10 implementation phases delivered. Core stack: - NiceGUI reactive UI with SQLModel ORM on PostgreSQL (asyncpg) - Alembic migrations, Valkey/Redis cache, pydantic-settings config - WireGuard management via subprocess (wg/ip/nft CLIs) - 164 tests passing, 35% code coverage Features: - User/device/rule CRUD with admin and unprivileged roles - Full device config form with per-device WG overrides - WireGuard client config generation with QR codes - REST API (v0) with Bearer token auth for all resources - TOTP MFA with QR registration and challenge flow - OIDC SSO with authlib (provider registry, auto-create users) - Magic link passwordless sign-in via email - SAML SP-initiated SSO with IdP metadata parsing - WebAuthn/FIDO2 security key registration - nftables firewall with per-user chains and masquerade - Background tasks: WG stats polling, VPN session expiry, OIDC token refresh, WAN connectivity checks - Startup reconciliation (DB ↔ WireGuard state sync) - In-memory notification system with header badge - Admin UI: users, devices, rules, settings (3 tabs), diagnostics - Loguru logging with optional timestamped file output Deployment: - Multi-stage Dockerfile (python:3.13-slim) - Docker Compose prod stack (bridge networking, NET_ADMIN, nftables) - Forgejo CI: tests → semantic versioning → Docker registry push - Health endpoint at /api/health
2026-03-30 16:53:46 -05:00
### Target state
- Collector process: polls every 5s, writes to DB + VictoriaMetrics
- UI timers: 10s refresh
- Worst-case latency: ~15s
### Phase 1: Configuration ✅
- [x] Add settings to `config.py`:
- `WG_METRICS_ENABLED: bool = False`
- `WG_METRICS_POLL_INTERVAL: int = 5` (seconds)
- `WG_VICTORIAMETRICS_URL: str | None = None` (e.g. `http://localhost:8428`)
- [x] When `WG_METRICS_ENABLED=false`, keep existing `stats_loop` as fallback
- [x] When `WG_METRICS_ENABLED=true`, skip registering `stats_loop` in `main.py`
### Phase 2: Collector process ✅
- [x] Create `wiregui/collector.py` — standalone entry point (`python -m wiregui.collector`)
- [x] No NiceGUI dependency — only asyncio + asyncpg + httpx
- [x] Poll `wg show <iface> dump` every `WG_METRICS_POLL_INTERVAL` seconds
- [x] Update Device rows in PostgreSQL (same fields as current `stats_loop`)
- [x] Push metrics to VictoriaMetrics via `/api/v1/import/prometheus` (if URL configured)
- [x] Graceful shutdown on SIGTERM/SIGINT
- [x] Web app spawns collector as subprocess when `WG_METRICS_ENABLED=true`
- [x] Web app terminates collector on shutdown
### Phase 3: VictoriaMetrics metrics ✅
All metrics implemented in `collector.py` and verified by integration tests:
- [x] `wiregui_peer_rx_bytes{public_key, user_email, device_name}` — counter
- [x] `wiregui_peer_tx_bytes{public_key, user_email, device_name}` — counter
- [x] `wiregui_peer_latest_handshake_seconds{public_key, user_email, device_name}` — gauge
- [x] `wiregui_peer_connected{public_key, user_email, device_name}` — 1 if handshake < 180s, else 0
- [x] `wiregui_peers_total` — gauge, count of active peers
### Phase 4: UI improvements
- [x] Reduce UI timer from 30s to 5s on all device pages (devices.py, admin/devices.py, detail page)
- [x] Add connection status indicator (green/yellow/red dot) based on handshake age
- Green: handshake < 2 min
- Yellow: handshake < 5 min
- Red: no recent handshake or never connected
- [x] Status column in both user and admin device tables
- [x] Status badge on device detail page (live-updating)
- [x] Add traffic rate display (RX/s, TX/s computed from delta between 5s polls)
- [x] Device detail page: live ECharts traffic rate chart (RX/s + TX/s area lines, 60-point rolling window, auto-scaled axis with human-readable byte formatting)
### Phase 5: Infrastructure ✅
- [x] Create `compose.test.yml` — full integration stack with real WG
- [x] Add VictoriaMetrics (single-node, port 8428, 7d retention)
- [x] Add 3 mock WG client containers (alpine + wireguard-tools)
- [x] Clients generate traffic by pinging each other through the tunnel every 3s
- [x] Setup script (`docker/mock-clients/setup.py`) generates keypairs and configs
- [x] Collector runs as subprocess inside the WireGUI container (shares network namespace)
- [ ] Add VictoriaMetrics to dev `compose.yml` (optional, for local testing)
### Design notes
- **Why a separate process?** The `wg show` subprocess call and DB writes at 5s intervals shouldn't share the asyncio loop with the web app. A separate process ensures UI responsiveness isn't affected by stats collection.
- **Why not `run.cpu_bound`?** That uses `ProcessPoolExecutor` for one-shot CPU tasks inside request handling — not suitable for a long-running daemon. A separate entry point is cleaner.
- **VictoriaMetrics push model:** Use the Prometheus remote write API. No scrape config needed — the collector pushes directly. VictoriaMetrics is optional; the collector works fine with just PostgreSQL.
- **Backward compatible:** When `WG_METRICS_ENABLED=false` (default), everything works exactly as it does today.
---
## CI/Testing
- [ ] Fix E2E tests in CI — tests pass locally but fail in the Forgejo Actions container environment (stale DB reads between app subprocess and test process, Playwright can't resolve Docker service hostnames for SAML redirect). Currently disabled in `.forgejo/workflows/dev.yml`.
## UI
- [ ] SAML provider management in Authentication tab (admin settings)
- [ ] SSO Providers on account page: add Status column, "Disconnect" action
- [ ] Admin pages (users, devices, rules): apply same card-based styling as account/settings/diagnostics
## Features
- [ ] First-run CLI setup command