# SyncTide v1.0.2 — Robustness release

**Release date:** 2026-05-11
**Schema version:** 47 (migration 047 new in this release)
**Previous:** v1.0.1 (same day — robustness work landed in a single sweep ahead of the first customer deploy)

---

## Headline

This release closes the operational gaps left over from the v1.0.1 ship
audit. Each item below was an actual or potential customer-visible
problem; none of the changes breaks compatibility.

### 1. Health monitor — the watchdog of watchdogs

The platform now ships with a self-monitor service that watches the rest
of the stack. Every 60 s by default it:

- Pings the backend's `/version` endpoint.
- Runs `docker compose ps` and confirms every expected service is `Up`.
- After 3 consecutive failures, alerts the operator via Telegram (using
  the existing messaging gateway — no extra bot needed).
- On recovery, sends a follow-up "back online" message.
- Persists state to a new `health_monitor_config` table (migration 047)
  so the cooldown survives restarts.

All settings are under **Configurations → Health Monitor**: enable/disable,
Telegram gateway and chat ID, check interval, failures-before-alert,
cooldown, and an optional expected-services override.

The monitor is disabled by default. Turn it on once a Telegram gateway is
configured in Messaging Center.
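
The alerting rule described above (alert after N consecutive failures, send a recovery message, respect a cooldown) can be pictured as a small state machine. The following is an illustrative sketch, not the shipped implementation: `MonitorState`, `step`, the 900 s cooldown value, and the reading of "cooldown" as a minimum gap between two "down" alerts are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    """Hypothetical in-memory mirror of the persisted monitor state."""
    consecutive_failures: int = 0
    alerting: bool = False                 # True once a "down" alert was sent
    last_alert_at: Optional[float] = None  # epoch seconds of the last "down" alert


def step(state: MonitorState, healthy: bool, now: float,
         failures_before_alert: int = 3,
         cooldown_s: float = 900.0) -> Optional[str]:
    """Feed in one check result; return "down", "recovered", or None."""
    if healthy:
        was_alerting = state.alerting
        state.consecutive_failures = 0
        state.alerting = False
        # Only announce recovery if a "down" alert actually went out.
        return "recovered" if was_alerting else None

    state.consecutive_failures += 1
    if state.alerting or state.consecutive_failures < failures_before_alert:
        return None
    # Threshold reached. Assumption: stay quiet while the previous
    # "down" alert is still inside the cooldown window.
    if state.last_alert_at is not None and now - state.last_alert_at < cooldown_s:
        return None
    state.alerting = True
    state.last_alert_at = now
    return "down"
```

With the defaults, three failed checks in a row yield `"down"` on the third call, further failures stay silent, and the next healthy check yields `"recovered"`. Persisting `last_alert_at` is what would let the cooldown survive a restart.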

### 2. Automated daily backups

A new `backup` Docker service runs `backup_cloud.sh` daily, at 03:00 UTC
by default (set the hour with the `BACKUP_HOUR_UTC` env var). Each run
captures:

- a `pg_dump` of the SyncTide DB (custom format, TimescaleDB-aware),
- tar archives of the named volumes (raw_data / reports / templates /
  mosquitto / caddy certs),
- a sanitised copy of `.env`.

Backups are kept on a rolling 30-day retention.

Set `BACKUP_RUN_AT_START=1` in `.env` to take a backup at first boot —
useful for verifying the pipeline on a fresh deploy.
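
The daily schedule and the 30-day retention boil down to two small date calculations. The sketch below is in Python for readability; the shipped logic lives in `backup_cloud.sh` and may differ. The function names, the backup names, and the idea of keying retention on a recorded timestamp are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def next_backup_time(now: datetime, hour_utc: int = 3) -> datetime:
    """Next occurrence of hour_utc:00 UTC strictly after `now`.

    `now` must be timezone-aware; `hour_utc` plays the role of BACKUP_HOUR_UTC.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def expired_backups(taken_at: dict[str, datetime], now: datetime,
                    retention_days: int = 30) -> list[str]:
    """Names of backups older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    old = [name for name, ts in taken_at.items() if ts < cutoff]
    return sorted(old, key=lambda name: taken_at[name])
```

At 02:30 UTC the next run is 03:00 the same day; at exactly 03:00 it is 03:00 the following day, so a run that has just fired is never scheduled twice.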

The script's Windows-path bug (named volumes silently skipped under
git-bash) is fixed: setting `MSYS_NO_PATHCONV=1` and converting paths
with `cygpath -m` keep the docker `-v` host paths intact.

**Restore procedure verified end-to-end** against a fresh ephemeral
TimescaleDB container: the row counts of every reference table match
the source. Only `measurements` differs, by the rows written between
backup and verification, which is expected because ingest never stops.
The full runbook is in `docs/RECOVERY.md`.
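
The row-count comparison used in that verification can be sketched as follows. This is an assumption-laden illustration rather than the procedure in `docs/RECOVERY.md`: `compare_row_counts` and the `devices` table name are hypothetical, and the counts would come from `SELECT count(*)` queries run separately against the source and the restored database.

```python
def compare_row_counts(source: dict[str, int], restored: dict[str, int],
                       live_tables: frozenset[str] = frozenset({"measurements"})) -> list[str]:
    """Return readable mismatches; an empty list means the restore checks out."""
    problems = []
    for table in sorted(source.keys() | restored.keys()):
        src, dst = source.get(table), restored.get(table)
        if src is None or dst is None:
            problems.append(f"{table}: missing on one side")
        elif table in live_tables:
            # Ingest keeps writing after the backup is taken, so the
            # source may be ahead of the restore, but never behind it.
            if dst > src:
                problems.append(f"{table}: restored has more rows ({dst} > {src})")
        elif src != dst:
            problems.append(f"{table}: {src} != {dst}")
    return problems


# Example: reference table matches, measurements grew on the source side.
print(compare_row_counts({"devices": 12, "measurements": 1050},
                         {"devices": 12, "measurements": 1000}))
```

Treating `measurements` as a one-directional check keeps the expected drift from masking a real problem, such as a restore that somehow contains more rows than its source.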

### 3. Security hardening

Five fixes from the v1.0.1 audit:

- **`/license/upload` now requires admin** (was unauthenticated). The
  Ed25519 signature check inside `save_license_file` prevents
  forgery, but the old code let any LAN caller DoS the platform or
  downgrade-replace with a different valid `.lic`. Closed.
- **`must_change_password` enforced server-side.** Previously only the
  Streamlit UI redirected, so direct API callers with a stolen bearer
  token bypassed the gate. `get_current_user` now refuses every path
  except `/auth/me`, `/auth/logout`, and `/users/me/password` when the
  flag is set.
- **`init_admin.py` seeds new admins with `must_change_password=TRUE`.**
  A customer who runs `docker compose up` with the unedited
  `.env.example.docker` (`SYNCTIDE_ADMIN_PASSWORD=change-me-on-first-login`)
  is force-flipped into the reset flow on first login.
- **`_DEFAULT_PASSWORDS` expanded** to include
  `change-me-on-first-login` (the docker template default) so the
  startup-time scan in `main.py` catches the case where the seed
  password was never changed.
- **Bulk alarm endpoints bumped from `viewer` to `operator` role.**
  `/map/alarm-acknowledge-device`, `/map/alarm-clear-device`,
  `/map/alarm-acknowledge-all`, `/map/alarm-clear-all`. Industrial
  SOP: viewers see, operators acknowledge. Single-alarm acknowledge
  stays at viewer level since that's a routine control-room action.
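
Two of the access rules above, the `must_change_password` gate and the role floor on the bulk alarm endpoints, reduce to simple predicates. The sketch below shows the idea only; it is not the code in `get_current_user`. The function names, the exact-match path comparison, and the `viewer` < `operator` < `admin` ordering are assumptions.

```python
# Paths that stay reachable while the flag is set (taken from the list above).
ALLOWED_WHILE_FLAGGED = frozenset({"/auth/me", "/auth/logout", "/users/me/password"})

# Assumed role ordering: a higher rank includes the rights of the lower ones.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def is_request_allowed(path: str, must_change_password: bool) -> bool:
    """Server-side gate: a flagged user may only reach the reset flow."""
    if not must_change_password:
        return True
    return path in ALLOWED_WHILE_FLAGGED


def has_role(user_role: str, required: str) -> bool:
    """True when user_role meets or exceeds the required role."""
    return ROLE_RANK[user_role] >= ROLE_RANK[required]
```

Under these assumptions a flagged user calling `/map/alarm-clear-all` is refused regardless of role, and an unflagged `viewer` is refused on the same endpoint because it now requires `operator`.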

### 4. License-manager hardening (audit cleanup)

Zero new code: the audit found that the manager already fails soft on
every malformed input we tested (missing, empty, garbage, truncated,
bad signature, future-dated, expired, wrong instance ID). The backend
stays up, all workers idle harmlessly, and the UI shows a clean banner.
Three P3 message-text polish items (clearer "file unreadable" vs
"file missing", less alarmist signature-failure copy) are deferred
because they are not customer-facing in normal operation.
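
The fail-soft behaviour can be pictured as a loader that maps every bad input to a status value instead of raising. This is a hedged sketch of the pattern, not the license manager's code: `load_license_status`, the status strings, and the `verify` callback (a stand-in for the real Ed25519 check) are hypothetical.

```python
from pathlib import Path
from typing import Callable


def load_license_status(path: Path, verify: Callable[[bytes], bool]) -> str:
    """Classify a license file without ever raising to the caller."""
    try:
        raw = path.read_bytes()
    except FileNotFoundError:
        return "missing"
    except OSError:
        # Permission errors, a directory in place of the file, I/O faults.
        return "unreadable"
    if not raw.strip():
        return "empty"
    try:
        ok = verify(raw)
    except Exception:
        # Garbage or truncated content that the verifier cannot even parse.
        return "invalid"
    return "valid" if ok else "bad_signature"
```

Because every branch returns a status, the caller can keep the backend up and decide what banner to show. Keeping "missing" and "unreadable" as separate statuses is also what would make the deferred message-text polish straightforward.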

### 5. Operational changes worth knowing

- Docker image tag bumped to `1.0.2` (compose default updated).
- Schema version 47 — migration 047 is additive (new
  `health_monitor_config` table seeded with a disabled row). Drop-in
  upgrade from v1.0.1.
- New documentation: `docs/RECOVERY.md` — tested restore procedure.

---

## Upgrade

```bash
# Pull the new code
git pull
# Rebuild image
docker compose build backend
# Roll the stack
docker compose --profile on-prem up -d
```

Migration 047 runs automatically on backend startup. The new `backup`
and `health-monitor` services are created and started in the same
`up -d`.
