# SyncTide v1.0.1 — Release Notes

**Release date:** 2026-05-11
**Schema version:** 46 (migrations 045 + 046 new in this release)
**Previous internal:** v1.15.4 (versioning reset — see note below)

> **Versioning note** — the pre-launch `1.15.x` line was internal-only
> iteration. v1.0.1 is the first numbered release we ship to paying
> customers; from here on the public version line is monotonic and the
> upgrade path (delta packages, migration replay, in-place container
> swap) is fully supported.

---

## Headline changes

### 1. Comms resilience — workers self-heal from cable yanks

The OPC UA and Modbus TCP pollers used to leave a stale TCP session in
the per-device cache when the underlying poll was cancelled mid-call by
the 10-second outer timeout. Every subsequent attempt then hung on the
same dead socket until the worker process was manually restarted.

- **`BaseException` cleanup** in both drivers — any non-clean exit
  (including `asyncio.CancelledError` from `wait_for`) drops the cached
  client / subscription session so the next cycle reconnects fresh.
- **OPC UA aliveness probe** — once per poll cycle the worker reads
  `ServerStatus.State` (NodeId `i=2259`); a failure flips the device
  status to Error within seconds even when the subscription queue is
  silent (no value changes = no errors otherwise visible).
- **Exponential backoff** per device: `5s → 10s → 20s → … → 300s` cap,
  reset on first success. After 10 consecutive failures the device is
  flagged "quarantined" in the log. Earlier behaviour hammered the PLC
  at the configured poll interval indefinitely.
- **`drop_device()` hot-reload** — when a device disappears from the
  enabled list (deleted/disabled in the UI), the worker tears down its
  cached state without needing a container restart.
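
The cleanup rule in the first bullet can be illustrated with a minimal sketch. This is not the shipped driver code: `DeviceSessionCache`, `poll_once`, and the placeholder session object are hypothetical names, and a 0.05 s timeout stands in for the 10-second outer timeout so the demo finishes quickly.

```python
import asyncio


class DeviceSessionCache:
    """Hypothetical per-device cache of open client sessions."""

    def __init__(self):
        self._clients = {}

    def get_or_connect(self, device_id):
        # Reuse the cached session, or store a placeholder for a new one.
        return self._clients.setdefault(device_id, object())

    def drop_device(self, device_id):
        # Forget the session so the next cycle reconnects from scratch.
        self._clients.pop(device_id, None)

    def has(self, device_id):
        return device_id in self._clients


async def poll_once(cache, device_id, read):
    """Run one poll and drop the cached session on any non-clean exit."""
    client = cache.get_or_connect(device_id)
    try:
        return await read(client)
    except BaseException:
        # asyncio.CancelledError derives from BaseException (Python 3.8+),
        # so `except Exception` would miss a cancellation from wait_for.
        cache.drop_device(device_id)
        raise


async def demo():
    cache = DeviceSessionCache()

    async def hung_read(client):
        await asyncio.sleep(10)  # simulates a read stuck on a dead socket

    try:
        # Short stand-in for the outer timeout described in the notes.
        await asyncio.wait_for(poll_once(cache, "plc-1", hung_read), timeout=0.05)
    except asyncio.TimeoutError:
        pass
    return cache.has("plc-1")


session_still_cached = asyncio.run(demo())
```

After the timeout fires, `session_still_cached` is `False`, so the next cycle would open a fresh connection instead of reusing the dead socket.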

Cable-yank recovery was verified end to end: 14 s on OPC UA and ~40 s
on Modbus (longer because pymodbus's connect timeout is 3 s and the
exponential backoff applies).
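
As a rough illustration of the backoff schedule described above, the delay can be computed from the count of consecutive failures. The function names and the choice to return `0` after a success are assumptions made for this sketch; only the 5 s base, the 300 s cap, and the 10-failure quarantine threshold come from these notes.

```python
QUARANTINE_AFTER = 10  # consecutive failures before the log flags the device


def backoff_delay(consecutive_failures, base=5, cap=300):
    """Seconds to wait before the next attempt after N consecutive failures."""
    if consecutive_failures <= 0:
        return 0  # a success resets the backoff (assumed: no extra delay)
    # Double the delay for each further failure, never exceeding the cap.
    return min(base * 2 ** (consecutive_failures - 1), cap)


def is_quarantined(consecutive_failures):
    """True once the device has failed QUARANTINE_AFTER times in a row."""
    return consecutive_failures >= QUARANTINE_AFTER


# Delays after the 1st through 8th consecutive failure.
schedule = [backoff_delay(n) for n in range(1, 9)]
```

Under these assumptions `schedule` is `[5, 10, 20, 40, 80, 160, 300, 300]`: the cap is reached on the seventh failure.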

### 2. Telegram inbound — close the ACK loop

The platform's Telegram channel was send-only. Operators replying with
`ACK-…` over Telegram had their messages discarded — only SMS replies
forwarded by the RUT241 webhook actually closed alarms.

- **New `telegram-inbound` worker** — long-polls `getUpdates` for every
  active Telegram gateway. One HTTP round-trip per ~25 s when idle,
  near-zero quota cost.
- **Flexible matching cascade:**
  1. *Telegram reply gesture* (swipe-to-reply) — most precise.
     Matches by `provider_message_id`.
  2. *`ACK-{token}`* — original token format still works.
  3. *Simple keyword* — `ack`, `ok`, `okay`, `confirmo`, `confirmar`,
     `sim`, `yes`, `✅`, `👍`, `✓`. Matches the most recent unacked
     Telegram message for the sender's `chat_id` within 24 h.
- **Outgoing instruction simplified** — Telegram messages now read
  *"💬 Reply with \"ack\", \"ok\" or 👍 to acknowledge"* instead of the
  unfriendly `ACK-pW1Wp_da-jU` token. SMS / WhatsApp / email keep the
  precise token because their webhook parsing relies on it.
- **Confirmation reply** — bot replies *"✅ Acknowledged. Escalation
  cancelled."* the moment a match lands so the operator knows the loop
  closed.
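
The matching cascade could be sketched as a pure function that decides which strategy applies to an inbound message. This is an illustration only: `classify_reply`, the token regex, and the rule that any swipe-reply counts regardless of its text are assumptions, and the real worker still has to look up the matching alarm in the database.

```python
import re

# Assumed token shape: "ACK-" followed by URL-safe characters.
ACK_TOKEN = re.compile(r"\bACK-([A-Za-z0-9_-]+)")

# Keyword list taken from the release notes.
KEYWORDS = {"ack", "ok", "okay", "confirmo", "confirmar",
            "sim", "yes", "✅", "👍", "✓"}


def classify_reply(text, reply_to_message_id=None):
    """Return (strategy, key) for an inbound message, or None if no match."""
    # 1. Reply gesture: most precise, keyed by the replied-to message id
    #    (assumed here to match whatever the reply text says).
    if reply_to_message_id is not None:
        return ("reply", reply_to_message_id)
    # 2. Original ACK-{token} format.
    match = ACK_TOKEN.search(text)
    if match:
        return ("token", match.group(1))
    # 3. Simple keyword: the caller would then pick the most recent unacked
    #    message for this chat_id within 24 h.
    if text.strip().casefold() in KEYWORDS:
        return ("keyword", None)
    return None
```

For example, `classify_reply("ACK-pW1Wp_da-jU")` yields `("token", "pW1Wp_da-jU")`, `classify_reply(" OK ")` yields `("keyword", None)`, and an unrelated message such as `"on my way"` yields `None`.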

### 3. Migration 046 — acknowledged_by_label

`alarm_events.acknowledged_by` is a bigint FK to `users(id)` and is only
ever populated by in-platform UI acks. Channel-side acks (Telegram,
SMS, email-token) had no way to record which person or contact closed
the alert, so the alarm history view could not show it.

- **New column** `alarm_events.acknowledged_by_label TEXT` populated
  with a readable label by every ack path:
  - `"Platform: Administrator"` (in-platform button)
  - `"Telegram: Tiago Guimarães LA"` (Telegram reply)
  - `"SMS: +351912345678"` (RUT241 webhook reply)
  - `"Token: <contact_name>"` (email magic-link ack)
- **Backfill** — past in-platform acks get
  `"Platform: <user.full_name>"` from the historical `users` join.
- **API change** — `acknowledged_by_name` in the alarm-history payloads
  is now `COALESCE(acknowledged_by_label, users.full_name)` so the UI
  shows the channel-side label without any frontend changes.
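
The `COALESCE` fallback can be tried out with a small in-memory example. SQLite stands in for Postgres here, and the two-table schema is a simplified assumption rather than the real one; only the column names and the `COALESCE` expression come from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema for illustration only.
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE alarm_events (
        id INTEGER PRIMARY KEY,
        acknowledged_by INTEGER REFERENCES users(id),
        acknowledged_by_label TEXT
    );
    INSERT INTO users VALUES (1, 'Administrator');
    -- 1: in-platform ack with no label yet
    INSERT INTO alarm_events VALUES (1, 1, NULL);
    -- 2: channel-side ack, no platform user involved
    INSERT INTO alarm_events VALUES (2, NULL, 'SMS: +351912345678');
    -- 3: in-platform ack with a label
    INSERT INTO alarm_events VALUES (3, 1, 'Platform: Administrator');
""")

# The label wins when present; otherwise fall back to the joined user name.
rows = conn.execute("""
    SELECT e.id,
           COALESCE(e.acknowledged_by_label, u.full_name) AS acknowledged_by_name
    FROM alarm_events e
    LEFT JOIN users u ON u.id = e.acknowledged_by
    ORDER BY e.id
""").fetchall()
```

Event 1 falls back to `Administrator`, while events 2 and 3 show their labels, which is why the frontend needs no changes.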

### 4. Migration 045 — seconds-resolution comm timeout

The "Update data timeout" was stored as integer minutes
(`max_minutes_without_data`), too coarse for protocols with sub-second
update cycles (OPC UA subscriptions, IEC 104).

- **New column** `devices.comm_timeout_seconds INTEGER` with a check
  constraint enforcing a minimum of 5 s (`>= 5`).
- Backfill: `comm_timeout_seconds = max_minutes_without_data * 60`,
  defaulting NULL rows to 60 s.
- UI shows four boxes — Days / Hours / Minutes / **Seconds** — and the
  Save handler writes the new column. The legacy minutes column stays
  in sync (rounded up) for any external integrations.
- **Timer is based on the most recent measurement timestamp** the
  device reported — not on polling attempts. A subscription cycle that
  returns no new data does NOT reset the clock.
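
The arithmetic behind the four-box input, the legacy-column sync, and the backfill can be sketched as follows. All function names are hypothetical, and the timestamps in `is_timed_out` are assumed to be plain epoch seconds.

```python
MIN_TIMEOUT_S = 5  # mirrors the check constraint on comm_timeout_seconds


def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Combine the four UI boxes into a single seconds value."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    if total < MIN_TIMEOUT_S:
        raise ValueError(f"timeout must be at least {MIN_TIMEOUT_S} s")
    return total


def legacy_minutes(comm_timeout_seconds):
    """Value kept in max_minutes_without_data, rounded up to whole minutes."""
    return -(-comm_timeout_seconds // 60)  # integer ceiling division


def backfill_seconds(max_minutes_without_data):
    """Migration 045 backfill: minutes * 60, with NULL rows set to 60 s."""
    if max_minutes_without_data is None:
        return 60
    return max_minutes_without_data * 60


def is_timed_out(last_measurement_ts, now_ts, comm_timeout_seconds):
    """The clock runs from the last measurement, not the last poll attempt."""
    return now_ts - last_measurement_ts > comm_timeout_seconds
```

For instance, 1 minute 30 seconds becomes `90`, and the legacy column for that device is written as `2` minutes.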

### 5. UX polish

- **Streak detection** — device status flips to ❌ Error after 3
  consecutive failed polls regardless of the rolling 20-poll success
  rate. Previously a device could still show "Intermittent" at a 75 %
  success rate right after a cable yank, which was misleading.
- **Protocol-aware Intermittent hint** — the popover help text now
  matches the protocol (Modbus / OPC UA / IEC 104 / generic) instead of
  always advising "increase MB_SERVER `DISCONNECT`".
- **Tag Address row delete** keys widgets by a stable per-row UUID so
  deleting a row in the middle of the list actually removes the
  clicked row, not the visually-last one.
- **Compact virtual-tag expression editor** — Tag / Operator / Function
  pickers are dropdowns (scales to hundreds of tags) instead of grids.
  Site-scope shows a side-by-side Device + Tag pair plus the Insert
  button below.
- **Virtual-tag validator error humanizer** — *"Missing operator
  between ${Ambient Temp} and ${ALC105:Flow}"* instead of the previous
  `Syntax error: t76 d20t1 invalid syntax`.
- **OPC UA browse** lets you pick ns=0 and ns=1 (previously filtered
  out — useful for diagnostics on non-standard servers).
- **Reports cross-FS fix** — `Path.replace` → `shutil.move`; PDF
  generation works under Docker where `/tmp` and
  `/app/reports_output` sit on different filesystems (EXDEV bug).
- **Cooldown = 0** is now valid for messaging rules. The default for
  new rules stays at 30 minutes. With a cooldown of zero, every alarm
  trigger sends a fresh message with no rate limit.
- **Threshold operator display** — uses Unicode glyphs (`≥`, `≤`, `›`,
  `‹`, `≠`) instead of ASCII so Streamlit's Markdown parser doesn't
  swallow the leading `>` as a blockquote.
- **Cascade delete on tag-address Save** — removing a row from the Tag
  Addresses tab also drops the corresponding `device_tag_mappings` and
  `tag_metadata` rows so the Real Time list and Variable Mapping
  picker stop showing the ghost name.
- **OPC UA browse and import** now auto-refreshes the Tag Addresses and
  Real Time tabs without navigating away from the page.
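
The streak rule from the first bullet can be illustrated with a short sketch. The helper names are hypothetical and the poll history is assumed to be a list of booleans, oldest first; how the remaining statuses are derived from the success rate is not covered here.

```python
def success_rate(history, window=20):
    """Share of successful polls over the most recent `window` polls."""
    recent = list(history)[-window:]
    return sum(recent) / len(recent) if recent else 1.0


def has_failure_streak(history, streak=3):
    """True when the last `streak` polls all failed."""
    recent = list(history)[-streak:]
    return len(recent) == streak and not any(recent)


# 15 good polls followed by 5 failures, e.g. right after a cable yank.
history = [True] * 15 + [False] * 5
rate = success_rate(history)
in_error = has_failure_streak(history)
```

Here `rate` is `0.75`, which on its own would look merely intermittent, while `in_error` is `True`, so the streak check takes precedence and the device is shown as Error.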

---

## Operator action required when upgrading from any 1.15.x

None for Postgres data: migrations 045 and 046 are additive and
idempotent, so re-running them on a customer instance is a no-op.

The container image tag changes from `synctide:1.15.0-test1` to
`synctide:1.0.1`. After pulling, `docker compose up -d` picks up the
new default tag from `docker-compose.yml`.
