How I Achieved Subsecond Glass-to-Glass Latency

Background

I am building a real-time vision system with two live views:

  1. A stitched panorama from multiple fixed cameras.
  2. A live auto-tracking PTZ feed.

For the first MVP, I optimized for speed of implementation and used MJPEG streaming end to end. That helped me ship quickly, but during the MVP presentation users flagged two issues immediately:

  1. High bandwidth usage.
  2. Glass-to-glass latency above 2 seconds.

Latency here is the difference between the camera's OSD timestamp and a wall-clock timestamp shown on the same client display.

Hardware Topology

All components are connected over Tailscale. The capture side is behind a subnet router, and the uplink uses Starlink.

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 55, "rankSpacing": 75}}}%%
flowchart TB
  subgraph EDGE["Edge Location"]
    CAM["IP Cameras"] -->|LAN| GATE["Edge PC / Subnet Gateway (192.168.1.0/24)"]
    GATE --- SAT["Starlink (WAN uplink)"]
  end
  subgraph HQ["Server Location"]
    SRV["Application Server (on-prem)"]
    CLI["Client Device (Browser)"]
  end
  GATE -->|Tailscale ~84 ms| SRV
  SRV -->|Tailscale ~84 ms| GATE
  SRV -->|Tailscale ~1 ms| CLI
  CLI -->|Tailscale ~1 ms| SRV
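The diagram does not say whether the ~84 ms and ~1 ms figures are round-trip or one-way times. Assuming they are round-trip times on symmetric paths, a rough lower bound on the network's share of the one-way budget is half of each hop's RTT. The helper below is a hypothetical sketch of that arithmetic; it ignores Starlink jitter, queuing, and retransmissions.

```python
def one_way_network_ms(rtts_ms: list[float]) -> float:
    """Estimate one-way network delay as half of each hop's round-trip time.

    Assumes symmetric paths; real Starlink links can be asymmetric and jittery,
    so treat the result as a lower bound, not a measurement.
    """
    return sum(rtt / 2 for rtt in rtts_ms)


# Hops from the topology diagram, assumed to be RTTs:
# edge gateway -> application server (~84 ms), server -> client (~1 ms).
hops = [84.0, 1.0]
network_ms = one_way_network_ms(hops)
print(f"estimated one-way network delay: {network_ms:.1f} ms")
```

Under that assumption the network legs contribute on the order of 40 ms, which suggests most of the MVP's >2 s latency came from the capture-to-display pipeline rather than the Starlink hop itself.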

Camera Configuration

For latency-sensitive operation, the camera stream was configured with the following encoder profile:

Legacy Architecture (MVP)

My original pipeline was:

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 50, "rankSpacing": 70}}}%%
flowchart TB
  A["RTSP source"] --> B["GStreamer + OpenCV decode"]
  B --> C["Raw Mat frames"]
  C --> D["JPEG/WebP encoder"]
  D --> E["In-process fan-out"]
  E --> C1["Client A"]
  E --> C2["Client B"]
  E --> C3["Client C"]
  C1 --> R1["HTTP multipart image rendering"]
  C2 --> R2["HTTP multipart image rendering"]
  C3 --> R3["HTTP multipart image rendering"]

This path was easy to implement, but it introduced avoidable latency and bandwidth overhead:

  1. Continuous frame decode/encode work in the backend.
  2. Per-client delivery over multipart MJPEG.
  3. Browser <img> rendering path instead of real-time media transport.
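The per-client multipart delivery in point 2 works roughly as follows: the server keeps one long-lived HTTP response open per viewer and writes every frame into it as a complete JPEG body part, and the browser swaps the `<img>` contents each time a part arrives. This is a minimal illustrative sketch, not the MVP's actual code; the boundary token and the `fan_out` helper are hypothetical.

```python
from collections.abc import Callable

BOUNDARY = "frame"  # hypothetical boundary token; must not appear inside the JPEG data


def multipart_headers() -> dict[str, str]:
    """Response headers telling the browser to replace the image with each new part."""
    return {"Content-Type": f"multipart/x-mixed-replace; boundary={BOUNDARY}"}


def mjpeg_part(jpeg: bytes) -> bytes:
    """Wrap one already-encoded JPEG frame as a single multipart body part."""
    header = (
        f"--{BOUNDARY}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + jpeg + b"\r\n"


def fan_out(jpeg: bytes, clients: list[Callable[[bytes], None]]) -> int:
    """Write the same full frame to every client connection.

    Every frame is a complete intra-coded JPEG, and each viewer gets its own copy,
    so egress bytes grow linearly with both frame rate and viewer count.
    """
    part = mjpeg_part(jpeg)
    for send in clients:
        send(part)
    return len(part) * len(clients)
```

Because JPEG compresses each frame independently, MJPEG cannot exploit similarity between consecutive frames the way an inter-frame codec such as H.264 does, which is where much of the bandwidth overhead comes from.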

New Architecture

I migrated video delivery to WebRTC with WHIP/WHEP:

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 60, "rankSpacing": 80}}}%%
flowchart TB
  subgraph VP["Video path"]
    A["RTSP"] --> B["Ingest + H264 prep"]
    B --> C["WHIP publisher"]
    C -->|WHIP| D["SRS SFU"]
    D -->|WHEP| E["Browser video"]
  end

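Short terminology:

  1. WHIP (WebRTC-HTTP Ingestion Protocol): HTTP-based signaling for publishing a WebRTC stream to a media server.
  2. WHEP (WebRTC-HTTP Egress Protocol): the playback counterpart, used by viewers to pull a WebRTC stream from the server.
  3. SFU (Selective Forwarding Unit): a server that forwards incoming media packets to each subscriber without decoding and re-encoding them.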

Implementation Notes

  1. Backend publishes PTZ and panorama streams to SRS via persistent WHIP sessions.
  2. Frontend uses WHEP endpoints for PTZ and panorama playback.
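WHIP and WHEP share the same signaling shape: the client POSTs its SDP offer with `Content-Type: application/sdp`, the server replies `201 Created` with the SDP answer and a `Location` header naming the session resource, and an HTTP DELETE on that resource ends the session. The sketch below only builds those requests with the standard library and does not send them. The SRS host, the `/rtc/v1/whip/` and `/rtc/v1/whep/` paths with `app`/`stream` query parameters, and the stream names `ptz` and `panorama` are assumptions for illustration; producing a real SDP offer requires a WebRTC stack, which is out of scope here.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

SRS_API = "http://srs.example:1985"  # hypothetical host; 1985 is SRS's default HTTP API port


def srs_endpoint(kind: str, stream: str, app: str = "live") -> str:
    """Build a WHIP ("whip") or WHEP ("whep") URL; the path layout is an assumption."""
    if kind not in ("whip", "whep"):
        raise ValueError(f"unknown endpoint kind: {kind}")
    return f"{SRS_API}/rtc/v1/{kind}/?" + urlencode({"app": app, "stream": stream})


def offer_request(endpoint: str, sdp_offer: str) -> Request:
    """Publishing (WHIP) and playback (WHEP) both start with a POST of the SDP offer."""
    return Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        headers={"Content-Type": "application/sdp"},
        method="POST",
    )


def teardown_request(endpoint: str, location: str) -> Request:
    """DELETE the session resource named by the 201 response's Location header.

    Location may be relative, so it is resolved against the endpoint URL.
    """
    return Request(urljoin(endpoint, location), method="DELETE")


for name in ("ptz", "panorama"):  # hypothetical stream names for the two views
    print(name, srs_endpoint("whip", name), srs_endpoint("whep", name))
```

A persistent publisher in this model would keep its peer connection open for the life of the stream and only issue the DELETE on shutdown, rather than renegotiating per viewer.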

Why Latency and Bandwidth Improved

The biggest gains came from transport and delivery model changes:

  1. Replacing MJPEG-over-HTTP with WebRTC real-time media delivery.
  2. Avoiding repeated JPEG/WebP fan-out for each viewer.
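  3. Letting the stream characteristics configured on the camera's encoder carry through to the viewer more directly, instead of being replaced by a per-frame JPEG/WebP re-encode in the backend.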

In my deployment, this reduced backend CPU load and brought observed client latency down from more than 2 s to roughly 0.63–0.77 s.
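The bandwidth side can be reasoned about with back-of-the-envelope arithmetic. MJPEG sends every frame as a full JPEG, so its bitrate is average frame size times frame rate, while an H.264 stream's bitrate is set by the encoder configuration because most frames are coded as differences from earlier ones. The inputs below are hypothetical placeholders, not this deployment's camera settings or measured values.

```python
def mjpeg_bps(avg_frame_bytes: float, fps: float) -> float:
    """MJPEG bitrate: every frame is a standalone JPEG, so size x 8 bits x frame rate."""
    return avg_frame_bytes * 8 * fps


def egress_bps(per_viewer_bps: float, viewers: int) -> float:
    """Both MJPEG fan-out and an SFU send one copy per viewer, so egress scales linearly."""
    return per_viewer_bps * viewers


# Hypothetical inputs for illustration only.
mjpeg = mjpeg_bps(avg_frame_bytes=60_000, fps=15)  # assumed 60 kB JPEGs at 15 fps
h264 = 2_000_000  # assumed 2 Mbps H.264 encoder target

for name, bps in (("MJPEG", mjpeg), ("H.264", h264)):
    print(f"{name}: {bps / 1e6:.1f} Mbps per viewer, "
          f"{egress_bps(bps, 3) / 1e6:.1f} Mbps for 3 viewers")
```

The viewer count multiplies both cases equally; the per-viewer saving comes from the codec, which is why the switch away from per-frame JPEG/WebP matters more than the fan-out mechanism alone.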

Measurement Method

I measured glass-to-glass latency by recording:

  1. Camera OSD timestamp visible on stream output.
  2. time.is[1] clock visible on the same client display.
  3. Frame-by-frame delta between those two timestamps.
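Once timestamp pairs have been read off a screen recording, the per-frame deltas reduce to a simple subtraction and summary. This is a minimal sketch assuming both clocks are transcribed as `HH:MM:SS.mmm` strings; the sample values are hypothetical, not the recorded data. It only measures glass-to-glass latency if the camera's own clock is synchronized (for example via NTP), since any camera clock offset adds directly to every delta.

```python
from datetime import datetime
from statistics import median


def parse(ts: str) -> datetime:
    # Assumed "HH:MM:SS.mmm" format; adjust to what the OSD and reference clock show.
    return datetime.strptime(ts, "%H:%M:%S.%f")


def latencies_s(samples: list[tuple[str, str]]) -> list[float]:
    """Per-frame latency in seconds: reference clock time minus camera OSD time."""
    return [(parse(clock) - parse(osd)).total_seconds() for osd, clock in samples]


# Hypothetical (OSD timestamp, reference clock timestamp) pairs from individual frames.
samples = [
    ("12:00:01.000", "12:00:01.650"),
    ("12:00:02.000", "12:00:02.700"),
    ("12:00:03.000", "12:00:03.720"),
]
lat = latencies_s(samples)
print(f"min {min(lat):.2f} s, median {median(lat):.2f} s, max {max(lat):.2f} s")
```

Reporting a min/max range rather than a single number, as in the results below, captures the frame-to-frame variation introduced by jitter buffering and display refresh.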

Results

| Scenario | Pipeline | Glass-to-glass latency | Notes |
| --- | --- | --- | --- |
| Before migration | MJPEG (multipart/x-mixed-replace) | >2 s | MVP architecture |
| After migration | WHIP/WHEP WebRTC via SRS | 0.63–0.77 s | Current architecture |

Closing

Under a strict 2-week MVP constraint, MJPEG was the fastest way to ship. For production behavior, switching to WHIP/WHEP WebRTC was the key step to cut latency and bandwidth while keeping the architecture manageable.

References

  1. time.is — web-based reference clock synchronized via NTP.
  2. SRS (Simple Realtime Server) — open-source, self-hosted media server / SFU.