How I Achieved Subsecond Glass-to-Glass Latency

By Thomas Edwin Santosa, March 15, 2026

Background

I am building a real-time vision system with two live views:

A stitched panorama from multiple fixed cameras.
A live auto-tracking PTZ feed.

For the first MVP, I optimized for speed of implementation and used MJPEG streaming end to end. That helped me ship quickly, but during the MVP presentation users flagged two issues immediately:

High bandwidth usage.
More than 2 seconds of glass-to-glass latency.

Latency here is the gap between the camera's OSD timestamp shown in the video and the wall-clock time shown on the same client display.

Hardware Topology

All components are connected over Tailscale. The capture side is behind a subnet router, and the uplink uses Starlink.

Client -> Server -> PC (Tailscale subnet router) -> Cameras
Server to source network ping: about 84ms
Client to server ping: 1ms

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 55, "rankSpacing": 75}}}%%
flowchart TB
  subgraph EDGE["Edge Location"]
    CAM["IP Cameras"] -->|LAN| GATE["Edge PC / Subnet Gateway (192.168.1.0/24)"]
    GATE --- SAT["Starlink (WAN uplink)"]
  end
  subgraph HQ["Server Location"]
    SRV["Application Server (on-prem)"]
    CLI["Client Device (Browser)"]
  end
  GATE -->|Tailscale ~84 ms| SRV["Application Server (on-prem)"]
  SRV -->|Tailscale ~84 ms| GATE
  SRV -->|Tailscale ~1 ms| CLI["Client Device (Browser)"]
  CLI -->|Tailscale ~1 ms| SRV

Camera Configuration

For latency-sensitive operation, the camera stream was configured with the following encoder profile:

Resolution: 2880x1620
FPS: 7
Encoding: H.264
I-frame interval: 50
Max bitrate: 1024 kbps (variable)
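To put these settings in latency terms, here is a rough calculation of what the profile implies. It assumes the I-frame interval is counted in frames (the usual convention on IP cameras, but not confirmed here) and that "kbps" means 1000 bits per second.

```python
# Back-of-the-envelope numbers derived from the encoder profile above.
# Assumptions: the I-frame interval is in frames; 1 kbps = 1000 bits/s.

FPS = 7
I_FRAME_INTERVAL_FRAMES = 50
MAX_BITRATE_KBPS = 1024

# Time between consecutive frames leaving the encoder.
frame_interval_ms = 1000 / FPS

# Time between consecutive keyframes (one GOP).
gop_duration_s = I_FRAME_INTERVAL_FRAMES / FPS

# Average bytes available per frame if the stream runs at its bitrate cap.
avg_frame_bytes_at_cap = MAX_BITRATE_KBPS * 1000 / 8 / FPS

print(f"Frame interval: {frame_interval_ms:.0f} ms")          # 143 ms
print(f"Time between I-frames: {gop_duration_s:.1f} s")       # 7.1 s
print(f"Avg frame size at cap: {avg_frame_bytes_at_cap / 1000:.1f} kB")  # 18.3 kB
```

At 7 FPS the frame interval alone is roughly 143 ms, so a captured moment can wait that long for the next frame before any network or decode delay is added.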

Legacy Architecture (MVP)

My original pipeline was:

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 50, "rankSpacing": 70}}}%%
flowchart TB
  A["RTSP source"] --> B["GStreamer + OpenCV decode"]
  B --> C["Raw Mat frames"]
  C --> D["JPEG/WebP encoder"]
  D --> E["In-process fan-out"]
  E --> C1["Client A"]
  E --> C2["Client B"]
  E --> C3["Client C"]
  C1 --> R1["HTTP multipart image rendering"]
  C2 --> R2["HTTP multipart image rendering"]
  C3 --> R3["HTTP multipart image rendering"]

This path was easy to implement, but it introduced avoidable latency and bandwidth overhead:

Continuous frame decode/encode work in the backend.
Per-client delivery over multipart MJPEG.
Browser <img> rendering path instead of real-time media transport.
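For readers unfamiliar with multipart MJPEG, the sketch below shows how each frame is typically wrapped for a `multipart/x-mixed-replace` response. It is an illustration of the wire format, not the MVP's actual code; the boundary name and the helper names are hypothetical, and the "frames" are placeholder bytes rather than real JPEGs.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary string


def mjpeg_response_headers(boundary: str = BOUNDARY) -> dict[str, str]:
    """Headers sent once, when a viewer opens the stream."""
    return {"Content-Type": f"multipart/x-mixed-replace; boundary={boundary}"}


def mjpeg_part(jpeg: bytes, boundary: str = BOUNDARY) -> bytes:
    """Wrap one complete JPEG image as a single multipart part."""
    header = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + jpeg + b"\r\n"


def mjpeg_stream(frames: Iterable[bytes], boundary: str = BOUNDARY) -> Iterator[bytes]:
    """Yield one multipart part per frame; the server runs this per viewer."""
    for jpeg in frames:
        yield mjpeg_part(jpeg, boundary)


# Placeholder payloads standing in for encoded JPEG frames.
fake_frames = [b"\xff\xd8 frame 1 \xff\xd9", b"\xff\xd8 frame 2 \xff\xd9"]
chunks = list(mjpeg_stream(fake_frames))
print(mjpeg_response_headers()["Content-Type"])
print(f"{len(chunks)} parts, {sum(len(c) for c in chunks)} bytes on the wire")
```

Every part carries a full, independently compressed image, so nothing is shared between consecutive frames. That is the main reason this format costs more bandwidth than an inter-frame codec such as H.264.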

New Architecture

I migrated video delivery to WebRTC with WHIP/WHEP:

%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 60, "rankSpacing": 80}}}%%
flowchart TB
  subgraph VP["Video path"]
    A["RTSP"] --> B["Ingest + H264 prep"]
    B --> C["WHIP publisher"]
    C -->|WHIP| D["SRS SFU"]
    D -->|WHEP| E["Browser video"]
  end

Terminology in brief:

WHIP: protocol for publishing a WebRTC stream to a media server over HTTP.
WHEP: protocol for playing a WebRTC stream from a media server over HTTP.
SRS^[2]: media server/SFU that sits between publisher and viewers.

Implementation Notes

Backend publishes PTZ and panorama streams to SRS via persistent WHIP sessions.
Frontend uses WHEP endpoints for PTZ and panorama playback.
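WHIP and WHEP share the same HTTP signaling shape: the client POSTs an SDP offer with `Content-Type: application/sdp`, the server answers `201 Created` with an SDP answer and a `Location` header for the session, and a `DELETE` to that location ends it. The sketch below only builds those requests so the shape is visible; it does not send them. The endpoint URL, session URL, and function names are placeholders, not SRS's actual paths, and in the browser the offer would come from `RTCPeerConnection` rather than a string literal.

```python
import urllib.request

# Hypothetical endpoint; substitute the media server's real WHEP/WHIP URL.
WHEP_URL = "https://media.example.com/whep/ptz"


def build_signaling_request(endpoint: str, sdp_offer: str) -> urllib.request.Request:
    """Build (but do not send) the POST that carries the SDP offer."""
    return urllib.request.Request(
        endpoint,
        data=sdp_offer.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sdp"},
    )


def build_teardown_request(session_url: str) -> urllib.request.Request:
    """Build the DELETE that ends a session, using the Location URL from the 201."""
    return urllib.request.Request(session_url, method="DELETE")


# Placeholder offer; a real one is generated by the WebRTC stack.
offer = "v=0\r\n"
request = build_signaling_request(WHEP_URL, offer)
print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
```

Because signaling is a single HTTP round trip, a persistent session only needs to be negotiated once; after that, media flows over the WebRTC transport rather than over HTTP.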

Why Latency and Bandwidth Improved

The biggest gains came from transport and delivery model changes:

Replacing MJPEG-over-HTTP with WebRTC real-time media delivery.
Avoiding repeated JPEG/WebP fan-out for each viewer.
Letting the camera's encoded H.264 stream, with the settings configured on the camera, flow through to viewers more directly.

This reduced backend CPU load in my deployment and brought observed glass-to-glass latency down from more than 2 s to 0.63–0.77 s.
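A simple model makes the bandwidth difference concrete. The per-frame JPEG size below is a hypothetical placeholder, not a measurement from this system, and the H.264 figure is the camera's configured cap, so treat the output as an order-of-magnitude illustration only.

```python
def mjpeg_egress_kbps(jpeg_bytes: int, fps: float, viewers: int) -> float:
    """Server egress when every viewer receives a full JPEG for every frame."""
    return jpeg_bytes * 8 * fps * viewers / 1000


def h264_egress_kbps(stream_kbps: float, viewers: int) -> float:
    """Server egress when the SFU forwards one encoded stream per viewer."""
    return stream_kbps * viewers


FPS = 7
H264_CAP_KBPS = 1024          # camera's configured max bitrate (upper bound)
ASSUMED_JPEG_BYTES = 200_000  # HYPOTHETICAL size of one JPEG frame, not measured

for viewers in (1, 3):
    mjpeg = mjpeg_egress_kbps(ASSUMED_JPEG_BYTES, FPS, viewers)
    h264 = h264_egress_kbps(H264_CAP_KBPS, viewers)
    print(f"{viewers} viewer(s): MJPEG ~{mjpeg:.0f} kbps, H.264 <= {h264:.0f} kbps")
```

Both paths scale linearly with viewer count. The saving comes from the much lower per-viewer bitrate of an inter-frame codec and from forwarding already-encoded packets instead of re-encoding images on the backend.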

Measurement Method

I measured glass-to-glass latency by recording:

Camera OSD timestamp visible on stream output.
time.is^[1] clock visible on the same client display.
Frame-by-frame delta between those two timestamps.
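The delta calculation can be sketched as follows. The timestamp format is an assumption (an OSD and a reference clock that both show fractional seconds), the readings are placeholders rather than my recorded values, and the function name is illustrative.

```python
from datetime import datetime
from statistics import mean

# Assumed format for both clocks: hours, minutes, seconds, fractional seconds.
TIME_FORMAT = "%H:%M:%S.%f"


def latency_seconds(osd: str, wall_clock: str) -> float:
    """Glass-to-glass delay for one captured frame: wall clock minus OSD time."""
    osd_time = datetime.strptime(osd, TIME_FORMAT)
    wall_time = datetime.strptime(wall_clock, TIME_FORMAT)
    return (wall_time - osd_time).total_seconds()


# PLACEHOLDER (OSD, wall-clock) pairs read off individual recorded frames.
readings = [
    ("12:00:00.000", "12:00:00.500"),
    ("12:00:01.000", "12:00:01.250"),
]

deltas = [latency_seconds(osd, wall) for osd, wall in readings]
print(f"min={min(deltas):.2f}s max={max(deltas):.2f}s mean={mean(deltas):.2f}s")
```

This only yields a meaningful number if the camera clock and the reference clock are synchronized to the same time source; any offset between them is added directly to the result. The sketch also ignores readings that straddle midnight.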

Results

Scenario	Pipeline	Glass-to-glass latency	Notes
Before migration	MJPEG (`multipart/x-mixed-replace`)	>2s	MVP architecture
After migration	WHIP/WHEP WebRTC via SRS	0.63–0.77s	Current architecture

Closing

Under a strict 2-week MVP constraint, MJPEG was the fastest way to ship. For production, switching to WHIP/WHEP WebRTC was the key step in cutting latency and bandwidth while keeping the architecture manageable.

References

[1] time.is: web-based reference clock synchronized via NTP.
[2] SRS (Simple Realtime Server): open-source, self-hosted media server / SFU.