How I Achieved Subsecond Glass-to-Glass Latency
Background
I am building a real-time vision system with two live views:
- A stitched panorama from multiple fixed cameras.
- A live auto-tracking PTZ feed.
For the first MVP, I optimized for speed of implementation and used MJPEG streaming end to end. That helped me ship quickly, but during the MVP presentation users flagged two issues immediately:
- High bandwidth usage.
- More than 2 seconds glass-to-glass latency.
Latency here is measured from OSD timestamp versus wall-clock timestamp on the client display.
Hardware Topology
All components are connected over Tailscale. The capture side is behind a subnet router, and the uplink uses Starlink.
- Client -> Server -> PC (Tailscale subnet router) -> Cameras
- Server to source network ping: about
84ms - Client to server ping:
1ms
%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 55, "rankSpacing": 75}}}%%
flowchart TB
subgraph EDGE["Edge Location"]
CAM["IP Cameras"] -->|LAN| GATE["Edge PC / Subnet Gateway (192.168.1.0/24)"]
GATE --- SAT["Starlink (WAN uplink)"]
end
subgraph HQ["Server Location"]
SRV["Application Server (on-prem)"]
CLI["Client Device (Browser)"]
end
GATE -->|Tailscale ~84 ms| SRV["Application Server (on-prem)"]
SRV -->|Tailscale ~84 ms| GATE
SRV -->|Tailscale ~1 ms| CLI["Client Device (Browser)"]
CLI -->|Tailscale ~1 ms| SRV
Camera Configuration
For latency-sensitive operation, the camera stream was configured with the following encoder profile:
- Resolution:
2880x1620 - FPS:
7 - Encoding:
H.264 - I-frame interval:
50 - Max bitrate:
1024 kbps (variable)
Legacy Architecture (MVP)
My original pipeline was:
%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 50, "rankSpacing": 70}}}%%
flowchart TB
A["RTSP source"] --> B["GStreamer + OpenCV decode"]
B --> C["Raw Mat frames"]
C --> D["JPEG/WebP encoder"]
D --> E["In-process fan-out"]
E --> C1["Client A"]
E --> C2["Client B"]
E --> C3["Client C"]
C1 --> R1["HTTP multipart image rendering"]
C2 --> R2["HTTP multipart image rendering"]
C3 --> R3["HTTP multipart image rendering"]
This path was easy to implement, but it introduced avoidable latency and bandwidth overhead:
- Continuous frame decode/encode work in the backend.
- Per-client delivery over multipart MJPEG.
- Browser
<img>rendering path instead of real-time media transport.
New Architecture
I migrated video delivery to WebRTC with WHIP/WHEP:
%%{init: {"themeVariables": {"fontSize": "18px"}, "flowchart": {"useMaxWidth": false, "nodeSpacing": 60, "rankSpacing": 80}}}%%
flowchart TB
subgraph VP["Video path"]
A["RTSP"] --> B["Ingest + H264 prep"]
B --> C["WHIP publisher"]
C -->|WHIP| D["SRS SFU"]
D -->|WHEP| E["Browser video"]
end
Short terminology:
- WHIP: protocol for publishing a WebRTC stream to a media server over HTTP.
- WHEP: protocol for playing a WebRTC stream from a media server over HTTP.
- SRS[2]: media server/SFU that sits between publisher and viewers.
Implementation Notes
- Backend publishes PTZ and panorama streams to SRS via persistent WHIP sessions.
- Frontend uses WHEP endpoints for PTZ and panorama playback.
Why Latency and Bandwidth Improved
The biggest gains came from transport and delivery model changes:
- Replacing MJPEG-over-HTTP with WebRTC real-time media delivery.
- Avoiding repeated JPEG/WebP fan-out for each viewer.
- Letting the camera-configured encoded stream characteristics flow through more directly.
This reduced backend CPU load in my deployment and brought observed client latency down significantly.
Measurement Method
I measured glass-to-glass latency by recording:
- Camera OSD timestamp visible on stream output.
time.is[1] clock visible on the same client display.- Frame-by-frame delta between those two timestamps.
Results
| Scenario | Pipeline | Glass-to-glass latency | Notes |
|---|---|---|---|
| Before migration | MJPEG (multipart/x-mixed-replace) | >2s | MVP architecture |
| After migration | WHIP/WHEP WebRTC via SRS | 0.63–0.77s | Current architecture |
Closing
Under a strict 2-week MVP constraint, MJPEG was the fastest way to ship. For production behavior, switching to WHIP/WHEP WebRTC was the key step to cut latency and bandwidth while keeping architecture manageable.
References
- time.is — web-based reference clock synchronized via NTP. ↩
- SRS (Simple Realtime Server) — open-source, self-hosted media server / SFU. ↩