
server: add ping/pong echo capability

Adds a small server-side capability: when a DTLS payload starts with
the 4-byte magic prefix 0xff 'P' 'N' 'G', the server echoes the entire
packet back to the client through the same DTLS conn. Otherwise the
packet is forwarded to WireGuard as before.
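The dispatch rule above reduces to a fixed-prefix test on each payload. Below is a minimal Go sketch of that predicate. The names `probeMagic` and `isProbe` are hypothetical; the patch itself compares the four bytes inline. `bytes.HasPrefix` does not allocate, and it returns false for inputs shorter than the prefix, so no separate length check is needed.

```go
package main

import "bytes"

// probeMagic is the 4-byte liveness-probe prefix described in the commit:
// 0xff followed by ASCII 'P', 'N', 'G'. (Hypothetical name.)
var probeMagic = []byte{0xff, 'P', 'N', 'G'}

// isProbe reports whether a DTLS payload should be echoed back to the client
// instead of being forwarded to WireGuard. bytes.HasPrefix handles short or
// nil payloads by returning false.
func isProbe(payload []byte) bool {
	return bytes.HasPrefix(payload, probeMagic)
}
```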

Whether any client *uses* this echo is up to the client. The server
just provides the capability.

Backward compatibility: the echo branch is gated on the magic-prefix
check. Without that prefix nothing fires — every existing client sees
identical behaviour. New clients sending probes to an unpatched server
see no echo and degrade gracefully (the bytes flow through to WG which
drops them as message type 0xff, outside WG's 1..4 range).
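The graceful-degradation argument depends on WireGuard rejecting any header outside types 1..4. The sketch below is a simplified model of that check, loosely patterned on wireguard-go, which reads the first four bytes (a 1-byte type followed by 3 reserved zero bytes) as a little-endian `uint32`. `wgKnownType` and the constant names are hypothetical. A real receiver also validates packet length per type, and that is left out here. A probe's header decodes to 0x474E50FF, which matches no valid type, so the packet is dropped.

```go
package main

import "encoding/binary"

// WireGuard message types. Each packet starts with a 1-byte type followed
// by three reserved zero bytes.
const (
	wgHandshakeInitiation = 1
	wgHandshakeResponse   = 2
	wgCookieReply         = 3
	wgTransportData       = 4
)

// wgKnownType is a simplified model of a WireGuard receiver's type check:
// decode the 4-byte little-endian header and accept only types 1..4.
// Anything else, including the 0xff 'P' 'N' 'G' probe prefix, is dropped.
func wgKnownType(packet []byte) bool {
	if len(packet) < 4 {
		return false
	}
	switch binary.LittleEndian.Uint32(packet[:4]) {
	case wgHandshakeInitiation, wgHandshakeResponse, wgCookieReply, wgTransportData:
		return true
	}
	return false
}
```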

Use case that motivated this: detecting zombie TURN allocations —
sessions where pion's Refresh and VK's NAT-keepalive Binding both
succeed (control plane "healthy") but the actual data path through VK's
relay is broken because the client's NAT mapping shifted after a
network handover and VK's relay state is stale. Without an end-to-end
signal the client can't tell. With ping/pong the client can periodically
ping and tear down conns whose echoes have stopped arriving.

Reference client implementation: anton48/vk-turn-proxy-ios commit
8c430f3 (141 lines in pkg/proxy/proxy.go, a refactored extraction of
client/main.go). Empirically tuned over a month of production: 30s
ping interval, 120s stale threshold, latched "server is echoing" flag
so clients never kill conns when talking to an unpatched server.

Cost: one byte-prefix comparison per inbound DTLS packet, no allocation,
no parsing. When echoing, one DTLS write per ping. With 30 conns at a
30s interval, that's ~1 ping/sec in total, answered by ~1 echo/sec.
pull/168/head
Anton Monakhov, 2 months ago
commit d1ed9c798a
1 changed file: server/main.go (+27)

@ -169,6 +169,33 @@ func handleUDPConnection(ctx context.Context, conn net.Conn, connectAddr string)
			return
		}
		// Liveness-probe echo. Clients may send short sentinel
		// packets (4-byte magic 0xff 'P' 'N' 'G' + payload)
		// periodically over a DTLS conn to detect data-path
		// failures that don't show up at the control plane —
		// e.g. TURN allocations where Refresh/keepalive succeed
		// but the relay's NAT mapping is stale and packets are
		// silently dropped. We echo the bytes back through the
		// same DTLS conn — pion's dtls.Conn is goroutine-safe so
		// concurrent Write from this read-loop and the WG-receive
		// loop below is fine. The first byte 0xff falls outside
		// WireGuard's 1..4 message types, so without this branch
		// the bytes would be forwarded to serverConn (WG) and
		// silently dropped — which is also what unpatched servers
		// do, hence clients sending probes to an unpatched server
		// see no echo and degrade gracefully.
		if n >= 4 && buf[0] == 0xff && buf[1] == 'P' && buf[2] == 'N' && buf[3] == 'G' {
			if err1 := conn.SetWriteDeadline(time.Now().Add(5 * time.Second)); err1 != nil {
				log.Printf("Failed to set probe-echo deadline: %s", err1)
				return
			}
			if _, err1 := conn.Write(buf[:n]); err1 != nil {
				log.Printf("Failed to echo probe: %s", err1)
				return
			}
			continue
		}
		if err1 = serverConn.SetWriteDeadline(time.Now().Add(time.Minute * 30)); err1 != nil {
			log.Printf("Failed: %s", err1)
			return
