server: add ping/pong echo capability

Adds a small server-side capability: when a DTLS payload starts with
the 4-byte magic prefix 0xff 'P' 'N' 'G', the server echoes the entire
packet back to the client through the same DTLS conn. Otherwise the
packet is forwarded to WireGuard as before. Whether any client *uses*
this echo is up to the client; the server just provides the capability.

Backward compatibility: the echo branch is gated on the magic-prefix
check. Without that prefix nothing fires, so every existing client sees
identical behaviour. New clients sending probes to an unpatched server
see no echo and degrade gracefully (the bytes flow through to WG, which
drops them as message type 0xff, outside WG's 1..4 range).

Use case that motivated this: detecting zombie TURN allocations, i.e.
sessions where pion's Refresh and VK's NAT-keepalive Binding both
succeed (control plane "healthy") but the actual data path through VK's
relay is broken because the client's NAT mapping shifted after a
network handover and VK's relay state is stale. Without an end-to-end
signal the client can't tell. With ping/pong the client can
periodically ping and tear down conns whose echoes have stopped
arriving.

Reference client implementation: anton48/vk-turn-proxy-ios commit
8c430f3 (141 lines in pkg/proxy/proxy.go, a refactored extraction of
client/main.go). Empirically tuned over a month of production: 30s ping
interval, 120s stale threshold, latched "server is echoing" flag so
clients never kill conns when talking to an unpatched server.

Cost: one byte-prefix comparison per inbound DTLS packet, no
allocation, no parsing. When echoing, one DTLS write per ping. With 30
conns at 30s interval that's ~1 packet/sec total.
2 months ago · d1ed9c798a
1 changed file with 27 additions and 0 deletions
--- a/server/main.go
+++ b/server/main.go
@@ -169,6 +169,33 @@ func handleUDPConnection(ctx context.Context, conn net.Conn, connectAddr string)
 				return
 			}

+			// Liveness-probe echo. Clients may send short sentinel
+			// packets (4-byte magic 0xff 'P' 'N' 'G' + payload)
+			// periodically over a DTLS conn to detect data-path
+			// failures that don't show up at the control plane —
+			// e.g. TURN allocations where Refresh/keepalive succeed
+			// but the relay's NAT mapping is stale and packets are
+			// silently dropped. We echo the bytes back through the
+			// same DTLS conn — pion's dtls.Conn is goroutine-safe so
+			// concurrent Write from this read-loop and the WG-receive
+			// loop below is fine. The first byte 0xff falls outside
+			// WireGuard's 1..4 message types, so without this branch
+			// the bytes would be forwarded to serverConn (WG) and
+			// silently dropped — which is also what unpatched servers
+			// do, hence clients sending probes to an unpatched server
+			// see no echo and degrade gracefully.
+			if n >= 4 && buf[0] == 0xff && buf[1] == 'P' && buf[2] == 'N' && buf[3] == 'G' {
+				if err1 := conn.SetWriteDeadline(time.Now().Add(5 * time.Second)); err1 != nil {
+					log.Printf("Failed to set probe-echo deadline: %s", err1)
+					return
+				}
+				if _, err1 := conn.Write(buf[:n]); err1 != nil {
+					log.Printf("Failed to echo probe: %s", err1)
+					return
+				}
+				continue
+			}
+
 			if err1 = serverConn.SetWriteDeadline(time.Now().Add(time.Minute * 30)); err1 != nil {
 				log.Printf("Failed: %s", err1)
 				return
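
For reference, the prefix test in the hunk above can be expressed as a small standalone helper, which makes the boundary cases easy to check in isolation. This is only a sketch: `probeMagic` and `isProbe` are hypothetical names that do not appear in the patch, and the patch itself compares the four bytes inline.

```go
package main

import (
	"bytes"
	"fmt"
)

// probeMagic is the 4-byte sentinel from the patch: 0xff 'P' 'N' 'G'.
var probeMagic = []byte{0xff, 'P', 'N', 'G'}

// isProbe reports whether pkt starts with the probe magic.
// bytes.HasPrefix returns false for packets shorter than the prefix,
// which matches the n >= 4 guard in the inline check.
func isProbe(pkt []byte) bool {
	return bytes.HasPrefix(pkt, probeMagic)
}

func main() {
	// Probe with one byte of payload after the magic.
	fmt.Println(isProbe([]byte{0xff, 'P', 'N', 'G', 0x01})) // true
	// First byte 0x04 is a WireGuard message type, so not a probe.
	fmt.Println(isProbe([]byte{0x04, 0x00, 0x00, 0x00})) // false
	// Too short to contain the full magic.
	fmt.Println(isProbe([]byte{0xff, 'P'})) // false
}
```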