WebSocket Proxy
Purpose¶
Distributed WebSocket proxy service managing outbound connections to external WebSocket servers with auto-reconnection, correlated request-response messaging, pub/sub message routing, and cross-instance ownership with orphan reclamation.
Requirements¶
Requirement: External WebSocket Connection Lifecycle¶
The WebSocket Manager SHALL allow clients to connect to arbitrary external WebSocket servers, disconnect from them, and list all managed connections. Connections SHALL persist through service restarts.
Scenario: Connect to external WebSocket¶
- WHEN a client requests a connection to a WebSocket URL
- THEN the manager SHALL create and maintain a ClientWebSocket connection, persist the connection metadata to the Dapr state store, and return a unique connection ID
Scenario: List all connections¶
- WHEN a client lists connections
- THEN the manager SHALL return all active connections with their IDs, URLs, status, and topics
Scenario: Disconnect¶
- WHEN a client disconnects a connection
- THEN the manager SHALL close the WebSocket, cancel its receive loop, and remove connection state from both local and Dapr stores
Requirement: Messaging Modes¶
The WebSocket Manager SHALL support two send modes: Raw (fire-and-forget) and correlated request-response via a 16-byte UUID binary prefix envelope. Both modes SHALL accept a content_type field that determines the WebSocket frame type (text/* or application/json → Text, else Binary).
Scenario: Raw message send¶
- WHEN a client sends a raw message (SendRaw) to a connection with a content_type
- THEN the message SHALL be sent as-is to the WebSocket without correlation tracking, using a Text frame if content_type starts with "text/" or equals "application/json", otherwise a Binary frame
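The frame-type rule above can be sketched as a small pure function. This is a minimal illustration in Python under a literal reading of the rule; the names `FrameType` and `frame_type_for` are hypothetical and not part of the service.

```python
from enum import Enum


class FrameType(Enum):
    """WebSocket data frame kinds relevant to the send modes."""
    TEXT = "text"
    BINARY = "binary"


def frame_type_for(content_type: str) -> FrameType:
    """Pick the frame type from content_type.

    Literal reading of the requirement: a prefix match on "text/" or an
    exact match on "application/json" yields Text; anything else is Binary.
    """
    if content_type.startswith("text/") or content_type == "application/json":
        return FrameType.TEXT
    return FrameType.BINARY
```

Under this literal reading, a value with parameters such as "application/json; charset=utf-8" would map to Binary, and matching is case-sensitive. Whether the service normalizes such values is not stated in this spec.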
Scenario: Correlated request-response¶
- WHEN a client sends a message via Send (correlated mode) with a content_type
- THEN the message SHALL be prefixed with a 16-byte UUID binary envelope [16 bytes UUID][payload bytes], sent using a Text frame if content_type starts with "text/" or equals "application/json", otherwise a Binary frame, and the first response whose first 16 bytes match the UUID SHALL be returned, with the UUID prefix stripped
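The envelope layout [16 bytes UUID][payload bytes] might be built and matched as follows. This is a sketch in Python with hypothetical helper names (`wrap`, `unwrap_if_match`). It assumes the UUID is serialized in RFC 4122 big-endian byte order, which this spec does not specify; a .NET `Guid.ToByteArray()` call, for instance, uses a different byte order.

```python
import uuid
from typing import Optional

PREFIX_LEN = 16  # size of the UUID prefix in bytes


def wrap(correlation_id: uuid.UUID, payload: bytes) -> bytes:
    """Prefix the payload with the 16 raw bytes of the correlation UUID."""
    # Assumption: big-endian (RFC 4122) byte order via UUID.bytes.
    return correlation_id.bytes + payload


def unwrap_if_match(correlation_id: uuid.UUID, frame: bytes) -> Optional[bytes]:
    """Return the payload with the prefix stripped if the frame's first
    16 bytes equal the correlation UUID, otherwise None."""
    if len(frame) >= PREFIX_LEN and frame[:PREFIX_LEN] == correlation_id.bytes:
        return frame[PREFIX_LEN:]
    return None
```

A frame shorter than 16 bytes can never match, so it is treated as an uncorrelated message here.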
Scenario: Correlated timeout¶
- WHEN a correlated request does not receive a matching response within the configurable timeout
- THEN a timeout error SHALL be returned to the caller
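One way to pair responses with waiting callers and enforce the timeout is a table of pending futures keyed by the 16-byte prefix. The sketch below uses Python's asyncio. The class `PendingRequests`, the `send` callback, and the `timeout_seconds` parameter are illustrative assumptions, not the service's actual API.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict

PREFIX_LEN = 16


class PendingRequests:
    """Tracks in-flight correlated requests, keyed by their 16-byte prefix."""

    def __init__(self) -> None:
        self._pending: Dict[bytes, "asyncio.Future[bytes]"] = {}

    def on_frame(self, frame: bytes) -> bool:
        """Called by the receive loop. Returns True if the frame completed
        a waiting request, False if it should be handled as ordinary traffic."""
        waiter = self._pending.get(frame[:PREFIX_LEN])
        if waiter is None or waiter.done():
            return False  # no waiter, or the first response already arrived
        waiter.set_result(frame[PREFIX_LEN:])  # strip the UUID prefix
        return True

    async def request(
        self,
        send: Callable[[bytes], Awaitable[None]],
        payload: bytes,
        timeout_seconds: float,
    ) -> bytes:
        """Send a correlated request and wait for the first matching response.
        Raises asyncio.TimeoutError if none arrives in time."""
        correlation_id = uuid.uuid4().bytes
        waiter = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = waiter
        try:
            await send(correlation_id + payload)
            return await asyncio.wait_for(waiter, timeout_seconds)
        finally:
            # Always drop the entry so timed-out requests do not leak.
            self._pending.pop(correlation_id, None)
```

Removing the entry in `finally` means a late response to a timed-out request falls through to normal message handling instead of being matched.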
Requirement: Auto-Reconnection¶
Connections marked for auto-reconnect SHALL attempt reconnection with exponential backoff when the WebSocket drops.
Scenario: Connection drop with auto-reconnect¶
- WHEN a connection with auto_reconnect=true loses its WebSocket
- THEN the manager SHALL attempt reconnection up to a configurable maximum with exponential backoff (base delay doubling each attempt, capped at a maximum delay)
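The backoff schedule described above (base delay doubling each attempt, capped at a maximum delay) can be expressed as a short calculation. This Python sketch assumes attempts are numbered from 1 and that the first attempt waits for the base delay. The function and parameter names are hypothetical.

```python
from typing import List


def backoff_delay(attempt: int, base_delay: float, max_delay: float) -> float:
    """Delay before a 1-based reconnection attempt:
    base, 2*base, 4*base, ... never exceeding max_delay."""
    return min(base_delay * (2 ** (attempt - 1)), max_delay)


def backoff_schedule(max_attempts: int, base_delay: float, max_delay: float) -> List[float]:
    """Delays for every attempt up to the configured maximum."""
    return [backoff_delay(n, base_delay, max_delay) for n in range(1, max_attempts + 1)]
```

For example, with a base delay of 1 second, a cap of 10 seconds, and 5 attempts, the delays would be 1, 2, 4, 8, and 10 seconds.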
Scenario: Reconnection success¶
- WHEN a reconnection attempt succeeds
- THEN a new receive loop SHALL start and message delivery SHALL resume
Scenario: Reconnection exhausted¶
- WHEN all reconnection attempts fail
- THEN the connection SHALL transition to Disconnected status
Requirement: Pub/Sub Message Routing¶
Received WebSocket messages SHALL be forwardable to Dapr pub/sub topics. Each message SHALL include connection metadata in the published envelope.
Scenario: Start publishing¶
- WHEN a client calls StartPublish on a connection with a topic name
- THEN all subsequently received WebSocket messages SHALL be published to that Dapr topic
Scenario: Message envelope¶
- WHEN a WebSocket message is published to Dapr
- THEN the envelope SHALL contain WebSocketId, WebSocketUrl, Timestamp, Payload (bytes), and ContentType (auto-detected from the WebSocket frame type: Text → "text/plain", Binary → "application/octet-stream")
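As an illustration of the envelope fields listed above, the following Python sketch builds one from a received frame. The field names come from the requirement. The `PublishedMessage` type, the `build_envelope` helper, and the ISO 8601 UTC timestamp format are assumptions, since the spec does not state how Timestamp is encoded.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PublishedMessage:
    """Envelope published to the topic for each received WebSocket message."""
    WebSocketId: str
    WebSocketUrl: str
    Timestamp: str   # assumption: ISO 8601 in UTC
    Payload: bytes
    ContentType: str


def content_type_for_frame(is_text_frame: bool) -> str:
    """Text frames map to text/plain, Binary frames to application/octet-stream."""
    return "text/plain" if is_text_frame else "application/octet-stream"


def build_envelope(ws_id: str, url: str, payload: bytes, is_text_frame: bool) -> PublishedMessage:
    """Wrap a received frame with its connection metadata."""
    return PublishedMessage(
        WebSocketId=ws_id,
        WebSocketUrl=url,
        Timestamp=datetime.now(timezone.utc).isoformat(),
        Payload=payload,
        ContentType=content_type_for_frame(is_text_frame),
    )
```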
Scenario: Stop publishing¶
- WHEN a client calls StopPublish on a connection
- THEN message forwarding to Dapr SHALL stop
Requirement: Service Architecture¶
The WebSocket Manager (WSM) SHALL use the virtufin-api gRPC services for all pub/sub and state operations. The WSM SHALL NOT call DaprClient.PublishEventAsync, GetStateAsync, or any other Dapr pub/sub or state API directly. The Dapr sidecar SHALL remain in use for service-invocation mTLS, distributed tracing, and metrics.
Scenario: Service publishes user-data event¶
- WHEN the WSM forwards a received WebSocket message to a user-specified topic
- THEN it SHALL call Pubsub.PublishEvent on the virtufin-api (regular, not the system variant)
- AND it SHALL NOT call DaprClient.PublishEventAsync directly
Scenario: Service persists connection state¶
- WHEN the WSM saves or retrieves connection metadata
- THEN it SHALL call the API's State gRPC service
- AND it SHALL NOT call any Dapr state API directly
Requirement: Connection Lifecycle Events¶
The WSM SHALL publish connection lifecycle events to the websocketmanager.connectionstatus topic on every state transition. Events SHALL be CloudEvents v1.0 envelopes.
Scenario: Initial connect¶
- WHEN a user successfully connects to an upstream WebSocket server
- THEN the WSM SHALL publish a connection.connected event with connection_id, url, and instance_id in the data payload
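A lifecycle event wrapped as a CloudEvents v1.0 envelope could look like the sketch below, written in Python. The attributes `specversion`, `id`, `source`, and `type` are the ones CloudEvents v1.0 requires; `time` and `datacontenttype` are optional attributes from the same specification. The `source` value and the helper name `lifecycle_event` are assumptions, as this spec does not define them.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def lifecycle_event(
    event_type: str,
    connection_id: str,
    url: str,
    instance_id: str,
    **extra: Any,
) -> Dict[str, Any]:
    """Build a CloudEvents v1.0 structured-mode envelope as a plain dict."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"websocketmanager/{instance_id}",  # assumption: not defined by the spec
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {
            "connection_id": connection_id,
            "url": url,
            "instance_id": instance_id,
            **extra,  # e.g. attempt, error_type, error_message
        },
    }
```

The same helper covers the other event types by passing their extra fields, for example `lifecycle_event("connection.reconnecting", "c1", url, "i1", attempt=2)`.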
Scenario: User-initiated disconnect¶
- WHEN a user calls Disconnect on a connection
- THEN the WSM SHALL publish a connection.disconnected event
Scenario: Server-initiated close¶
- WHEN the upstream WebSocket server sends a Close frame
- THEN the WSM SHALL publish connection.disconnected (if auto_reconnect=false), OR connection.reconnecting (if auto_reconnect=true) followed by connection.connected (on successful reconnect) or connection.disconnected (on exhausted reconnect)
Scenario: Network error¶
- WHEN the receive loop catches an unexpected exception
- THEN the WSM SHALL publish a connection.error event with error_type and error_message in the data payload
- AND the WSM SHALL follow the same path as server-initiated close (publish connection.reconnecting or connection.disconnected)
Scenario: Reconnecting¶
- WHEN the WSM is attempting to reconnect after a drop
- THEN the WSM SHALL publish a connection.reconnecting event with connection_id, url, instance_id, and attempt in the data payload
Scenario: Event ordering¶
- WHEN multiple events fire for the same connection_id
- THEN they SHALL be delivered in causal order: connection.error precedes its resulting state transition; connection.reconnecting precedes its outcome (connection.connected or connection.disconnected)
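The ordering rules for a dropped connection can be summarized as a function that returns the expected event sequence. This Python sketch assumes one connection.reconnecting event is published per attempt, which the presence of the attempt field suggests but the spec does not state outright. The function name and parameters are hypothetical.

```python
from typing import List


def events_after_drop(
    auto_reconnect: bool,
    attempts_made: int,
    reconnected: bool,
    caused_by_error: bool = False,
) -> List[str]:
    """Expected lifecycle events, in causal order, after a connection drops."""
    events: List[str] = []
    if caused_by_error:
        # The error event precedes the state transition it causes.
        events.append("connection.error")
    if not auto_reconnect:
        events.append("connection.disconnected")
        return events
    # Assumption: one reconnecting event per attempt.
    events.extend(["connection.reconnecting"] * attempts_made)
    # The outcome always follows the reconnecting events.
    events.append("connection.connected" if reconnected else "connection.disconnected")
    return events
```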
Requirement: Distributed Ownership¶
Each connection SHALL be owned by exactly one WebSocket Manager instance. Only the owning instance may manipulate a connection.
Scenario: Cross-instance access denied¶
- WHEN a request targets a connection owned by a different instance
- THEN the request SHALL fail with an ownership error
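The ownership rule amounts to a guard evaluated before any operation on a connection. A minimal Python sketch follows, with a hypothetical `OwnershipError` type and `require_owner` helper; how the owner ID is stored with the connection metadata is not described in this spec.

```python
class OwnershipError(Exception):
    """Raised when a request targets a connection owned by another instance."""


def require_owner(connection_id: str, owner_instance_id: str, this_instance_id: str) -> None:
    """Reject the operation unless this instance owns the connection."""
    if owner_instance_id != this_instance_id:
        raise OwnershipError(
            f"connection {connection_id} is owned by instance {owner_instance_id}, "
            f"not {this_instance_id}"
        )
```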
Requirement: Orphaned Connection Reclamation¶
When an owning instance dies, its connections SHALL be reclaimed by surviving instances after a grace period.
Scenario: Dead instance detection¶
- WHEN the instance that owns a connection has been unknown for longer than a configurable grace period
- THEN a surviving instance SHALL reassign ownership and resume managing the connection
Scenario: Periodic reclamation sweep¶
- WHEN the reclamation service runs on its configured interval
- THEN it SHALL scan all connections in the state store and reclaim any belonging to unknown instances
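A single pass of the reclamation sweep might be structured as below. This Python sketch makes several assumptions the spec leaves open: the set of known instances is supplied by the caller (for example from heartbeats), the time at which an owner was first seen as unknown is tracked in a local dictionary, and reassignment is a plain field update rather than a guarded write to the state store. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class ConnectionRecord:
    connection_id: str
    owner_instance_id: str


def sweep(
    records: Iterable[ConnectionRecord],
    known_instances: Set[str],
    orphaned_since: Dict[str, float],
    now: float,
    grace_period: float,
    this_instance_id: str,
) -> List[str]:
    """Reclaim connections whose owner has been unknown for longer than
    the grace period. Returns the IDs reclaimed in this pass."""
    reclaimed: List[str] = []
    for record in records:
        if record.owner_instance_id in known_instances:
            # Owner is alive (or came back): reset any grace-period clock.
            orphaned_since.pop(record.connection_id, None)
            continue
        # Start the clock the first time the owner is seen as unknown.
        since = orphaned_since.setdefault(record.connection_id, now)
        if now - since > grace_period:
            record.owner_instance_id = this_instance_id
            orphaned_since.pop(record.connection_id, None)
            reclaimed.append(record.connection_id)
    return reclaimed
```

With several surviving instances sweeping concurrently, the reassignment would also need a conditional write (for example an ETag check) so that only one of them wins. That mechanism is outside what this spec describes.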