WebSocket Proxy

Purpose

Distributed WebSocket proxy service managing outbound connections to external WebSocket servers with auto-reconnection, correlated request-response messaging, pub/sub message routing, and cross-instance ownership with orphan reclamation.

Requirements

Requirement: External WebSocket Connection Lifecycle

The WebSocket Manager SHALL allow clients to connect to arbitrary external WebSocket servers, disconnect from them, and list all managed connections. Connections SHALL persist through service restarts.

Scenario: Connect to external WebSocket

  • WHEN a client requests a connection to a WebSocket URL
  • THEN the manager SHALL create and maintain a ClientWebSocket connection, persist the connection metadata to the Dapr state store, and return a unique connection ID

Scenario: List all connections

  • WHEN a client lists connections
  • THEN the manager SHALL return all active connections with their IDs, URLs, status, and topics

Scenario: Disconnect

  • WHEN a client disconnects a connection
  • THEN the manager SHALL close the WebSocket, cancel its receive loop, and remove connection state from both local and Dapr stores

Requirement: Messaging Modes

The WebSocket Manager SHALL support two send modes: Raw (fire-and-forget) and correlated request-response via a 16-byte UUID binary prefix envelope. Both modes SHALL accept a content_type field that determines the WebSocket frame type (text/* or application/json → Text, else Binary).

Scenario: Raw message send

  • WHEN a client sends a raw message (SendRaw) to a connection with a content_type
  • THEN the message SHALL be sent as-is to the WebSocket without correlation tracking, using a Text frame if content_type starts with "text/" or equals "application/json", otherwise a Binary frame
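
The frame-type rule above can be sketched as a small pure function. This is a minimal illustration, not the service's implementation; the names `FrameType` and `frame_type_for` are hypothetical.

```python
from enum import Enum


class FrameType(Enum):
    """WebSocket data frame kinds relevant to the rule (hypothetical type)."""
    TEXT = "text"
    BINARY = "binary"


def frame_type_for(content_type: str) -> FrameType:
    """Return Text for text/* or exactly application/json, otherwise Binary."""
    if content_type.startswith("text/") or content_type == "application/json":
        return FrameType.TEXT
    return FrameType.BINARY
```

Read literally, the rule requires an exact match for "application/json", so in this sketch a value such as "application/json; charset=utf-8" maps to a Binary frame. Whether the service normalizes parameters or letter case is not stated in this spec.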

Scenario: Correlated request-response

  • WHEN a client sends a message via Send (correlated mode) with a content_type
  • THEN the message SHALL be prefixed with a 16-byte binary UUID, giving the envelope [16 bytes UUID][payload bytes], and sent using a Text frame if content_type starts with "text/" or equals "application/json", otherwise a Binary frame. The first response whose first 16 bytes match the UUID SHALL be returned with the UUID prefix stripped
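
A minimal sketch of the [16 bytes UUID][payload bytes] envelope, assuming the UUID is serialized in the big-endian layout that Python's `uuid.UUID.bytes` produces. The spec does not state the byte order (a .NET `Guid.ToByteArray()` uses a different layout), and `wrap` and `unwrap_if_match` are hypothetical helper names.

```python
import uuid
from typing import Optional

UUID_LEN = 16  # size of the binary correlation prefix


def wrap(payload: bytes, correlation_id: uuid.UUID) -> bytes:
    """Build the envelope: 16 UUID bytes followed by the payload."""
    return correlation_id.bytes + payload


def unwrap_if_match(frame: bytes, correlation_id: uuid.UUID) -> Optional[bytes]:
    """Return the payload with the prefix stripped if the frame's first
    16 bytes equal the correlation ID, otherwise None."""
    if len(frame) >= UUID_LEN and frame[:UUID_LEN] == correlation_id.bytes:
        return frame[UUID_LEN:]
    return None
```

The sender would call `wrap` before writing to the socket, and the receive path would call `unwrap_if_match` on each incoming frame until one matches.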

Scenario: Correlated timeout

  • WHEN a correlated request does not receive a matching response within the configurable timeout
  • THEN a timeout error SHALL be returned to the caller
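
One way to combine correlation with a timeout is a table of pending futures keyed by the UUID prefix. The sketch below uses Python's `asyncio`; the class `PendingRequests` and its methods are hypothetical, and the actual send over the WebSocket is left out.

```python
import asyncio
import uuid
from typing import Dict


class PendingRequests:
    """Tracks in-flight correlated requests by their 16-byte UUID prefix."""

    def __init__(self) -> None:
        self._pending: Dict[bytes, asyncio.Future] = {}

    async def wait_for_response(self, correlation_id: uuid.UUID,
                                timeout_s: float) -> bytes:
        """Wait for the first matching frame; raises asyncio.TimeoutError
        if none arrives within timeout_s seconds."""
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id.bytes] = future
        try:
            return await asyncio.wait_for(future, timeout_s)
        finally:
            # Always forget the request, whether it completed or timed out.
            self._pending.pop(correlation_id.bytes, None)

    def on_frame(self, frame: bytes) -> bool:
        """Called by the receive loop. Returns True if the frame resolved
        a pending request; the stored result has the prefix stripped."""
        future = self._pending.get(frame[:16])
        if future is None or future.done():
            return False
        future.set_result(frame[16:])
        return True
```

Frames for which `on_frame` returns False are not correlated responses and would continue down the normal receive path.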

Requirement: Auto-Reconnection

Connections marked for auto-reconnect SHALL attempt reconnection with exponential backoff when the WebSocket drops.

Scenario: Connection drop with auto-reconnect

  • WHEN a connection with auto_reconnect=true loses its WebSocket
  • THEN the manager SHALL attempt reconnection up to a configurable maximum number of attempts, using exponential backoff (the base delay doubles on each attempt and is capped at a maximum delay)
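
The backoff schedule can be illustrated with a short calculation. This sketch assumes the first attempt waits for the base delay; the spec does not say whether the first retry is immediate, and the function and parameter names are hypothetical.

```python
from typing import List


def backoff_delays(base_delay_s: float, max_delay_s: float,
                   max_attempts: int) -> List[float]:
    """Delay before each reconnection attempt: the base delay doubles
    per attempt and is capped at max_delay_s."""
    return [min(base_delay_s * 2 ** attempt, max_delay_s)
            for attempt in range(max_attempts)]
```

For example, `backoff_delays(1.0, 30.0, 7)` yields 1, 2, 4, 8, 16, 30, 30 seconds: the sixth delay would be 32 seconds uncapped, so it and every later delay are held at the 30-second maximum.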

Scenario: Reconnection success

  • WHEN a reconnection attempt succeeds
  • THEN a new receive loop SHALL start and message delivery SHALL resume

Scenario: Reconnection exhausted

  • WHEN all reconnection attempts fail
  • THEN the connection SHALL transition to Disconnected status

Requirement: Pub/Sub Message Routing

Received WebSocket messages SHALL be forwardable to Dapr pub/sub topics. Each message SHALL include connection metadata in the published envelope.

Scenario: Start publishing

  • WHEN a client calls StartPublish on a connection with a topic name
  • THEN all subsequently received WebSocket messages SHALL be published to that Dapr topic

Scenario: Message envelope

  • WHEN a WebSocket message is published to Dapr
  • THEN the envelope SHALL contain WebSocketId, WebSocketUrl, Timestamp, Payload (bytes), and ContentType (auto-detected from the WebSocket frame type: Text → "text/plain", Binary → "application/octet-stream")
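
As an illustration of the envelope fields, the sketch below models them as a dataclass. The field names come from the scenario above; the Python types, the UTC timestamp, and the helper names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PublishedMessage:
    """Envelope published for each received WebSocket message."""
    WebSocketId: str
    WebSocketUrl: str
    Timestamp: datetime
    Payload: bytes
    ContentType: str


def content_type_for_frame(is_text_frame: bool) -> str:
    """Auto-detect ContentType from the WebSocket frame type."""
    return "text/plain" if is_text_frame else "application/octet-stream"


def build_envelope(ws_id: str, ws_url: str, payload: bytes,
                   is_text_frame: bool) -> PublishedMessage:
    """Assemble the envelope for one received frame."""
    return PublishedMessage(
        WebSocketId=ws_id,
        WebSocketUrl=ws_url,
        Timestamp=datetime.now(timezone.utc),  # assumption: UTC timestamps
        Payload=payload,
        ContentType=content_type_for_frame(is_text_frame),
    )
```

Note that the published ContentType is derived from the frame type alone, so a JSON document received in a Text frame is labeled "text/plain" under this rule.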

Scenario: Stop publishing

  • WHEN a client calls StopPublish on a connection
  • THEN message forwarding to Dapr SHALL stop

Requirement: Service Architecture

The WebSocket Manager (WSM) SHALL use the virtufin-api gRPC API for all pubsub and state operations. The WSM SHALL NOT call DaprClient.PublishEventAsync, GetStateAsync, or any other Dapr pubsub/state API directly. The Dapr sidecar SHALL remain in use for service-invocation mTLS, distributed tracing, and metrics.

Scenario: Service publishes user-data event

  • WHEN the WSM forwards a received WebSocket message to a user-specified topic
  • THEN it SHALL call Pubsub.PublishEvent on the virtufin-api (regular, not the system variant)
  • AND it SHALL NOT call DaprClient.PublishEventAsync directly

Scenario: Service persists connection state

  • WHEN the WSM saves or retrieves connection metadata
  • THEN it SHALL call the API's State gRPC service
  • AND it SHALL NOT call any Dapr state API directly

Requirement: Connection Lifecycle Events

The WSM SHALL publish connection lifecycle events to the websocketmanager.connectionstatus topic on every state transition. Events SHALL be CloudEvents v1.0 envelopes.

Scenario: Initial connect

  • WHEN a user successfully connects to an upstream WebSocket server
  • THEN the WSM SHALL publish a connection.connected event with connection_id, url, and instance_id in the data payload
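
A lifecycle event wrapped in a CloudEvents v1.0 envelope might look like the following sketch. The required CloudEvents attributes (specversion, id, source, type) are set; the `source` value, the use of the bare event name as `type`, and the helper name `lifecycle_event` are assumptions rather than anything this spec defines.

```python
import uuid
from datetime import datetime, timezone

TOPIC = "websocketmanager.connectionstatus"  # topic named in the requirement


def lifecycle_event(event_type: str, connection_id: str, url: str,
                    instance_id: str, **extra) -> dict:
    """Build a CloudEvents v1.0 envelope for a connection lifecycle event.
    Extra keyword arguments (e.g. attempt, error_type) go into data."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "/websocketmanager",  # assumption: not given in the spec
        "type": event_type,             # e.g. "connection.connected"
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": {
            "connection_id": connection_id,
            "url": url,
            "instance_id": instance_id,
            **extra,
        },
    }
```

A connection.connected event would then be `lifecycle_event("connection.connected", connection_id, url, instance_id)`, published to `TOPIC`.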

Scenario: User-initiated disconnect

  • WHEN a user calls Disconnect on a connection
  • THEN the WSM SHALL publish a connection.disconnected event

Scenario: Server-initiated close

  • WHEN the upstream WebSocket server sends a Close frame
  • THEN the WSM SHALL publish connection.disconnected if auto_reconnect=false. If auto_reconnect=true, it SHALL publish connection.reconnecting, followed by connection.connected on a successful reconnect or connection.disconnected once reconnection attempts are exhausted

Scenario: Network error

  • WHEN the receive loop catches an unexpected exception
  • THEN the WSM SHALL publish a connection.error event with error_type and error_message in the data payload
  • AND the WSM SHALL follow the same path as server-initiated close (publish connection.reconnecting or connection.disconnected)

Scenario: Reconnecting

  • WHEN the WSM is attempting to reconnect after a drop
  • THEN the WSM SHALL publish a connection.reconnecting event with connection_id, url, instance_id, and attempt in the data payload

Scenario: Event ordering

  • WHEN multiple events fire for the same connection_id
  • THEN they SHALL be delivered in causal order: connection.error precedes its resulting state transition; connection.reconnecting precedes its outcome (connection.connected or connection.disconnected)
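
The causal order described in the scenarios above can be written down as a function that lists the events for a single drop. This is a simplified model under stated assumptions: it emits one connection.reconnecting event, whereas the `attempt` field suggests the service may publish one per attempt, and the function name is hypothetical.

```python
from typing import List


def events_for_drop(auto_reconnect: bool, reconnect_succeeded: bool,
                    had_error: bool = False) -> List[str]:
    """Return the lifecycle events for one connection drop, in causal order."""
    events: List[str] = []
    if had_error:
        # connection.error precedes the state transition it causes.
        events.append("connection.error")
    if not auto_reconnect:
        events.append("connection.disconnected")
        return events
    # connection.reconnecting precedes its outcome.
    events.append("connection.reconnecting")
    events.append("connection.connected" if reconnect_succeeded
                  else "connection.disconnected")
    return events
```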

Requirement: Distributed Ownership

Each connection SHALL be owned by exactly one WebSocket Manager instance. Only the owning instance may manipulate a connection.

Scenario: Cross-instance access denied

  • WHEN a request targets a connection owned by a different instance
  • THEN the request SHALL fail with an ownership error
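
An ownership guard could be as simple as comparing the stored owner with the local instance before any operation. In this sketch the record layout (`owner_instance_id`, `connection_id`) and the `OwnershipError` type are assumptions made for illustration.

```python
class OwnershipError(Exception):
    """Raised when a request targets a connection owned by another instance."""


def ensure_owner(record: dict, local_instance_id: str) -> None:
    """Raise OwnershipError unless this instance owns the connection."""
    owner = record["owner_instance_id"]
    if owner != local_instance_id:
        raise OwnershipError(
            f"connection {record['connection_id']} is owned by {owner}, "
            f"not {local_instance_id}"
        )
```

Each handler that manipulates a connection would call `ensure_owner` first and let the error propagate to the caller.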

Requirement: Orphaned Connection Reclamation

When an owning instance dies, its connections SHALL be reclaimed by surviving instances after a grace period.

Scenario: Dead instance detection

  • WHEN a connection's owning instance has been unknown for longer than a configurable grace period
  • THEN a surviving instance SHALL reassign ownership and resume managing the connection

Scenario: Periodic reclamation sweep

  • WHEN the reclamation service runs on its configured interval
  • THEN it SHALL scan all connections in the state store and reclaim any belonging to unknown instances
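
A single reclamation sweep might be sketched as follows. How live instances are discovered and how the grace period is measured are not specified here, so this example assumes a caller-supplied set of known instance IDs (including the local one) and a map recording when each connection was first seen orphaned. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class ConnectionRecord:
    connection_id: str
    owner_instance_id: str


def sweep(records: Iterable[ConnectionRecord], known_instances: Set[str],
          orphaned_since: Dict[str, float], now: float,
          grace_period_s: float, local_instance_id: str) -> List[str]:
    """Reclaim connections whose owner has been unknown for longer than
    the grace period. Returns the IDs reclaimed in this sweep."""
    reclaimed: List[str] = []
    for rec in records:
        if rec.owner_instance_id in known_instances:
            # Owner is alive (or came back): clear any orphan timer.
            orphaned_since.pop(rec.connection_id, None)
            continue
        since = orphaned_since.setdefault(rec.connection_id, now)
        if now - since > grace_period_s:
            rec.owner_instance_id = local_instance_id
            orphaned_since.pop(rec.connection_id, None)
            reclaimed.append(rec.connection_id)
    return reclaimed
```

With several surviving instances sweeping at once, the ownership write would need some form of optimistic concurrency in the state store so that only one instance wins. That part is outside this sketch.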