Cross-cutting

Purpose

Shared infrastructure, security, observability, and resilience requirements that apply across all Virtufin services. Every service SHALL conform to these requirements.

Topic and state-key naming conventions live in pubsub-topics/spec.md; scenario registry conventions live in scenarios/spec.md. This spec covers the access patterns (API-mediated access, ownership) and the non-naming cross-cutting concerns.

Requirements

Requirement: API-Mediated Pub/Sub and State

Services MUST NOT call DaprClient.PublishEventAsync, DaprClient.GetStateAsync, DaprClient.SaveStateAsync, DaprClient.DeleteStateAsync, DaprClient.GetBulkStateAsync, or any other Dapr pubsub or state API directly. Services MUST use the virtufin-api's Pubsub and State gRPC services for all pubsub and state operations.

The virtufin-api is the only service that talks to Dapr pubsub/state on behalf of other services. This gives the API a single place to enforce topic naming, access control, observability, and state ownership.

The virtufin-api also exposes scenario-registry RPCs (create / pause / archive / delete) that read and write scenario.<scenarioId> state keys per scenarios/spec.md.

Scenario: Service publishes an event

  • WHEN a service needs to publish an event
  • THEN it SHALL call Pubsub.PublishEvent on the virtufin-api (gRPC)
  • AND it SHALL NOT call DaprClient.PublishEventAsync directly

Scenario: Service reads or writes state

  • WHEN a service needs to read or write persistent state
  • THEN it SHALL call the API's State gRPC service (SaveState, GetState, GetAllState, RegisterKeys, DeleteState)
  • AND it SHALL NOT call any Dapr state API directly

Scenario: Dapr sidecar usage

  • WHEN a service uses Dapr for service-invocation, mTLS, distributed tracing, or metrics
  • THEN the Dapr sidecar remains the correct path for those concerns; only pubsub and state go through the API
  • AND the sidecar stays in place even though the API mediates pubsub/state; the two concerns are independent
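
The ban on direct Dapr pubsub/state calls lends itself to an automated check. The following is a minimal sketch of such a check, written in Python purely for illustration; the function name find_direct_dapr_calls is hypothetical, and the match is a text heuristic rather than a real C# parser. The method names come from the requirement text above.

```python
import re

# Method names listed in the requirement; extend as the rule evolves.
FORBIDDEN_CALLS = (
    "PublishEventAsync",
    "GetStateAsync",
    "SaveStateAsync",
    "DeleteStateAsync",
    "GetBulkStateAsync",
)

# Matches a member access such as `client.SaveStateAsync` (heuristic only).
_PATTERN = re.compile(r"\.(%s)\b" % "|".join(FORBIDDEN_CALLS))


def find_direct_dapr_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, method_name) for each forbidden call in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

A check like this would run against every service except the virtufin-api itself, which is the one place allowed to make these calls.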

Requirement: Topic Naming (delegated to pubsub-topics spec)

Topic patterns, CloudEvents envelope contract, scenario ID conventions, NATS permissions, state-key taxonomy, and migration rules are defined in pubsub-topics/spec.md. This spec does NOT duplicate those rules; services SHALL conform to the patterns in that spec.

Scenario: Service publishes a domain event

  • WHEN a service publishes an event under any of the namespaces defined in pubsub-topics/spec.md (Tier 0 infrastructure, market data lane, scenario lane)
  • THEN it SHALL call Pubsub.PublishEvent with the topic matching the pattern for that namespace
  • AND the API SHALL reject publishes that violate the topic validation rules in the spec

Scenario: Service publishes a scenario-scoped event

  • WHEN a service publishes to sc.<scenarioId>.<domain>.*
  • THEN the API SHALL validate that scenarioId is registered (or is the reserved LIVE) per scenarios/spec.md
  • AND it SHALL reject publishes to unregistered or non-active scenarios
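
The scenario-scoped check can be sketched as follows. This is an illustration only, in Python: the in-memory REGISTRY, the status strings ("active", "paused"), and the allowed character set in the pattern are all assumptions, since the authoritative rules live in pubsub-topics/spec.md and scenarios/spec.md.

```python
import re

# Hypothetical stand-in for the scenario registry; the real registry is held
# in scenario.<scenarioId> state keys per scenarios/spec.md.
REGISTRY = {"backtest-42": "active", "stress-7": "paused"}

RESERVED_LIVE = "LIVE"

# sc.<scenarioId>.<domain>.<rest>; the character classes are an assumption.
_SCENARIO_TOPIC = re.compile(r"^sc\.([A-Za-z0-9_-]+)\.([a-z0-9_-]+)\.(.+)$")


def validate_scenario_topic(topic: str, registry: dict[str, str]) -> tuple[bool, str]:
    """Return (allowed, reason) for a publish to a scenario-scoped topic."""
    match = _SCENARIO_TOPIC.match(topic)
    if match is None:
        return False, "not a scenario-scoped topic"
    scenario_id = match.group(1)
    if scenario_id == RESERVED_LIVE:
        return True, "reserved LIVE scenario"
    status = registry.get(scenario_id)
    if status is None:
        return False, "scenario is not registered"
    if status != "active":
        return False, f"scenario is {status}, not active"
    return True, "ok"
```

Returning a reason alongside the verdict lets the API surface a useful error message to the rejected publisher.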

Scenario: Subscriber consumes another service's events

  • WHEN a service needs to consume another service's events
  • THEN it SHALL call Pubsub.Subscribe with the topic string held in a locally duplicated constant
  • AND the local constant SHALL carry a comment cross-linking to the publisher's Configuration/Topics.cs source of truth (or scenario registry entry for scenario-scoped topics)

Scenario: Topic name is a public contract

  • WHEN a publisher changes its topic name
  • THEN the change is breaking for every subscriber; the topic constant is part of the publisher's public contract

Requirement: Per-service State Service Names

Each service uses its own state service entry (the entry in the API's services.json matching the service's name). Callers use the regular State.* RPCs with the service field set to the per-service name (e.g., websocketmanager, workmanager). There are no State.*System* RPC variants.

Scenario: Service reads or writes its own state

  • WHEN a service needs to read or write its own persistent state
  • THEN it SHALL call the regular State.* RPCs with service: "<service-name>"
  • AND it SHALL NOT call any State.*System* variant (none exist)

Requirement: Lifecycle Events as CloudEvents v1.0

Services that publish lifecycle events MUST format them as CloudEvents v1.0 envelopes. The CloudEvents attributes MUST be carried in the request metadata field, each prefixed with ce- (ce-id, ce-source, ce-type, ce-time, ce-specversion, ce-datacontenttype). The CloudEvents data MUST be carried in the request's data field.

Scenario: Connection lifecycle event

  • WHEN a service publishes a connection lifecycle event (e.g., a WebSocket connection is established, disconnects, or fails)
  • THEN it SHALL set ce-type to com.virtufin.<service>.connection.<state>
  • AND ce-source to its own service URN (e.g., urn:com.virtufin.websocketmanager)
  • AND the data payload to a JSON object with connection_id, url, instance_id (and any state-specific fields)

Scenario: Worker lifecycle event

  • WHEN a service publishes a worker lifecycle event (created, started, stopped, error)
  • THEN it SHALL set ce-type to com.virtufin.workmanager.worker.<state>
  • AND ce-source to urn:com.virtufin.workmanager
  • AND the data payload to a JSON object with worker_id, group, topic (and error_type / error_message for error events)
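
As a rough illustration of the envelope rules above, the sketch below builds the metadata and data for a connection lifecycle event. It is written in Python for brevity; the function name, the dictionary-shaped request, the UTF-8 JSON byte encoding, and the state value "established" are assumptions, not part of the spec. A worker lifecycle event would follow the same shape with the worker fields.

```python
import json
import uuid
from datetime import datetime, timezone


def connection_event(service: str, state: str, connection_id: str,
                     url: str, instance_id: str, **extra) -> dict:
    """Build the metadata and data fields for a connection lifecycle event."""
    metadata = {
        "ce-id": str(uuid.uuid4()),
        "ce-source": f"urn:com.virtufin.{service}",
        "ce-type": f"com.virtufin.{service}.connection.{state}",
        "ce-time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "ce-specversion": "1.0",
        "ce-datacontenttype": "application/json",
    }
    # State-specific fields are passed through **extra.
    payload = {"connection_id": connection_id, "url": url,
               "instance_id": instance_id, **extra}
    return {"metadata": metadata, "data": json.dumps(payload).encode("utf-8")}
```

Keeping the attributes in metadata and the payload in data means the payload can be decoded without first unwrapping an envelope.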

Requirement: API Endpoint Configuration

Services that call the virtufin-api MUST accept the API's gRPC endpoint via configuration. The endpoint SHALL be configurable via the environment variables VIRTUFIN_API_HOST and VIRTUFIN_API_GRPC_PORT, which default to localhost and 5002 respectively (localhost:5002).

Scenario: Production deployment

  • WHEN a service is deployed to a cluster
  • THEN VIRTUFIN_API_HOST and VIRTUFIN_API_GRPC_PORT SHALL be set to the cluster-internal address of the virtufin-api service

Scenario: Local development

  • WHEN a service runs locally
  • THEN the default localhost:5002 is used unless overridden
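
A minimal sketch of the endpoint resolution, in Python for illustration. The function name is hypothetical, and treating an empty variable as unset and range-checking the port are assumptions beyond what the requirement states.

```python
import os

DEFAULT_HOST = "localhost"
DEFAULT_GRPC_PORT = 5002


def resolve_api_endpoint(env=None) -> str:
    """Return host:port for the virtufin-api gRPC endpoint."""
    env = os.environ if env is None else env
    # Empty values fall back to the defaults (an assumption of this sketch).
    host = env.get("VIRTUFIN_API_HOST") or DEFAULT_HOST
    raw_port = env.get("VIRTUFIN_API_GRPC_PORT") or str(DEFAULT_GRPC_PORT)
    port = int(raw_port)  # raises ValueError on a non-numeric port
    if not 1 <= port <= 65535:
        raise ValueError(f"VIRTUFIN_API_GRPC_PORT out of range: {port}")
    return f"{host}:{port}"
```

Passing the environment in as a mapping keeps the function easy to test without mutating the process environment.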

Requirement: Observability

Every service SHALL emit OpenTelemetry traces and metrics. Health checks SHALL gate readiness until the service is fully initialized.

Scenario: Distributed tracing

  • WHEN a request is processed across multiple services
  • THEN each span SHALL be exported to the configured OpenTelemetry collector with correlation context

Scenario: Liveness and readiness

  • WHEN a service has started but has not yet recovered its persistent state
  • THEN the readiness health check SHALL fail, preventing traffic routing

Scenario: Startup recovery

  • WHEN a service restarts after a crash
  • THEN it SHALL restore its persisted state before marking itself healthy
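
The readiness and recovery scenarios can be sketched as a gate that opens only after state restoration succeeds. This is an illustration in Python; ReadinessGate, start_service, and the 200/503 status codes are assumptions about how a health endpoint might report readiness.

```python
class ReadinessGate:
    """Tracks whether startup recovery has finished."""

    def __init__(self):
        self._ready = False

    def mark_ready(self):
        self._ready = True

    def status_code(self) -> int:
        # 503 keeps the instance out of rotation; 200 admits traffic.
        return 200 if self._ready else 503


def start_service(gate: ReadinessGate, restore_state) -> None:
    """Run recovery first; open the gate only if it succeeds."""
    restore_state()   # raises on failure, leaving the gate closed
    gate.mark_ready()
```

If restore_state raises, the gate stays closed and the readiness check keeps failing, which matches the requirement that traffic is not routed to a service that has not recovered its state.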

Requirement: Resilience

All Dapr operations and external calls SHALL use retry with exponential backoff and circuit breaking.

Scenario: Transient Dapr failure

  • WHEN a Dapr API call fails with a transient error
  • THEN the operation SHALL be retried up to 3 times with exponential backoff before failing

Scenario: Persistent failure

  • WHEN Dapr API calls consistently fail beyond a threshold
  • THEN the circuit breaker SHALL open, failing fast for subsequent calls until the break duration elapses
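
The two resilience scenarios combine roughly as sketched below. This is a hedged Python illustration, not the services' actual policy: the class and function names, the failure threshold, the break duration, and the base delay are assumptions. "Retried up to 3 times" is read here as one initial attempt plus at most three retries.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class TransientError(Exception):
    """Stand-in for whatever the caller classifies as transient."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, break_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.break_seconds = break_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def before_call(self):
        if self._opened_at is None:
            return
        if self._clock() - self._opened_at < self.break_seconds:
            raise CircuitOpenError("circuit open; failing fast")
        # Break duration elapsed: allow a trial call, but let a single
        # further failure reopen the circuit.
        self._opened_at = None
        self._failures = self.failure_threshold - 1

    def record_success(self):
        self._failures = 0

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def call_with_resilience(op, breaker, max_retries=3, base_delay=0.2, sleep=time.sleep):
    """Try `op` once, then retry up to `max_retries` times on TransientError."""
    for attempt in range(max_retries + 1):
        breaker.before_call()
        try:
            result = op()
        except TransientError:
            breaker.record_failure()
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # delay doubles on each retry
        else:
            breaker.record_success()
            return result
```

The clock and sleep parameters are injectable so the policy can be exercised in tests without real waiting.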

Requirement: Security

All service-to-service communication SHALL be encrypted. User-supplied code execution SHALL be sandboxed. URL-based code fetching SHALL prevent SSRF.

Scenario: Service-to-service communication

  • WHEN one service communicates with another
  • THEN Dapr mTLS SHALL encrypt the connection

Scenario: SSRF prevention

  • WHEN fetching code from a user-supplied URL
  • THEN the service SHALL reject URLs targeting private IP ranges (loopback, RFC1918, link-local) unless the host is in an explicit allowlist
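
A sketch of the SSRF check, in Python for illustration. The function names are hypothetical, and two things go beyond the requirement text and are assumptions: restricting schemes to http/https, and having the caller pass in the addresses the host resolved to (so the same addresses can be used for the actual connection).

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_address(addr: str) -> bool:
    """True for loopback, private (including RFC 1918), and link-local IPs."""
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip.is_private or ip.is_link_local


def check_fetch_url(url: str, resolved_addrs: list[str],
                    allowlist: frozenset = frozenset()) -> bool:
    """Return True if fetching `url` is permitted.

    `resolved_addrs` are the IPs the host resolved to; the caller is assumed
    to resolve once and then connect to exactly those addresses.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if parts.hostname in allowlist:
        return True  # explicit allowlist overrides the private-range check
    if not resolved_addrs:
        return False
    return not any(is_blocked_address(a) for a in resolved_addrs)
```

Checking resolved addresses rather than the hostname string matters because a public-looking name can resolve to a private address.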

Requirement: Native AOT Compilation

Services targeting AOT-capable runtimes SHALL compile to native code. All JSON serialization SHALL use source-generated contexts.

Scenario: Build and deployment

  • WHEN a service is built for production
  • THEN it SHALL compile to a self-contained native binary with trimmed dependencies

Requirement: Configuration

Every service SHALL support configuration via command-line arguments, environment variables, and configuration files. Ports SHALL be configurable. Dapr component names SHALL be configurable.

Scenario: Port override

  • WHEN the HttpPort environment variable or --http-port argument is set
  • THEN the service SHALL listen on that port instead of the default
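
The port override can be sketched as a layered lookup. This Python illustration assumes a precedence of command-line argument over environment variable over default, which the requirement does not state explicitly, and a hypothetical default of 8080.

```python
import argparse
import os

DEFAULT_HTTP_PORT = 8080  # hypothetical default; the spec does not name one


def resolve_http_port(argv, env=None) -> int:
    """Resolve the HTTP port: --http-port, then HttpPort, then the default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--http-port", type=int, default=None)
    # Ignore arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    if args.http_port is not None:
        return args.http_port
    if env.get("HttpPort"):
        return int(env["HttpPort"])
    return DEFAULT_HTTP_PORT
```

For example, resolve_http_port(["--http-port", "7000"], {"HttpPort": "9000"}) picks the command-line value under the precedence assumed here.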

Scenario: Dapr component override

  • WHEN pubsubName is configured
  • THEN the service SHALL use the named Dapr pubsub component for all pub/sub operations

Requirement: Containerization

Every service SHALL be containerizable as a minimal Docker image based on a chiseled runtime-deps image.

Scenario: Docker build

  • WHEN a service's Dockerfile is built
  • THEN the resulting image SHALL contain only the native AOT binary and its runtime dependencies on a runtime-deps base