Cross-cutting
Purpose
Shared infrastructure, security, observability, and resilience requirements that apply across all Virtufin services. Every service SHALL conform to these requirements.
Topic and state-key naming conventions live in pubsub-topics/spec.md; scenario registry conventions live in scenarios/spec.md. This spec covers the access patterns (API-mediated access, ownership) and the non-naming cross-cutting concerns.
Requirements
Requirement: API-Mediated Pub/Sub and State
Services MUST NOT call DaprClient.PublishEventAsync, DaprClient.GetStateAsync, DaprClient.SaveStateAsync, DaprClient.DeleteStateAsync, DaprClient.GetBulkStateAsync, or any other Dapr pubsub or state API directly. Services MUST use the virtufin-api's Pubsub and State gRPC services for all pubsub and state operations.
The virtufin-api is the only service that talks to Dapr pubsub/state on behalf of other services. This gives the API a single place to enforce topic naming, access control, observability, and state ownership.
The virtufin-api also exposes scenario-registry RPCs (create / pause / archive / delete) that read and write scenario.<scenarioId> state keys per scenarios/spec.md.
Scenario: Service publishes an event
- WHEN a service needs to publish an event
- THEN it SHALL call Pubsub.PublishEvent on the virtufin-api (gRPC)
- AND it SHALL NOT call DaprClient.PublishEventAsync directly
Scenario: Service reads or writes state
- WHEN a service needs to read or write persistent state
- THEN it SHALL call the API's State gRPC service (SaveState, GetState, GetAllState, RegisterKeys, DeleteState)
- AND it SHALL NOT call any Dapr state API directly
Scenario: Dapr sidecar usage
- WHEN a service uses Dapr for service invocation, mTLS, distributed tracing, or metrics
- THEN the Dapr sidecar is the right path; pubsub and state go through the API
- The sidecar remains in place even though the API mediates pubsub/state; the two concerns are independent
Requirement: Topic Naming (delegated to pubsub-topics spec)
Topic patterns, CloudEvents envelope contract, scenario ID conventions, NATS permissions, state-key taxonomy, and migration rules are defined in pubsub-topics/spec.md. This spec does NOT duplicate those rules; services SHALL conform to the patterns in that spec.
Scenario: Service publishes a domain event
- WHEN a service publishes an event under any of the namespaces defined in pubsub-topics/spec.md (Tier 0 infrastructure, market data lane, scenario lane)
- THEN it SHALL call Pubsub.PublishEvent with the topic matching the pattern for that namespace
- AND the API SHALL reject publishes that violate the topic validation rules in that spec
Scenario: Service publishes a scenario-scoped event
- WHEN a service publishes to sc.<scenarioId>.<domain>.*
- THEN the API SHALL validate that scenarioId is registered (or is the reserved LIVE) per scenarios/spec.md
- AND reject publishes to unregistered or non-active scenarios
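The scenario-lane check above can be sketched as follows. This is a minimal illustration in Python, not the API's implementation: the registry is modeled as a plain dict, the "active" status name and the function name validate_scenario_topic are assumptions, and the requirement of at least one segment after <domain> is one reading of the trailing wildcard. The authoritative rules live in pubsub-topics/spec.md and scenarios/spec.md.

```python
RESERVED_SCENARIO_ID = "LIVE"  # reserved scenario ID named in this spec


def validate_scenario_topic(topic: str, registry: dict[str, str]) -> str:
    """Return the scenario ID if `topic` may be published to, else raise ValueError.

    `registry` maps scenarioId -> status. The "active" status name is an
    assumption made for this sketch.
    """
    parts = topic.split(".")
    # Expect sc.<scenarioId>.<domain>.<one or more further segments>
    if len(parts) < 4 or parts[0] != "sc" or any(p == "" for p in parts):
        raise ValueError(f"not a scenario-scoped topic: {topic!r}")
    scenario_id = parts[1]
    if scenario_id == RESERVED_SCENARIO_ID:
        return scenario_id  # the reserved ID needs no registry entry
    status = registry.get(scenario_id)
    if status is None:
        raise ValueError(f"scenario {scenario_id!r} is not registered")
    if status != "active":
        raise ValueError(f"scenario {scenario_id!r} is {status}, not active")
    return scenario_id
```

Returning the scenario ID rather than a boolean lets the caller reuse it for access-control or logging decisions without parsing the topic a second time.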
Scenario: Subscriber consumes another service's events
- WHEN a service needs to consume another service's events
- THEN it SHALL call Pubsub.Subscribe with the topic string duplicated locally
- AND the local constant SHALL carry a comment cross-linking to the publisher's Configuration/Topics.cs source of truth (or scenario registry entry for scenario-scoped topics)
Scenario: Topic name is a public contract
- WHEN a publisher changes its topic name
- THEN it is a breaking change for every subscriber; the constant is part of the service's public contract
Requirement: Per-service State Service Names
Each service uses its own state service entry (the entry in the API's services.json matching the service's name). Callers use the regular State.* RPCs with the service field set to the per-service name (websocketmanager, workmanager). There are no State.*System* RPC variants.
Scenario: Service reads or writes its own state
- WHEN a service needs to read or write its own persistent state
- THEN it SHALL call the regular State.* RPCs with service: "<service-name>"
- AND it SHALL NOT call any State.*System* variant (none exist)
Requirement: Lifecycle Events as CloudEvents v1.0
Services that publish lifecycle events MUST format them as CloudEvents v1.0 envelopes. The CloudEvents context attributes MUST be carried in the request's metadata field, each key prefixed with ce- (ce-id, ce-source, ce-type, ce-time, ce-specversion, ce-datacontenttype). The CloudEvents data MUST be carried in the request's data field.
Scenario: Connection lifecycle event
- WHEN a service publishes a connection lifecycle event (e.g., a WebSocket connection is established, disconnects, or fails)
- THEN it SHALL set ce-type to com.virtufin.<service>.connection.<state>
- AND ce-source to its own service URN (e.g., urn:com.virtufin.websocketmanager)
- AND the data payload to a JSON object with connection_id, url, instance_id (and any state-specific fields)
Scenario: Worker lifecycle event
- WHEN a service publishes a worker lifecycle event (created, started, stopped, error)
- THEN it SHALL set ce-type to com.virtufin.workmanager.worker.<state>
- AND ce-source to urn:com.virtufin.workmanager
- AND the data payload to a JSON object with worker_id, group, topic (and error_type / error_message for error events)
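The envelope rules for both lifecycle scenarios can be illustrated with a small builder. This is a sketch in Python under stated assumptions: the request is modeled as a dict with metadata and data keys, the function name build_lifecycle_event is hypothetical, a random UUID is one possible choice for ce-id, and application/json is assumed for ce-datacontenttype because the payload is a JSON object.

```python
import json
import uuid
from datetime import datetime, timezone


def build_lifecycle_event(service: str, kind: str, state: str, payload: dict) -> dict:
    """Build the metadata and data fields for a lifecycle publish request.

    `kind` is "connection" or "worker". The dict-shaped request is an
    assumption for this sketch, not the API's actual message type.
    """
    metadata = {
        "ce-id": str(uuid.uuid4()),  # assumed ID scheme
        "ce-source": f"urn:com.virtufin.{service}",
        "ce-type": f"com.virtufin.{service}.{kind}.{state}",
        "ce-time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "ce-specversion": "1.0",
        "ce-datacontenttype": "application/json",
    }
    return {"metadata": metadata, "data": json.dumps(payload).encode("utf-8")}


# Example: a worker error event with hypothetical field values.
event = build_lifecycle_event(
    "workmanager",
    "worker",
    "error",
    {
        "worker_id": "w-1",
        "group": "example-group",
        "topic": "example.topic",
        "error_type": "Timeout",
        "error_message": "worker did not respond",
    },
)
```

Keeping every CloudEvents attribute in metadata and only the JSON object in data mirrors the split the requirement describes.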
Requirement: API Endpoint Configuration
Services that call the virtufin-api MUST accept the API's gRPC endpoint via configuration. The endpoint SHALL be configurable via environment variables (VIRTUFIN_API_HOST, VIRTUFIN_API_GRPC_PORT) with sensible defaults (localhost:5002).
Scenario: Production deployment
- WHEN a service is deployed to a cluster
- THEN VIRTUFIN_API_HOST and VIRTUFIN_API_GRPC_PORT SHALL be set to the cluster-internal address of the virtufin-api service
Scenario: Local development
- WHEN a service runs locally
- THEN the default localhost:5002 is used unless overridden
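Endpoint resolution for the two scenarios above might look like the following Python sketch. The environment variable names and the localhost:5002 default come from this spec; the function name resolve_api_endpoint, the treatment of an empty variable as unset, and the port range check are assumptions.

```python
import os

DEFAULT_HOST = "localhost"
DEFAULT_GRPC_PORT = 5002


def resolve_api_endpoint(env=None) -> str:
    """Return "host:port" for the virtufin-api gRPC endpoint.

    `env` defaults to the process environment; passing a dict makes the
    function easy to test.
    """
    env = os.environ if env is None else env
    # An empty value is treated the same as an unset variable (assumption).
    host = env.get("VIRTUFIN_API_HOST") or DEFAULT_HOST
    raw_port = env.get("VIRTUFIN_API_GRPC_PORT") or str(DEFAULT_GRPC_PORT)
    port = int(raw_port)  # raises ValueError on a non-numeric port
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{host}:{port}"
```

Failing at startup on a malformed port surfaces a misconfigured deployment immediately instead of at the first gRPC call.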
Requirement: Observability
Every service SHALL emit OpenTelemetry traces and metrics. Health checks SHALL gate readiness until the service is fully initialized.
Scenario: Distributed tracing
- WHEN a request is processed across multiple services
- THEN each span SHALL be exported to the configured OpenTelemetry collector with correlation context
Scenario: Liveness and readiness
- WHEN a service has started but has not yet recovered its persistent state
- THEN the readiness health check SHALL fail, preventing traffic routing
Scenario: Startup recovery
- WHEN a service restarts after a crash
- THEN it SHALL restore its persisted state before marking itself healthy
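The readiness and startup-recovery scenarios can be pictured as a gate that opens only after recovery succeeds. The Python sketch below is illustrative only: the class name ReadinessGate, the load_state callable, and the 200/503 status mapping are assumptions, not part of this spec.

```python
class ReadinessGate:
    """Tracks whether startup recovery has finished."""

    def __init__(self) -> None:
        self._ready = False

    def recover(self, load_state) -> None:
        """Run the recovery callable; mark ready only if it succeeds."""
        load_state()  # an exception here leaves the gate closed
        self._ready = True

    def readiness_status(self) -> int:
        """HTTP status a readiness probe handler could return."""
        return 200 if self._ready else 503
```

Because the flag is set only after load_state returns, a crash or failure during recovery keeps the probe failing and traffic stays away from the instance.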
Requirement: Resilience
All Dapr operations and external calls SHALL use retry with exponential backoff and circuit breaking.
Scenario: Transient Dapr failure
- WHEN a Dapr API call fails with a transient error
- THEN the operation SHALL be retried up to 3 times with exponential backoff before failing
Scenario: Persistent failure
- WHEN Dapr API calls consistently fail beyond a threshold
- THEN the circuit breaker SHALL open, failing fast for subsequent calls until the break duration elapses
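The two resilience scenarios combine a bounded retry with a circuit breaker. The following Python sketch shows one way the pieces could fit together. Only the limit of 3 retries comes from this spec; the failure threshold, break duration, base delay, the TransientError classification, and all class and function names are hypothetical. Jitter is omitted for brevity.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class TransientError(Exception):
    """Stand-in for a retryable failure (hypothetical classification)."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, break_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.break_seconds = break_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, op):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.break_seconds:
                raise CircuitOpenError("circuit open, failing fast")
            self.opened_at = None  # break elapsed: allow one trial call
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success resets the failure count
        return result


def call_with_retry(op, retries=3, base_delay=0.2, sleep=time.sleep):
    """Run op; on TransientError retry up to `retries` times, doubling the delay."""
    for attempt in range(retries + 1):
        try:
            return op()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted: surface the failure
            sleep(base_delay * (2 ** attempt))
```

A caller would typically route each attempt through the breaker, for example call_with_retry(lambda: breaker.call(op)). CircuitOpenError is not a TransientError, so an open circuit fails fast instead of being retried.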
Requirement: Security
All service-to-service communication SHALL be encrypted. User-supplied code execution SHALL be sandboxed. URL-based code fetching SHALL prevent SSRF.
Scenario: Service-to-service communication
- WHEN one service communicates with another
- THEN Dapr mTLS SHALL encrypt the connection
Scenario: SSRF prevention
- WHEN fetching code from a user-supplied URL
- THEN the service SHALL reject URLs targeting private IP ranges (loopback, RFC1918, link-local) unless the host is in an explicit allowlist
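The SSRF rule above can be sketched in Python with the standard ipaddress module. This is an illustration under assumptions: the function names, the restriction to http and https schemes, and the injectable resolve parameter are not in this spec. The blocked ranges are the three classes the scenario names.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in (
        "127.0.0.0/8", "::1/128",                         # loopback
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
        "169.254.0.0/16", "fe80::/10",                    # link-local
    )
]


def resolve_host(host: str) -> list[str]:
    """Resolve a host name to its IP addresses via the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_url_allowed(url: str, allowlist=frozenset(), resolve=resolve_host) -> bool:
    """Return True if fetching `url` would not target a blocked range."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        return False  # scheme restriction is an assumption of this sketch
    if host in allowlist:
        return True  # explicit allowlist overrides the range check
    try:
        addresses = [ipaddress.ip_address(a) for a in resolve(host)]
    except (OSError, ValueError):
        return False  # unresolvable or unparsable host: reject
    # Reject if any resolved address falls in any blocked network.
    return bool(addresses) and not any(
        addr in net for addr in addresses for net in BLOCKED_NETWORKS
    )
```

The check inspects resolved addresses rather than the host string, so a public-looking name that resolves to 10.x.x.x is still rejected. It does not by itself defend against DNS rebinding between the check and the fetch, nor against IPv4-mapped IPv6 addresses; a real implementation would also need to pin the connection to the validated address.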
Requirement: Native AOT Compilation
Services targeting AOT-capable runtimes SHALL compile to native code. All JSON serialization SHALL use source-generated contexts.
Scenario: Build and deployment
- WHEN a service is built for production
- THEN it SHALL compile to a self-contained native binary with trimmed dependencies
Requirement: Configuration
Every service SHALL support configuration via command-line arguments, environment variables, and configuration files. Ports SHALL be configurable. Dapr component names SHALL be configurable.
Scenario: Port override
- WHEN the HttpPort environment variable or --http-port argument is set
- THEN the service SHALL listen on that port instead of the default
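The port override can be illustrated with a small resolver in Python. The HttpPort variable and --http-port argument come from this spec. The precedence (argument over environment variable over default), the default value 8080, and the function name are assumptions, since the spec does not state them. Configuration files are left out of the sketch.

```python
import argparse
import os

DEFAULT_HTTP_PORT = 8080  # hypothetical default; the spec does not name one


def resolve_http_port(argv=None, env=None) -> int:
    """Pick the HTTP port: --http-port, then HttpPort, then the default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--http-port", type=int, default=None)
    # parse_known_args ignores unrelated arguments the service may accept.
    args, _unknown = parser.parse_known_args(argv)
    if args.http_port is not None:
        return args.http_port
    if env.get("HttpPort"):
        return int(env["HttpPort"])
    return DEFAULT_HTTP_PORT
```

Letting the command line win over the environment is a common convention because it allows a one-off override without touching deployment configuration.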
Scenario: Dapr component override
- WHEN pubsubName is configured
- THEN the service SHALL use the named Dapr pubsub component for all pub/sub operations
Requirement: Containerization
Every service SHALL be containerizable as a minimal Docker image based on the chiseled runtime image.
Scenario: Docker build
- WHEN a service's Dockerfile is built
- THEN the resulting image SHALL contain only the native AOT binary and its runtime dependencies on a runtime-deps base