This page is the canonical observation architecture for the Paperclip control plane. Use it when you need to understand how runtime events become logs, traces, provenance, warehouse facts, and operator-visible health signals. The observation layer is not one backend. It is a bounded set of signal families with different durability, ownership, and failure modes:
  • low-cardinality telemetry for server and runtime lifecycle
  • run-level observability summaries derived from heartbeat execution
  • optional Langfuse traces and scores
  • provenance and output lineage attached to runs and artifacts
  • warehouse mirroring into ClickHouse for analytical and monitoring reads
  • readiness and operator health surfaces exposed through the API and runtime-service monitoring

Source Of Truth Rules

The observation layer follows a strict authority order:
| Surface | Canonical for | Not canonical for |
| --- | --- | --- |
| PostgreSQL via packages/db | first-party run state, costs, finance, evaluation, provenance, file writes, output artifacts, runtime-service health | high-volume warehouse marts, third-party trace storage |
| ClickHouse mirror | derived event streams, warehouse rollups, monitoring marts | first-party business entity truth |
| Langfuse | optional trace/span export and scoring views | control-plane run truth or audit authority |
| Server logger / local runtime logs | local event visibility and debugging | durable external telemetry by default |
The rule is simple: Postgres owns the control-plane facts, ClickHouse mirrors selected analytical facts, and Langfuse is an optional trace sink rather than the system of record.
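The authority order can be made mechanical. The sketch below is illustrative, not real control-plane code; the surface names mirror the table above, and `resolveAuthority` is a hypothetical helper:

```typescript
// Hypothetical sketch: when two observation surfaces disagree, the reading
// from the more authoritative surface wins. Postgres always outranks mirrors.
const AUTHORITY_ORDER = ["postgres", "clickhouse", "langfuse", "local-logs"] as const;
type Surface = (typeof AUTHORITY_ORDER)[number];

// Return the most authoritative surface among those that disagree.
function resolveAuthority(disagreeing: Surface[]): Surface | undefined {
  return AUTHORITY_ORDER.find((s) => disagreeing.includes(s));
}
```

For example, if the ClickHouse mirror and Postgres report different run counts, the Postgres count is the fact and the mirror is stale.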

Observation Families

1. Telemetry

Primary implementation:
  • server/src/services/telemetry.ts
What it does:
  • emits low-cardinality server and runtime telemetry through the server logger
  • installs the DB hardening telemetry sink
  • provides lightweight signal points without pretending to be a full metrics platform
Operational meaning:
  • useful for local diagnosis and runtime identity
  • incomplete unless read alongside runtime mode, DB target, and company dataset identity
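A minimal sketch of what "low-cardinality" means in practice: event names and context fields come from bounded sets, never free-form values. The types and function below are hypothetical, not the `telemetry.ts` API:

```typescript
// Hypothetical sketch: telemetry points carry only bounded-cardinality fields
// (event name, runtime mode, DB target) and route through the server logger.
type TelemetryEvent = {
  name: string;                      // drawn from a bounded set of event names
  runtimeMode: "local" | "hosted";   // runtime identity context
  dbTarget: string;                  // which DB the server believes it targets
};

// Emit one telemetry line through an injected logger; returns the line emitted.
function emitTelemetry(log: (line: string) => void, ev: TelemetryEvent): string {
  const line = `telemetry name=${ev.name} mode=${ev.runtimeMode} db=${ev.dbTarget}`;
  log(line);
  return line;
}
```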

2. Run Observability

Primary implementation:
  • server/src/services/heartbeat-observability.ts
What it does:
  • derives work class, evidence, validation state, promotion state, and TQC-style execution fields from heartbeat runs
  • creates the summary layer that operators actually consume when judging execution health
Operational meaning:
  • this is the semantic observation layer for runs
  • these summaries are only as strong as the runtime evidence written during execution
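The dependency on runtime evidence can be shown with a toy classifier. This is a hypothetical sketch, not the real `heartbeat-observability.ts` derivation; the field and state names are illustrative:

```typescript
// Hypothetical sketch: a run summary derived purely from persisted evidence.
// If the runtime wrote no evidence, the summary cannot claim health.
type RunEvidence = {
  validated: boolean;   // did the run pass validation?
  evidenceRows: number; // rows of runtime evidence persisted for this run
};

type RunSummary = "healthy" | "unvalidated" | "no_evidence";

function classifyRun(e: RunEvidence): RunSummary {
  if (e.evidenceRows === 0) return "no_evidence"; // summary is only as strong as the evidence
  if (!e.validated) return "unvalidated";
  return "healthy";
}
```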

3. Traces

Primary implementation:
  • server/src/services/langfuse-tracing.ts
What it does:
  • exports spans and scores to Langfuse when Langfuse is configured
  • links evaluation flows and execution traces to a trace ID where supported
Operational meaning:
  • traces exist only when config enables them
  • trace code presence is not proof that a trace exists in the current runtime

4. Provenance

Primary implementation:
  • server/src/services/run-provenance.ts
  • packages/db/src/extract-provenance-ledger.ts
What it does:
  • records output artifacts and observed file writes
  • persists lineage in run_file_writes and run_output_artifacts
  • supports extracted provenance ledgers and reconciliation work
Operational meaning:
  • provenance is part of auditability, not just debugging
  • evidence class matters: authoritative_native is stronger than derived_from_runtime_telemetry
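The evidence-class ordering can be encoded so reconciliation prefers stronger lineage. The two class names come from this page; the ranking helper is a hypothetical sketch:

```typescript
// Hypothetical sketch: rank evidence classes so reconciliation can pick the
// strongest lineage observed for an artifact.
type EvidenceClass = "authoritative_native" | "derived_from_runtime_telemetry";

const EVIDENCE_STRENGTH: Record<EvidenceClass, number> = {
  authoritative_native: 2,
  derived_from_runtime_telemetry: 1,
};

// Return the strongest evidence class present, or undefined if none exists.
function strongest(classes: EvidenceClass[]): EvidenceClass | undefined {
  return [...classes].sort((a, b) => EVIDENCE_STRENGTH[b] - EVIDENCE_STRENGTH[a])[0];
}
```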

5. Warehouse Mirroring

Primary implementation:
  • server/src/services/clickhouse.ts
  • server/src/services/intelligence-monitor.ts
What it does:
  • mirrors selected first-party events into ClickHouse
  • currently covers heartbeat_run_events, cost_events, finance_events, and performance_ledger
  • powers warehouse reads, freshness checks, and monitoring summaries
Operational meaning:
  • ClickHouse is a derived analytical surface
  • absence or staleness here does not erase the canonical Postgres record, but it does degrade operator visibility
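The "derived, failure-tolerant" relationship can be sketched in one function. This is illustrative, not the real `clickhouse.ts` write path:

```typescript
// Hypothetical sketch: mirroring is one-way and failure-tolerant. The Postgres
// write is the canonical step; a failed ClickHouse insert only degrades
// operator visibility, it never erases the record.
type MirrorResult = { canonical: true; mirrored: boolean };

function persistAndMirror(
  writePostgres: () => void,    // canonical write; a failure here should propagate
  insertClickhouse: () => void, // derived mirror; a failure here is tolerated
): MirrorResult {
  writePostgres();
  try {
    insertClickhouse();
    return { canonical: true, mirrored: true };
  } catch {
    return { canonical: true, mirrored: false };
  }
}
```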

6. Health And Readiness

Primary implementation:
  • server/src/services/health-checks.ts
  • server/src/routes/health.ts
What it does:
  • checks DB connectivity, migration visibility, and company presence at startup
  • exposes /api/health and /api/health/migration-status
  • uses canonical migration inspection rather than raw probing of the Drizzle journal table
Operational meaning:
  • these routes are readiness surfaces
  • they do not prove correct DB identity, correct tenant selection, or complete observation coverage
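A readiness surface is just an aggregation of cheap checks, which is why it cannot prove semantic correctness. A minimal sketch, with hypothetical check names:

```typescript
// Hypothetical sketch: readiness aggregates boolean checks (DB connectivity,
// migration visibility, company presence) but says nothing about whether the
// DB identity or tenant selection is the intended one.
type Check = { name: string; ok: boolean };

function readiness(checks: Check[]): { status: "ok" | "fail"; failing: string[] } {
  const failing = checks.filter((c) => !c.ok).map((c) => c.name);
  return { status: failing.length === 0 ? "ok" : "fail", failing };
}
```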

End-To-End Signal Flow

The observation path for a heartbeat run reads in this order:
  1. a runtime executes work and emits API-visible evidence
  2. the API persists first-party run, cost, finance, evaluation, and provenance state into Postgres
  3. run observability derives operator-facing execution classifications from those facts
  4. selected events mirror into ClickHouse for analytical and monitoring reads
  5. traces export to Langfuse when configured
  6. health routes and runtime-service records expose readiness and freshness signals to operators
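The six steps above can be sketched as an ordered stage list, with the configuration-conditional stages marked. This is an illustrative model, not runtime code:

```typescript
// Hypothetical sketch: the signal flow as ordered stages. Conditional stages
// may legitimately be absent on a healthy server without breaking canonical truth.
type Stage = { name: string; conditional: boolean };

const SIGNAL_FLOW: Stage[] = [
  { name: "runtime emits API-visible evidence", conditional: false },
  { name: "postgres persists first-party state", conditional: false },
  { name: "run observability derives classifications", conditional: false },
  { name: "clickhouse mirror", conditional: true },
  { name: "langfuse trace export", conditional: true },
  { name: "health and freshness surfaces", conditional: false },
];

// Stages whose absence reduces visibility but not correctness.
const optionalStages = SIGNAL_FLOW.filter((s) => s.conditional).map((s) => s.name);
```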

Operator Surfaces

Primary operator-facing surfaces today:
  • /api/health
  • /api/health/migration-status
  • runtime-service health persisted in workspace runtime-service tables
  • sidebar badges for failures, approvals, and queue pressure
  • Tremor operating wiki pages such as /companies/tremor/wiki/observability
  • intelligence overlay tools when enabled, including Langfuse and ClickHouse-backed monitors
Use them with the right expectations:
  • /api/health answers “is this runtime up enough to respond?”
  • /api/health/migration-status answers “what does canonical migration inspection think?”
  • warehouse tools answer “what does the mirrored analytical surface show?”
  • provenance answers “what evidence exists for this run or output?”
None of these alone answer “is the whole system semantically correct?”

Configuration Gates

Observation coverage is intentionally conditional in several places.
| Surface | Gate | Effect when absent |
| --- | --- | --- |
| Langfuse traces | Langfuse configuration | spans and scores are not exported |
| ClickHouse mirror | ClickHouse configuration and freshness | analytical mirror is absent or degraded |
| External telemetry sinks | environment-specific logging/export wiring | logs stay local to runtime logger output |
| Intelligence overlay tools | overlay services running and reachable | vendor dashboards are unavailable even if Postgres state exists |
This means a healthy server can still have an incomplete observation surface.
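One way to reason about coverage is to compute the active surfaces from the gates. The gate and surface names below are illustrative, not real config keys:

```typescript
// Hypothetical sketch: given the configuration gates, list which observation
// surfaces are actually active. A healthy server may run with only the
// always-present pair.
type Gates = { langfuse: boolean; clickhouse: boolean; externalSinks: boolean; overlay: boolean };

function activeSurfaces(g: Gates): string[] {
  const out = ["postgres", "local-logs"]; // always present
  if (g.langfuse) out.push("langfuse");
  if (g.clickhouse) out.push("clickhouse");
  if (g.externalSinks) out.push("external-telemetry");
  if (g.overlay) out.push("overlay-tools");
  return out;
}
```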

Failure Interpretation Rules

Use these rules before escalating:
  • If /api/health is green but the dataset looks wrong, verify DB identity and company context before debugging features.
  • If ClickHouse is stale, treat warehouse views as degraded but verify whether Postgres still holds the canonical run facts.
  • If Langfuse is empty, check config before assuming trace-generation code failed.
  • If run classifications look incomplete, inspect whether the runtime actually wrote the expected evidence.
  • If provenance is missing, distinguish “nothing was recorded” from “recording path was disabled or bypassed.”
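The triage rules above can be kept as a symptom-to-first-check table so the first diagnostic step is mechanical. The symptom keys are illustrative labels:

```typescript
// Hypothetical sketch: map each observed symptom to its first diagnostic step,
// mirroring the escalation rules above.
const FIRST_CHECK: Record<string, string> = {
  wrong_dataset: "verify DB identity and company context",
  stale_clickhouse: "confirm Postgres still holds the canonical run facts",
  empty_langfuse: "check Langfuse configuration",
  incomplete_classifications: "inspect whether the runtime wrote the expected evidence",
  missing_provenance: "distinguish nothing-recorded from recording-path disabled or bypassed",
};
```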

Current Blind Spots

The current architecture still has known limitations:
  • low-cardinality telemetry defaults to local logger output rather than guaranteed external delivery
  • Langfuse and ClickHouse remain configuration-conditional, so absence silently reduces visibility
  • readiness health is narrower than semantic correctness
  • run-derived summaries can lag or under-express failures when the runtime never wrote the expected evidence

Companion Docs

Use these pages adjacent to this one:
| Need | Start here |
| --- | --- |
| Whole-workspace system map | Architecture |
| Entity ownership and storage boundary | Data Model |
| Health endpoints and route contract | Health API |
| Intelligence overlay services and ingress | Runtime Services |
| Live Tremor operator view | Observability Wiki |