Operational Data Loss Investigation Context

Status: Active investigation handoff
Owner: Codex / Paperclip local instance
Date: 2026-04-10
Target model: Gemini Pro
Scope: Read-only forensic investigation of missing operational/work telemetry in the live Tremor/Paperclip instance

Executive Summary

The live Paperclip instance has not crashed. The core company graph, issues, agents, and projects exist again, but the operational telemetry layer is empty in the active DB. Confirmed missing/empty in the live DB:
  • cost_events
  • finance_events
  • heartbeat_runs
  • issue_inbox_archives
  • issue_work_products
  • plugin_job_runs
  • plugin_logs
  • routine_runs
  • workspace_operations
  • approvals
  • budget_incidents
  • most other operational history tables
Confirmed present:
  • agents
  • issues
  • projects
  • a small number of activity_log rows
  • a couple of performance_signals rows
  • a couple of performance_snapshots rows
The visible user-facing effect is that the app shows zero cost, zero run history, empty inbox, and sparse operational dashboards. The issue is a data-loss / restore-scope problem, not a UI rendering bug.

What Happened in This Chat Thread

This is the condensed history of the work that led to the current state.

1. Initial focus: removing Google Antigravity from the Mac

The conversation began with a request to completely remove Google Antigravity from macOS. A script was created and hardened for deep cleanup, then run and verified. Additional historical traces were removed from shell profiles, keychain, and crash metadata.

2. Agent Skills / internalCtx / skill activation fixes

The next topic was the Agent Skills plugin. The activation path had multiple issues:
  • a runtime `internalCtx is not defined` error
  • the activation flow was trying to resolve a GitHub source that did not contain the intended skill
  • the UI showed every recommendation as activating instead of only the clicked skill
  • recommended skills were not cleanly persisting an Assigned state
These were fixed in the plugin layer without core app changes. A ledger of recent skill activity was added to the Skill DNA screen.

3. Performance HQ / company matrix work

A large portion of the thread was spent on Performance HQ. The user wanted a company-wide matrix grouped by role-skill bloom cell and seniority, with telemetry such as tokens in/out, worked-hours proxies, and LOC placeholders. The work evolved through several designs:
  • a company matrix built from existing telemetry
  • a separate page rather than replacing the project-scoped Performance HQ tab
  • then a v2 plugin identity so the matrix could be installed de novo rather than requiring an upgrade-capability approval path
Eventually, a Performance HQ v2 plugin package was created and installed locally as a separate plugin record. The v2 matrix route and backend route both exist, but the live app’s operational data was still empty.

4. Live instance state drift and restoration

At one point the user reported routes suddenly returning blank screens / not found pages. Investigation showed the instance had come up without the original company graph loaded. The Tremor company graph was restored from docs/companies/tremor, which brought back the core company/issue/agent/project graph. The company prefix in the live instance is now TRE, not the older TREAAA prefix that had been used in some earlier routes/screenshots.

5. Root issue now under investigation

After the restore, the live app still showed no operational telemetry. Costs, inbox, runs, and other operational dashboards remained empty. This is the current problem.

Current Live Runtime Facts

Company / route state

  • Current company name: Tremor
  • Current issue prefix: TRE
  • Dashboard route that works: http://localhost:3100/TRE/dashboard
  • Costs route that works: http://localhost:3100/TRE/costs
  • Inbox route that works: http://localhost:3100/TRE/inbox/mine
  • Old TREAAA routes are no longer valid after the restore

Live DB path and credentials

The embedded Postgres instance is at:
  • /Users/sydneymilton/.paperclip/instances/default/db
The repo hardcodes the local embedded DB connection as:
  • postgres://paperclip:paperclip@127.0.0.1:54329/paperclip

Current embedded postgres details

The command line for the active embedded postgres is stored in postmaster.opts under the instance DB directory; the server listens on port 54329.
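As a cross-check, the port can be read back out of postmaster.opts before connecting. This is a minimal sketch, assuming the usual Postgres convention of double-quoted arguments in that file; the connection URI is the one hardcoded in the repo.

```shell
# Extract the "-p <port>" value from a postmaster.opts file.
# Assumes the typical format: /path/to/postgres "-D" "/path/to/db" "-p" "54329"
port_from_opts() {
  tr -d '"' < "$1" | grep -oE -- '-p [0-9]+' | awk '{print $2}' | head -n1
}

# Read-only connection using the repo's hardcoded embedded credentials:
#   PORT=$(port_from_opts "$HOME/.paperclip/instances/default/db/postmaster.opts")
#   psql "postgres://paperclip:paperclip@127.0.0.1:$PORT/paperclip"
```

Connecting this way stays read-only as long as only SELECT statements are issued.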

Current company id

  • a5491266-37cd-4d7b-b6cb-77f358fb052b

Live API Evidence

The following API checks were performed against the running app.

Companies

GET /api/companies
Result:
  • One active company exists
  • name: Tremor
  • issuePrefix: TRE
  • status: active

Costs / finance

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/costs/summary?from=2026-04-01&to=2026-04-10
Result:
{"companyId":"a5491266-37cd-4d7b-b6cb-77f358fb052b","spendCents":0,"budgetCents":0,"utilizationPercent":0}
GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/costs/finance-events?from=2026-04-01&to=2026-04-10&limit=5
Result:
  • []
GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/sidebar-badges
Result:
{"inbox":0,"approvals":0,"failedRuns":0,"joinRequests":0}
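These spot checks can be scripted so the same endpoints are re-verified after any recovery attempt. A hedged sketch follows; the field names match the JSON captured above, and the base URL and company id are the ones from this handoff.

```shell
# Succeed only if a costs-summary JSON response shows zero spend and budget.
# Field names (spendCents, budgetCents) match the response captured above.
summary_is_zero() {
  printf '%s' "$1" | grep -qE '"spendCents":0[,}]' \
    && printf '%s' "$1" | grep -qE '"budgetCents":0[,}]'
}

# Example usage against the live instance (read-only GET):
#   BASE="http://localhost:3100"
#   COMPANY="a5491266-37cd-4d7b-b6cb-77f358fb052b"
#   summary_is_zero "$(curl -s "$BASE/api/companies/$COMPANY/costs/summary?from=2026-04-01&to=2026-04-10")" \
#     && echo "costs summary is still all zeros"
```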

Live runs

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/live-runs
Result:
  • []

Issues

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/issues?q=TRE-
Result:
  • 15 issues returned
  • This confirms that the issue graph is present
  • The loss is specifically in operational telemetry, not the issues table

Live DB Table Counts

I queried the embedded Postgres directly with a temporary script using the repo database package and the embedded DB credentials.

Present / non-empty

  • agents: 15
  • issues: 15
  • projects: 2
  • activity_log: 5
  • agent_runtime_state: 2
  • performance_signals: 2
  • performance_snapshots: 2

Empty or effectively empty

  • cost_events: 0
  • finance_events: 0
  • issue_inbox_archives: 0
  • heartbeat_runs: 0
  • agent_task_sessions: 0
  • agent_wakeup_requests: 0
  • approval_comments: 0
  • approvals: 0
  • budget_incidents: 0
  • heartbeat_run_events: 0
  • issue_comments: 0
  • issue_work_products: 0
  • plugin_job_runs: 0
  • plugin_logs: 0
  • performance_ledger: 0
  • routine_runs: 0
  • workspace_operations: 0
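The counts above came from a temporary script built on the repo database package. An equivalent, dependency-free check can be sketched with psql; the table names are the ones listed in this section, and the connection URI is the repo's hardcoded embedded one.

```shell
# Build one read-only UNION ALL query that returns a row count per table,
# suitable for piping into psql.
count_sql() {
  local sql="" t
  for t in "$@"; do
    [ -n "$sql" ] && sql="$sql UNION ALL "
    sql="${sql}SELECT '$t' AS tbl, count(*) AS n FROM $t"
  done
  printf '%s;\n' "$sql"
}

# Example usage:
#   count_sql cost_events finance_events heartbeat_runs routine_runs \
#     | psql "postgres://paperclip:paperclip@127.0.0.1:54329/paperclip"
```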

Sample Live Rows

activity_log

The only recent rows are from the restore/reinstall cycle:
  1. company.imported for the company restore
  2. plugin.installed for tremor.company-intake
  3. plugin.installed for tremor.project-flight-plan
  4. plugin.installed for tremor.agent-skills
  5. plugin.installed for tremor.performance-hq-v2

agent_runtime_state

Only two rows exist. Both show zero token/cost totals and a null session_id.

performance_signals / performance_snapshots

  • 2 rows each
  • They are restore-time snapshots with zeroed counts / costs / hours

Backup / Archive Inventory

Backups were checked under:
  • /Users/sydneymilton/.paperclip/instances/default/data/backups/
The SQL backups available there look like schema dumps plus a few seed rows, not full telemetry backups. Observed characteristics:
  • files named like paperclip-20260410-001624.sql
  • each file is roughly 2234 lines long
  • grep '^COPY ' returned nothing
  • grep '^INSERT INTO ' only surfaced a handful of seed inserts near the end:
    • instance_settings
    • instance_user_roles
    • plugins
    • user
This means the backups currently found do not contain row-level telemetry history for the missing tables.
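Any future dump candidates can be triaged the same way the current backups were. A small sketch, assuming pg_dump-style plain output where row data appears as COPY blocks or INSERT statements:

```shell
# Report how much row-level data a SQL dump carries.
# Schema-only dumps show COPY=0 and at most a handful of seed INSERTs.
scan_dump() {
  local dump="$1" copies inserts
  copies=$(grep -c '^COPY ' "$dump" || true)
  inserts=$(grep -c '^INSERT INTO ' "$dump" || true)
  echo "$dump: COPY=$copies INSERT=$inserts"
}
```

A dump worth restoring from should show large COPY or INSERT counts for the missing telemetry tables, not just the seed inserts seen so far.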

Old Workspace / Production Artifacts

An older path exists here:
  • /Users/sydneymilton/paperclip-production/home/instances/default
This contains runtime artifacts, logs, and workspace files from the older instance layout. Useful observations:
  • it contains many TREAAA-prefixed references in logs/workspaces
  • it contains run output and prompt context from earlier sessions
  • it does not contain a DB snapshot with the missing telemetry rows
This makes it useful for reconstructing history and sequencing events, but it is not a direct source for restoring telemetry.

Most Likely Diagnosis

Inference

The active DB was restored or recreated from a snapshot that preserved:
  • schema
  • core company graph
  • agents/issues/projects
  • a few recent restore-time logs
but did not preserve operational telemetry rows. That explains all observed symptoms:
  • zero cost events
  • zero finance events
  • zero heartbeat runs
  • empty inbox and work product views
  • missing activity history beyond the restore window
  • no live run history

Why this is not a UI bug

The API is returning empty arrays and zeros directly from the backing tables. The UI is faithfully rendering what the DB contains.

What Has Been Ruled Out

  • A browser-only rendering issue
  • A plugin bundle cache problem as the primary cause of missing telemetry
  • A crash in the UI server
  • A missing company graph entirely
  • An issue table / agent table loss

What Remains Unresolved

  1. Whether another DB snapshot/archive exists elsewhere on disk that has the full telemetry rows.
  2. Whether the missing data can be partially reconstructed from logs/workspaces.
  3. Which exact restore/reseed step caused the live telemetry tables to be emptied.

Suggested Next Investigation Steps

  1. Search for any additional hidden dumps, WAL archives, or filesystem snapshots outside the checked backup directory.
  2. Inspect old run logs and workspace artifacts for timestamps and commands that may show a restore or reseed operation.
  3. Determine whether the restore was a full refresh or a selective import that intentionally omitted telemetry tables.
  4. If recovery is impossible, document the loss boundary precisely so future restores include the missing tables.
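Step 1 can start from a broad, read-only filesystem sweep. The filename patterns below are assumptions about what dumps, WAL archives, and snapshots typically look like; extend them as needed.

```shell
# List candidate dump/snapshot/WAL files under a search root.
# find only lists paths; it never modifies anything.
find_candidates() {
  find "$1" -type f \
    \( -name '*.sql' -o -name '*.sql.gz' -o -name '*.dump' \
       -o -name '*.backup' -o -path '*pg_wal*' \) 2>/dev/null
}

# Example usage:
#   find_candidates "$HOME" | head -n 50
```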

Gemini Pro Investigation Prompt

Use the following prompt verbatim or with minimal edits. It explicitly tells Gemini to read this markdown handoff first and to delegate the investigation to subagents:
You are investigating apparent data loss in a local Paperclip/Tremor instance. Do a read-only forensic deep dive and determine where the missing operational/work data went, whether it is recoverable, and what restore/reseed step likely caused the loss.

First, read this handoff file in full and use it as the primary source of truth:
- /Users/sydneymilton/dev/_sandbox/tremor/local-pc/docs/plans/2026-04-10-operational-data-loss-investigation-context.md

Use subagents explicitly:
- Use `@codebase_investigator` for codebase inspection, DB schema/restore logic, filesystem searches, log correlation, and dependency tracing.
- Use `@browser_agent` for frontend inspection, route verification, and validating what the live UI currently shows.
- If you need synthesis across both threads, use the main agent or `@generalist_agent` after the specialists report back.
- If `@browser_agent` is unavailable in your environment, state that clearly and continue with `@codebase_investigator` plus any manual browser checks you can still perform.

Keep the investigation read-only. Do not mutate the database or files unless I explicitly ask later.

Context:
- Local app URL: http://localhost:3100
- Repo/workspace root: /Users/sydneymilton/dev/_sandbox/tremor/local-pc
- Current live company prefix is TRE
- The core company/issue/agent graph exists again, but operational telemetry is missing in the live DB
- Do not modify data unless I explicitly ask later. Read-only investigation first.

What appears missing in the live DB:
- cost_events
- finance_events
- heartbeat_runs
- issue_inbox_archives
- issue_work_products
- plugin_job_runs
- plugin_logs
- routine_runs
- workspace_operations
- approvals
- budget_incidents
- most other operational telemetry/history tables

What still exists:
- agents
- issues
- projects
- some activity_log rows
- a couple of performance_signals and performance_snapshots rows

Known evidence:
- The live DB is an embedded Postgres instance under:
  /Users/sydneymilton/.paperclip/instances/default/db
- The DB is reachable locally via the app’s embedded postgres credentials.
- Backups exist under:
  /Users/sydneymilton/.paperclip/instances/default/data/backups/
- The backups I already checked looked like schema dumps plus a few seed rows, not full telemetry backups.
- There are older artifacts under:
  /Users/sydneymilton/paperclip-production/home/instances/default
  including run logs and workspace files with old operational context.

Tasks:
1. Confirm the current live state of the DB and list the missing operational tables with row counts.
2. Search the filesystem for any other backups, snapshots, dumps, WAL archives, exported SQL, or hidden copies that might still contain the missing rows.
3. Search old workspace/run artifacts for clues about:
   - the last time telemetry was present
   - any restore/reseed/reset step
   - any scripts or commands that may have dropped or reseeded the DB
4. Determine whether the missing data is:
   - recoverable from another backup/archive
   - reconstructable from logs/workspaces
   - lost except for partial traces
5. Identify the most likely action that caused the data loss.
6. If recovery is possible, outline the safest restore plan and the exact source of truth to restore from.
7. If recovery is not possible, state that clearly and explain why.

Constraints:
- Be thorough but do not mutate anything.
- Do not run destructive commands.
- Prefer direct evidence over inference.
- If you make an inference, label it as such.
- Cite the exact file paths, table names, commands, or log lines that support each conclusion.

Useful places to inspect:
- /Users/sydneymilton/.paperclip/instances/default/db
- /Users/sydneymilton/.paperclip/instances/default/data/backups/
- /Users/sydneymilton/paperclip-production/home/instances/default
- /Users/sydneymilton/dev/_sandbox/tremor/local-pc

Expected output:
1. Executive summary
2. Evidence table of missing vs present tables
3. Backup/archive inventory
4. Recovery assessment
5. Likely cause of data loss
6. Recommended next steps

Operational Notes

  • The old TREAAA routes are stale after the restore; the current company prefix is TRE.
  • The live dashboard and costs pages load from the restored company state, but operational metrics remain zero because the underlying tables are empty.
  • Performance HQ / matrix work, plugin installs, and other ongoing work happened during the same thread, but do not explain the telemetry loss by themselves.
  • The plugin/page work is separate from the database loss investigation.

Practical Bottom Line

The best current conclusion is:
The operational telemetry tables were not preserved in the active embedded DB restore, and the currently available backups do not contain row-level telemetry data to recover them.
For that conclusion to change, the investigation needs to turn up another archive or an earlier DB snapshot that includes the missing tables.