Operational Data Loss Investigation Context

Status: Active investigation handoff
Owner: Codex / Paperclip local instance
Date: 2026-04-10
Target model: Gemini Pro
Scope: Read-only forensic investigation of missing operational/work telemetry in the live Tremor/Paperclip instance

Executive Summary

The live Paperclip instance has not crashed. The core company graph, issues, agents, and projects exist again, but the operational telemetry layer is empty in the active DB. Confirmed missing/empty in the live DB:
  • cost_events
  • finance_events
  • heartbeat_runs
  • issue_inbox_archives
  • issue_work_products
  • plugin_job_runs
  • plugin_logs
  • routine_runs
  • workspace_operations
  • approvals
  • budget_incidents
  • most other operational history tables
Confirmed present:
  • agents
  • issues
  • projects
  • a small number of activity_log rows
  • a couple of performance_signals rows
  • a couple of performance_snapshots rows
The visible user-facing effect is that the app shows zero cost, zero run history, empty inbox, and sparse operational dashboards. The issue is a data-loss / restore-scope problem, not a UI rendering bug.

What Happened in This Chat Thread

This is the condensed history of the work that led to the current state.

1. Initial focus: removing Google Antigravity from the Mac

The conversation began with a request to completely remove Google Antigravity from macOS. A script was created and hardened for deep cleanup, then run and verified. Additional historical traces were removed from shell profiles, keychain, and crash metadata.

2. Agent Skills / internalCtx / skill activation fixes

The next topic was the Agent Skills plugin. The activation path had multiple issues:
  • a runtime `internalCtx is not defined` error
  • the activation flow was trying to resolve a GitHub source that did not contain the intended skill
  • the UI showed every recommendation as activating instead of only the clicked skill
  • recommended skills were not cleanly persisting an Assigned state
These were fixed in the plugin layer without core app changes. A ledger of recent skill activity was added to the Skill DNA screen.

3. Performance HQ / company matrix work

A large portion of the thread was spent on Performance HQ. The user wanted a company-wide matrix grouped by role-skill bloom cell and seniority, with telemetry such as tokens in/out, worked-hours proxies, and LOC placeholders. The work evolved through several designs:
  • a company matrix built from existing telemetry
  • a separate page rather than replacing the project-scoped Performance HQ tab
  • then a v2 plugin identity so the matrix could be installed de novo rather than requiring an upgrade-capability approval path
Eventually, a Performance HQ v2 plugin package was created and installed locally as a separate plugin record. The v2 matrix route and backend route both exist, but the live app’s operational data was still empty.

4. Live instance state drift and restoration

At one point the user reported routes suddenly returning blank screens / not found pages. Investigation showed the instance had come up without the original company graph loaded. The Tremor company graph was restored from docs/companies/tremor, which brought back the core company/issue/agent/project graph. The company prefix in the live instance is now TRE, not the older TREAAA prefix that had been used in some earlier routes/screenshots.

5. Root issue now under investigation

After the restore, the live app still showed no operational telemetry. Costs, inbox, runs, and other operational dashboards remained empty. This is the current problem.

Current Live Runtime Facts

Company / route state

  • Current company name: Tremor
  • Current issue prefix: TRE
  • Dashboard route that works: http://localhost:3100/TRE/dashboard
  • Costs route that works: http://localhost:3100/TRE/costs
  • Inbox route that works: http://localhost:3100/TRE/inbox/mine
  • Old TREAAA routes are no longer valid after the restore

Live DB path and credentials

The embedded Postgres instance is at:
  • /Users/sydneymilton/.paperclip/instances/default/db
The repo hardcodes the local embedded DB connection as:
  • postgres://paperclip:paperclip@127.0.0.1:54329/paperclip

Current embedded postgres details

The command line for the active embedded postgres is stored in postmaster.opts under the instance DB directory; the server listens on port 54329.
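As a cross-check, the port can be read back out of postmaster.opts before connecting. This is a minimal sketch, assuming the usual Postgres convention of double-quoted arguments in that file; the connection URI is the one hardcoded in the repo.

```shell
# Extract the "-p <port>" value from a postmaster.opts file.
# Assumes the typical format: /path/to/postgres "-D" "/path/to/db" "-p" "54329"
port_from_opts() {
  tr -d '"' < "$1" | grep -oE -- '-p [0-9]+' | awk '{print $2}' | head -n1
}

# Read-only connection using the repo's hardcoded embedded credentials:
#   PORT=$(port_from_opts "$HOME/.paperclip/instances/default/db/postmaster.opts")
#   psql "postgres://paperclip:paperclip@127.0.0.1:$PORT/paperclip"
```

Connecting this way stays read-only as long as only SELECT statements are issued.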

Current company id

  • a5491266-37cd-4d7b-b6cb-77f358fb052b

Live API Evidence

The following API checks were performed against the running app.

Companies

GET /api/companies
Result:
  • One active company exists
  • name: Tremor
  • issuePrefix: TRE
  • status: active

Costs / finance

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/costs/summary?from=2026-04-01&to=2026-04-10
Result:
{"companyId":"a5491266-37cd-4d7b-b6cb-77f358fb052b","spendCents":0,"budgetCents":0,"utilizationPercent":0}
GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/costs/finance-events?from=2026-04-01&to=2026-04-10&limit=5
Result:
  • []
GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/sidebar-badges
Result:
{"inbox":0,"approvals":0,"failedRuns":0,"joinRequests":0}
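These spot checks can be scripted so the same endpoints are re-verified after any recovery attempt. A hedged sketch follows; the field names match the JSON captured above, and the base URL and company id are the ones from this handoff.

```shell
# Succeed only if a costs-summary JSON response shows zero spend and budget.
# Field names (spendCents, budgetCents) match the response captured above.
summary_is_zero() {
  printf '%s' "$1" | grep -qE '"spendCents":0[,}]' \
    && printf '%s' "$1" | grep -qE '"budgetCents":0[,}]'
}

# Example usage against the live instance (read-only GET):
#   BASE="http://localhost:3100"
#   COMPANY="a5491266-37cd-4d7b-b6cb-77f358fb052b"
#   summary_is_zero "$(curl -s "$BASE/api/companies/$COMPANY/costs/summary?from=2026-04-01&to=2026-04-10")" \
#     && echo "costs summary is still all zeros"
```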

Live runs

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/live-runs
Result:
  • []

Issues

GET /api/companies/a5491266-37cd-4d7b-b6cb-77f358fb052b/issues?q=TRE-
Result:
  • 15 issues returned
  • This confirms that the issue graph is present
  • The loss is specifically in operational telemetry, not the issues table

Live DB Table Counts

I queried the embedded Postgres directly with a temporary script using the repo database package and the embedded DB credentials.

Present / non-empty

  • agents: 15
  • issues: 15
  • projects: 2
  • activity_log: 5
  • agent_runtime_state: 2
  • performance_signals: 2
  • performance_snapshots: 2

Empty or effectively empty

  • cost_events: 0
  • finance_events: 0
  • issue_inbox_archives: 0
  • heartbeat_runs: 0
  • agent_task_sessions: 0
  • agent_wakeup_requests: 0
  • approval_comments: 0
  • approvals: 0
  • budget_incidents: 0
  • heartbeat_run_events: 0
  • issue_comments: 0
  • issue_work_products: 0
  • plugin_job_runs: 0
  • plugin_logs: 0
  • performance_ledger: 0
  • routine_runs: 0
  • workspace_operations: 0
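The counts above came from a temporary script built on the repo database package. An equivalent, dependency-free check can be sketched with psql; the table names are the ones listed in this section, and the connection URI is the repo's hardcoded embedded one.

```shell
# Build one read-only UNION ALL query that returns a row count per table,
# suitable for piping into psql.
count_sql() {
  local sql="" t
  for t in "$@"; do
    [ -n "$sql" ] && sql="$sql UNION ALL "
    sql="${sql}SELECT '$t' AS tbl, count(*) AS n FROM $t"
  done
  printf '%s;\n' "$sql"
}

# Example usage:
#   count_sql cost_events finance_events heartbeat_runs routine_runs \
#     | psql "postgres://paperclip:paperclip@127.0.0.1:54329/paperclip"
```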

Sample Live Rows

activity_log

The only recent rows are from the restore/reinstall cycle:
  1. company.imported for the company restore
  2. plugin.installed for tremor.company-intake
  3. plugin.installed for tremor.project-flight-plan
  4. plugin.installed for tremor.agent-skills
  5. plugin.installed for tremor.performance-hq-v2

agent_runtime_state

Only two rows exist. Both show zero token/cost totals and a null session_id.

performance_signals / performance_snapshots

  • 2 rows each
  • They are restore-time snapshots with zeroed counts / costs / hours

Backup / Archive Inventory

Backups were checked under:
  • /Users/sydneymilton/.paperclip/instances/default/data/backups/
The SQL backups available there look like schema dumps plus a few seed rows, not full telemetry backups. Observed characteristics:
  • files named like paperclip-20260410-001624.sql
  • each file is roughly 2234 lines long
  • grep '^COPY ' returned nothing
  • grep '^INSERT INTO ' only surfaced a handful of seed inserts near the end:
    • instance_settings
    • instance_user_roles
    • plugins
    • user
This means the backups currently found do not contain row-level telemetry history for the missing tables.
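Any future dump candidates can be triaged the same way the current backups were. A small sketch, assuming pg_dump-style plain output where row data appears as COPY blocks or INSERT statements:

```shell
# Report how much row-level data a SQL dump carries.
# Schema-only dumps show COPY=0 and at most a handful of seed INSERTs.
scan_dump() {
  local dump="$1" copies inserts
  copies=$(grep -c '^COPY ' "$dump" || true)
  inserts=$(grep -c '^INSERT INTO ' "$dump" || true)
  echo "$dump: COPY=$copies INSERT=$inserts"
}
```

A dump worth restoring from should show large COPY or INSERT counts for the missing telemetry tables, not just the seed inserts seen so far.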

Old Workspace / Production Artifacts

An older path exists here:
  • /Users/sydneymilton/paperclip-production/home/instances/default
This contains runtime artifacts, logs, and workspace files from the older instance layout. Useful observations:
  • it contains many TREAAA-prefixed references in logs/workspaces
  • it contains run output and prompt context from earlier sessions
  • it does not contain a DB snapshot with the missing telemetry rows
This makes it useful for reconstructing history and sequencing events, but it is not a direct source for restoring telemetry.

Most Likely Diagnosis

Inference

The active DB was restored or recreated from a snapshot that preserved:
  • schema
  • core company graph
  • agents/issues/projects
  • a few recent restore-time logs
but did not preserve operational telemetry rows. That explains all observed symptoms:
  • zero cost events
  • zero finance events
  • zero heartbeat runs
  • empty inbox and work product views
  • missing activity history beyond the restore window
  • no live run history

Why this is not a UI bug

The API is returning empty arrays and zeros directly from the backing tables. The UI is faithfully rendering what the DB contains.

What Has Been Ruled Out

  • A browser-only rendering issue
  • A plugin bundle cache problem as the primary cause of missing telemetry
  • A crash in the UI server
  • A missing company graph entirely
  • An issue table / agent table loss

What Remains Unresolved

  1. Whether another DB snapshot/archive exists elsewhere on disk that has the full telemetry rows.
  2. Whether the missing data can be partially reconstructed from logs/workspaces.
  3. Which exact restore/reseed step caused the live telemetry tables to be emptied.

Suggested Next Investigation Steps

  1. Search for any additional hidden dumps, WAL archives, or filesystem snapshots outside the checked backup directory.
  2. Inspect old run logs and workspace artifacts for timestamps and commands that may show a restore or reseed operation.
  3. Determine whether the restore was a full refresh or a selective import that intentionally omitted telemetry tables.
  4. If recovery is impossible, document the loss boundary precisely so future restores include the missing tables.
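Step 1 can start from a broad, read-only filesystem sweep. The filename patterns below are assumptions about what dumps, WAL archives, and snapshots typically look like; extend them as needed.

```shell
# List candidate dump/snapshot/WAL files under a search root.
# find only lists paths; it never modifies anything.
find_candidates() {
  find "$1" -type f \
    \( -name '*.sql' -o -name '*.sql.gz' -o -name '*.dump' \
       -o -name '*.backup' -o -path '*pg_wal*' \) 2>/dev/null
}

# Example usage:
#   find_candidates "$HOME" | head -n 50
```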

Gemini Pro Investigation Prompt

Use the following prompt verbatim or with minimal edits. It explicitly tells Gemini to read this markdown handoff first and to delegate the investigation to subagents:
You are investigating apparent data loss in a local Paperclip/Tremor instance. Do a read-only forensic deep dive and determine where the missing operational/work data went, whether it is recoverable, and what restore/reseed step likely caused the loss.

First, read this handoff file in full and use it as the primary source of truth:
- /Users/sydneymilton/dev/_sandbox/tremor/local-pc/docs/plans/2026-04-10-operational-data-loss-investigation-context.md

Use subagents explicitly:
- Use `@codebase_investigator` for codebase inspection, DB schema/restore logic, filesystem searches, log correlation, and dependency tracing.
- Use `@browser_agent` for frontend inspection, route verification, and validating what the live UI currently shows.
- If you need synthesis across both threads, use the main agent or `@generalist_agent` after the specialists report back.
- If `@browser_agent` is unavailable in your environment, state that clearly and continue with `@codebase_investigator` plus any manual browser checks you can still perform.

Keep the investigation read-only. Do not mutate the database or files unless I explicitly ask later.

Context:
- Local app URL: http://localhost:3100
- Repo/workspace root: /Users/sydneymilton/dev/_sandbox/tremor/local-pc
- Current live company prefix is TRE
- The core company/issue/agent graph exists again, but operational telemetry is missing in the live DB
- Do not modify data unless I explicitly ask later. Read-only investigation first.

What appears missing in the live DB:
- cost_events
- finance_events
- heartbeat_runs
- issue_inbox_archives
- issue_work_products
- plugin_job_runs
- plugin_logs
- routine_runs
- workspace_operations
- approvals
- budget_incidents
- most other operational telemetry/history tables

What still exists:
- agents
- issues
- projects
- some activity_log rows
- a couple of performance_signals and performance_snapshots rows

Known evidence:
- The live DB is an embedded Postgres instance under:
  /Users/sydneymilton/.paperclip/instances/default/db
- The DB is reachable locally via the app’s embedded postgres credentials.
- Backups exist under:
  /Users/sydneymilton/.paperclip/instances/default/data/backups/
- The backups I already checked looked like schema dumps plus a few seed rows, not full telemetry backups.
- There are older artifacts under:
  /Users/sydneymilton/paperclip-production/home/instances/default
  including run logs and workspace files with old operational context.

Tasks:
1. Confirm the current live state of the DB and list the missing operational tables with row counts.
2. Search the filesystem for any other backups, snapshots, dumps, WAL archives, exported SQL, or hidden copies that might still contain the missing rows.
3. Search old workspace/run artifacts for clues about:
   - the last time telemetry was present
   - any restore/reseed/reset step
   - any scripts or commands that may have dropped or reseeded the DB
4. Determine whether the missing data is:
   - recoverable from another backup/archive
   - reconstructable from logs/workspaces
   - lost except for partial traces
5. Identify the most likely action that caused the data loss.
6. If recovery is possible, outline the safest restore plan and the exact source of truth to restore from.
7. If recovery is not possible, state that clearly and explain why.

Constraints:
- Be thorough but do not mutate anything.
- Do not run destructive commands.
- Prefer direct evidence over inference.
- If you make an inference, label it as such.
- Cite the exact file paths, table names, commands, or log lines that support each conclusion.

Useful places to inspect:
- /Users/sydneymilton/.paperclip/instances/default/db
- /Users/sydneymilton/.paperclip/instances/default/data/backups/
- /Users/sydneymilton/paperclip-production/home/instances/default
- /Users/sydneymilton/dev/_sandbox/tremor/local-pc

Expected output:
1. Executive summary
2. Evidence table of missing vs present tables
3. Backup/archive inventory
4. Recovery assessment
5. Likely cause of data loss
6. Recommended next steps

Operational Notes

  • The old TREAAA routes are stale after the restore; the current company prefix is TRE.
  • The live dashboard and costs pages load from the restored company state, but operational metrics remain zero because the underlying tables are empty.
  • Performance HQ / matrix work, plugin installs, and other ongoing work happened during the same thread, but do not explain the telemetry loss by themselves.
  • The plugin/page work is separate from the database loss investigation.

Practical Bottom Line

The best current conclusion is:
The operational telemetry tables were not preserved in the active embedded DB restore, and the currently available backups do not contain row-level telemetry data to recover them.
For that conclusion to change, the investigation needs to turn up another archive or an earlier DB snapshot that includes the missing tables.