Show your work

What AI-traceable reasoning looks like inside a .deepcell.

5 min read · DeepCell Team

"Can I trust an agent's number?"

It's the question every analyst eventually asks, usually right before a deliverable goes out. The cell says 78.4%. The agent that produced it is gone — context window flushed, chat archived, prompt forgotten. The model is a snapshot. The reasoning evaporated when the tab closed.

A confident number with no audit trail is worse than no number at all. It leaks trust the moment someone in the meeting asks why.

The diagnostic

ChatGPT-in-a-spreadsheet is a fluent stranger sitting next to you. It will fill any cell you point at. What it won't do — what no chat-shaped tool can do — is leave behind a structured record of how it got there. A cell comment is a paragraph. A paragraph is not a graph. You can't query it. You can't diff it. You can't ask which assumptions, if they broke, would force the number to move.

The same problem exists without agents. An analyst leaves the team and their model becomes archaeology. The formulas survive; the why doesn't. We've all inherited a workbook whose tabs are named final_v3_USE_THIS and spent a Tuesday reverse-engineering somebody's worldview from cell references.

The fix isn't a better comment system. It's giving reasoning the same first-class treatment we give formulas.

The design move

A .deepcell carries a typed reasoning graph alongside the values. "Anatomy of a .deepcell" covers the full section list; the one we care about here is Reasoning. Four node types:

  • Claim — a load-bearing statement about the model. Seven kinds: thesis, risk, catalyst, counter, question, market_consensus, knowledge.
  • Assumption — an input you're choosing to believe. Lifecycle: holding, uncertain, broken, superseded.
  • Evidence — an anchor to something outside the model. A sourceUri (filing:, url:, doc:, deepcell:), an optional effectiveDate, an excerpt.
  • Argument — a typed edge between any of the above. Eight relations: supports, refutes, depends_on, derives_from, variant_of, supersedes, contradicts, references.

Claims link to the cells they justify — by itemRefs, contextRefs, and statusRef. The graph isn't decoration. It's the part of the file that explains the rest of the file.
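
As a rough mental model, the four node types above could be mirrored in Python like this. It is a sketch, not the format's actual schema: the vocabularies are copied from the list above, but the class layout, the snake_case field names, and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Vocabularies copied from the node-type list above.
CLAIM_KINDS = {"thesis", "risk", "catalyst", "counter", "question",
               "market_consensus", "knowledge"}
ASSUMPTION_STATUSES = {"holding", "uncertain", "broken", "superseded"}
RELATIONS = {"supports", "refutes", "depends_on", "derives_from",
             "variant_of", "supersedes", "contradicts", "references"}
SOURCE_SCHEMES = ("filing:", "url:", "doc:", "deepcell:")


@dataclass
class Claim:
    id: str
    kind: str
    body: str
    # Links back to the cells this claim justifies (hypothetical field names).
    item_refs: list[str] = field(default_factory=list)
    context_refs: list[str] = field(default_factory=list)
    status_ref: Optional[str] = None

    def __post_init__(self):
        if self.kind not in CLAIM_KINDS:
            raise ValueError(f"unknown claim kind: {self.kind}")


@dataclass
class Assumption:
    id: str
    body: str
    status: str = "holding"

    def __post_init__(self):
        if self.status not in ASSUMPTION_STATUSES:
            raise ValueError(f"unknown assumption status: {self.status}")


@dataclass
class Evidence:
    id: str
    source_uri: str
    excerpt: str
    effective_date: Optional[str] = None  # optional, per the list above

    def __post_init__(self):
        if not self.source_uri.startswith(SOURCE_SCHEMES):
            raise ValueError(f"unsupported sourceUri: {self.source_uri}")


@dataclass
class Argument:
    source: str  # the 'from' end; renamed because 'from' is a Python keyword
    rel: str
    target: str  # the 'to' end

    def __post_init__(self):
        if self.rel not in RELATIONS:
            raise ValueError(f"unknown relation: {self.rel}")


thesis = Claim("t_main_v2", "thesis", "Margin reaches 78% by mid-2027.",
               item_refs=["Gross_Margin_Pct"],
               context_refs=["Q1_2027", "Q2_2027"],
               status_ref="projected")
edge = Argument("t_main_v2", "depends_on", "a_inf_cost")
```

The point of the closed vocabularies is that a typo such as `kind="hunch"` fails loudly instead of becoming a fifth lifecycle state nobody queries for.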

Here's what a thesis looks like in the wild, slightly fictionalized from a real working model:

<Claim id="t_main_v2" kind="thesis" status="active" strength="high"
       itemRefs="Gross_Margin_Pct" contextRefs="Q1_2027,Q2_2027" statusRef="projected">
  <Label>2027 GM expansion thesis (v2)</Label>
  <Body>Margin reaches 78% by mid-2027 — Q3 call confirmed steeper
    inference cost curve than expected.</Body>
</Claim>
 
<Assumption id="a_inf_cost" status="holding">
  <Body>Per-token inference cost falls 30%/yr.</Body>
</Assumption>
 
<Argument from="t_main_v2" rel="depends_on" to="a_inf_cost"/>

Two nodes, one edge. The thesis points at two projected quarters of Gross_Margin_Pct. It depends on an assumption about inference cost decay. If that assumption breaks, the thesis is the first thing that should be re-examined.

That's the contract: every load-bearing number in the model has a path back to the assumptions that produced it.
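
To make that path concrete, here is a small Python sketch that walks from a cell back to the assumptions behind it, using only the standard library. Two things are assumptions: the `<Reasoning>` wrapper element (the snippet above shows no root, so one is invented here to make the XML parseable) and the helper name `assumptions_behind`. The Body text is shortened from the example above.

```python
import xml.etree.ElementTree as ET

# The example from above, wrapped in a hypothetical <Reasoning> root.
SNIPPET = """
<Reasoning>
  <Claim id="t_main_v2" kind="thesis" status="active" strength="high"
         itemRefs="Gross_Margin_Pct" contextRefs="Q1_2027,Q2_2027" statusRef="projected">
    <Label>2027 GM expansion thesis (v2)</Label>
    <Body>Margin reaches 78% by mid-2027.</Body>
  </Claim>
  <Assumption id="a_inf_cost" status="holding">
    <Body>Per-token inference cost falls 30%/yr.</Body>
  </Assumption>
  <Argument from="t_main_v2" rel="depends_on" to="a_inf_cost"/>
</Reasoning>
"""


def assumptions_behind(root, item, context):
    """Ids of Assumptions that Claims on (item, context) depend on directly."""
    assumption_ids = {a.get("id") for a in root.iter("Assumption")}
    hits = []
    for claim in root.iter("Claim"):
        items = claim.get("itemRefs", "").split(",")
        contexts = claim.get("contextRefs", "").split(",")
        if item not in items or context not in contexts:
            continue
        # Follow depends_on edges that start at this claim.
        for arg in root.iter("Argument"):
            if (arg.get("from") == claim.get("id")
                    and arg.get("rel") == "depends_on"
                    and arg.get("to") in assumption_ids):
                hits.append(arg.get("to"))
    return hits


root = ET.fromstring(SNIPPET)
print(assumptions_behind(root, "Gross_Margin_Pct", "Q1_2027"))  # ['a_inf_cost']
```

This follows one hop only. A real model would have longer chains, which is where a transitive query earns its keep.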

What you can do with it

The graph is queryable from the CLI:

deepcell assumption impact a_inf_cost
# → lists every Claim that depends on this Assumption, transitively

Useful when a quarterly print lands and you want to know, before lunch, which parts of your model the new data point touches.
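
The "transitively" part is an ordinary graph walk. The sketch below is not the CLI's implementation; it is one plausible reading, assuming only `depends_on` edges count toward impact and that edges arrive as `(from, rel, to)` tuples. The `r_pricing` and `c_other` ids are hypothetical, added so the walk has more than one hop.

```python
from collections import defaultdict, deque


def impacted(edges, start):
    """Ids that depend on `start`, directly or transitively, in BFS order."""
    # Invert depends_on edges: for each node, who depends on it?
    dependents = defaultdict(list)
    for src, rel, dst in edges:
        if rel == "depends_on":
            dependents[dst].append(src)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for dep in dependents[node]:
            if dep not in seen:  # guards against cycles and duplicates
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


EDGES = [
    ("t_main_v2", "depends_on", "a_inf_cost"),
    ("r_pricing", "depends_on", "t_main_v2"),  # hypothetical downstream claim
    ("t_main_v2", "supports", "c_other"),      # ignored: not a depends_on edge
]

print(impacted(EDGES, "a_inf_cost"))  # ['t_main_v2', 'r_pricing']
```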

The more interesting command is the one that runs at commit time:

deepcell reasoning-diff model.deepcell
# → after a working-tree edit, shows Claims now flagged as drift candidates

Flip a_inf_cost from holding to broken — say a vendor announces a price floor — and reasoning-diff walks the graph and surfaces every Claim downstream of that assumption as a drift candidate. Not automatically falsified. Flagged. The analyst still decides whether the thesis survives the new reality or needs a supersedes edge to a v3.
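
A minimal sketch of that flagging logic, assuming the diff compares assumption statuses before and after the edit and then follows `depends_on` edges downstream. The function name, the dict-of-statuses input, and the `r_pricing` claim are all hypothetical; the real command may weigh other relations or status changes too.

```python
def drift_candidates(old_status, new_status, edges):
    """Ids downstream of any assumption that newly became 'broken'."""
    newly_broken = [a for a, status in new_status.items()
                    if status == "broken" and old_status.get(a) != "broken"]

    flagged = set()
    stack = list(newly_broken)
    while stack:
        node = stack.pop()
        for src, rel, dst in edges:
            if rel == "depends_on" and dst == node and src not in flagged:
                flagged.add(src)   # flagged for review, not falsified
                stack.append(src)
    # The broken assumptions themselves are the cause, not candidates.
    return sorted(flagged - set(newly_broken))


EDGES = [
    ("t_main_v2", "depends_on", "a_inf_cost"),
    ("r_pricing", "depends_on", "t_main_v2"),  # hypothetical downstream claim
]

before = {"a_inf_cost": "holding"}
after = {"a_inf_cost": "broken"}
print(drift_candidates(before, after, EDGES))  # ['r_pricing', 't_main_v2']
```

Note what the function returns: a list to review, nothing more. Deciding whether each flagged claim still stands is left to the analyst, as described above.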

Wire that into a git pre-commit hook and your model can't quietly drift out from under its own thesis without somebody noticing.

History is also queryable:

deepcell claim history t_main_v2
# → every version of this Claim, with the Arguments that superseded it
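
One way to picture the history query is as a walk along `supersedes` edges. The sketch below assumes the edge runs from the newer claim to the older one (`from` supersedes `to`) and that each claim has at most one successor; neither is confirmed by the format description above. The `t_main_v1` and `t_main_v3` ids are hypothetical.

```python
def claim_lineage(edges, claim_id):
    """Versions of a claim, oldest first, following supersedes edges."""
    # newer -> older, under the assumed edge direction
    older = {src: dst for src, rel, dst in edges if rel == "supersedes"}
    newer = {dst: src for src, dst in older.items()}

    chain, seen = [claim_id], {claim_id}
    # Walk back to the oldest version.
    while chain[0] in older and older[chain[0]] not in seen:
        chain.insert(0, older[chain[0]])
        seen.add(chain[0])
    # Walk forward to the newest version.
    while chain[-1] in newer and newer[chain[-1]] not in seen:
        chain.append(newer[chain[-1]])
        seen.add(chain[-1])
    return chain


EDGES = [
    ("t_main_v2", "supersedes", "t_main_v1"),
    ("t_main_v3", "supersedes", "t_main_v2"),
    ("t_main_v2", "depends_on", "a_inf_cost"),  # ignored: not supersedes
]

print(claim_lineage(EDGES, "t_main_v2"))
# ['t_main_v1', 't_main_v2', 't_main_v3']
```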

The graph survives an analyst leaving. The next person to open the file gets the worldview, not just the worksheet.

Coexistence, not replacement

An xlsx can carry a cell comment. It can't carry a graph. That's not a flaw in Excel — comments were designed for the casual reader, the colleague flipping through the tab. They do that job well.

The reasoning graph is for the audit trail. The two can live side-by-side: comments stay in the workbook for the casual reader, the graph stays in the .deepcell for the auditor, the successor analyst, and the agent running pre-commit checks. Bring your Excel model in, layer reasoning on top, export back out when you need to.

One aside

The Reasoning section is the most recent addition to the format — the spec went in last sprint and is the part of .deepcell most actively evolving. The node kinds and relations above are stable; expect more queries to land on top of them.

Who signs the work

A reasoning graph is not absolution. The agent can populate Claims and Evidence at scale, but a Claim with strength="high" is still a claim a human is making. The graph makes the claim legible: to a reviewer, to a regulator, to the analyst who inherits the model in eighteen months. It doesn't make the claim correct. See "The analyst and Claude" for how we think about that division of labor: the agent drafts, the analyst signs.

The number on the page is still yours. Now there's a paper trail behind it.


See it for yourself — open a sample .deepcell in the playground. Edit a value, watch the dependents recalculate, inspect the reasoning behind any number.