Why a new file format

Better AI on Excel can't fix what's structurally wrong with the file. So we built a new one.

6 min read · DeepCell Team

The question we get most often is a fair one: why does .deepcell need to exist at all? Couldn't you just put a smarter agent on top of Excel and call it a day?

We tried. For about a year, that was the plan.

The plan didn't survive contact with two agents working on the same model.

The file is the constraint, not the model on top of it

A spreadsheet is often described as two files smashed together — a formula graph and a snapshot of values. That's true, but it understates the problem. There's a third thing in the workbook that nobody talks about, because it isn't in the file at all: the author's working memory. Why this discount rate. Why the growth curve bends in 2027. Which scenario the analyst tried first and threw away. What source the number came from. What the number means.

When one human owns a model end to end, the working memory lives in their head and that's fine. They open the workbook, the values evaluate, and the why is recovered from the only place it was ever stored.

The moment a second reader touches the file, the working memory is gone. The values are still there. The formulas are still there. The reasoning has evaporated.

This is tolerable in a single-author world. It is not tolerable in a world where an analyst hands a model to an agent, the agent extends it, hands it back, and another agent picks it up next week to run a sensitivity. Every handoff is a memory wipe.

Agents can produce a clean xlsx. They cannot read each other's

This is the part that surprised us. Producing a workbook from scratch is a solved problem — give a capable model a prompt and it will write you a passable three-statement model in xlsx. It will look right. It will tie out.

Hand that same xlsx to a second agent and ask it to add an LBO scenario.

What the second agent sees is a grid of strings. The cell is a string. The formula is a string. There's a merged region in row 14 that means "header," except in the cash flow sheet where the same visual pattern means "subtotal." There's a named range called Revenue that points at one column in the model tab and a different column in the comps tab. The conventions the first agent used — spacing, color, where assumptions live versus where outputs live — are invisible to the second agent because they were never written down. They were style.

Every Excel model accretes ad-hoc conventions, and those conventions don't survive the handoff. Not between humans, not between agents, and especially not between a human and an agent.

The constraint isn't AI quality. It's the file.

What we needed the file to carry

Once we accepted that, the question stopped being "how do we make the agent better at Excel?" and started being "what would a file look like if it were designed for two readers instead of one?"

The answer turned out to be a small set of explicit primitives. The anatomy post walks through them in detail; the short version is that a .deepcell separates four things a workbook conflates:

  • ItemDefs — the line items, with hierarchy and type constraints.
  • ContextDefs — the dimensions every value lives in: Item × Time × Scenario × Status.
  • CalcDefs — formulas with explicit dependencies, evaluated as a DAG. NPV, IRR, SUMIF, IF, and 30+ others, with circular-dependency handling that doesn't require a checkbox in Options.
  • Reasoning — a typed graph of Claims, Assumptions, Evidence, and Arguments. The working memory, written down, in the file, where the next reader can find it.
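
To make "formulas with explicit dependencies, evaluated as a DAG" concrete, here is a minimal Python sketch. It is not the DeepCell engine: the `calc_defs` layout, the item names, and the `evaluate` helper are assumptions made up for illustration. It shows only the general technique of declaring dependencies up front and evaluating in topological order.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical CalcDefs: name -> (declared dependencies, function of resolved values).
calc_defs = {
    "revenue": (["units", "price"], lambda v: v["units"] * v["price"]),
    "cogs": (["units", "unit_cost"], lambda v: v["units"] * v["unit_cost"]),
    "gross_profit": (["revenue", "cogs"], lambda v: v["revenue"] - v["cogs"]),
}
inputs = {"units": 100, "price": 5.0, "unit_cost": 2.0}


def evaluate(calc_defs, inputs):
    """Evaluate every calc after its dependencies, using a topological order."""
    # graphlib expects node -> set of predecessors, which is exactly the declared deps.
    graph = {name: set(deps) for name, (deps, _) in calc_defs.items()}
    values = dict(inputs)
    # static_order() raises CycleError if the declared dependencies form a cycle.
    for name in TopologicalSorter(graph).static_order():
        if name in calc_defs:
            values[name] = calc_defs[name][1](values)
    return values


result = evaluate(calc_defs, inputs)
```

Because dependencies are declared rather than parsed out of formula text, the evaluation order falls out of the graph. This sketch simply rejects cycles with `CycleError`; it does not attempt the circular-dependency handling described above.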

A value isn't a string in a cell. It's a fact pinned to a coordinate in the four-dimensional space the model defines. A formula isn't a string either — it's a node in a dependency graph that the engine can trace forward (what depends on this?) or backward (what does this depend on?) without parsing text.
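
As a rough illustration of both ideas, the Python sketch below pins values to a four-part coordinate and traces a dependency graph in either direction. The `Coordinate` and `DependencyGraph` classes, their method names, and the sample items are hypothetical; the actual .deepcell schema may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    """Hypothetical key for one value: Item x Time x Scenario x Status."""
    item: str
    time: str
    scenario: str
    status: str


# A value is looked up by coordinate, not by sheet position.
values = {
    Coordinate("revenue", "2026", "base", "forecast"): 500.0,
    Coordinate("revenue", "2026", "lbo", "forecast"): 450.0,
}


class DependencyGraph:
    def __init__(self):
        self._inputs = defaultdict(set)      # node -> nodes it depends on
        self._dependents = defaultdict(set)  # node -> nodes that depend on it

    def add(self, node, depends_on):
        for dep in depends_on:
            self._inputs[node].add(dep)
            self._dependents[dep].add(node)

    @staticmethod
    def _walk(start, edges):
        # Iterative traversal collecting everything reachable from start.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def downstream(self, node):
        """Forward trace: what depends on this?"""
        return self._walk(node, self._dependents)

    def upstream(self, node):
        """Backward trace: what does this depend on?"""
        return self._walk(node, self._inputs)


graph = DependencyGraph()
graph.add("revenue", ["units", "price"])
graph.add("cogs", ["units"])
graph.add("gross_profit", ["revenue", "cogs"])
```

With this setup, `graph.downstream("units")` reaches `revenue`, `cogs`, and `gross_profit`, and `graph.upstream("gross_profit")` reaches every input, all without parsing a formula string.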

That last point is what makes multi-agent collaboration possible. The second agent doesn't need to reverse-engineer the first agent's conventions. The conventions are the schema.

A concrete contrast

Two analysts on the same model, the Excel way:

analyst_a/Q3_model_v7.xlsx
analyst_b/Q3_model_v7_LBO_scenario.xlsx
→ merge: rebuild from scratch, hope nothing dropped

You know how this ends. Someone sends a Slack message with three screenshots. Someone else opens both files side by side and types numbers from one into the other. The merge is a manual reconstruction, and the audit trail is the Slack thread.

Two analysts on the same model, the DeepCell way:

deepcell variant create lbo-scenario
# ... edits happen on the variant ...
deepcell diff lbo-scenario main
deepcell merge lbo-scenario --into main

The variant is a first-class branch of the model. The diff is semantic — it tells you which CalcDefs changed, which Values were overridden in which Scenario, which Assumptions were added or revised. The merge is dependency-aware: if the LBO scenario changed the cost-of-debt assumption and main changed the revenue formula that ultimately depends on it, the engine recomputes and surfaces the conflict per cell, not per workbook.
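
One way to picture a dependency-aware merge is the Python sketch below. It is an assumption-laden toy, not how `deepcell merge` is implemented: models are plain dicts of name to definition, `deps` is a hand-written dependency map, and `find_conflicts` and the item names are hypothetical. A conflict is flagged when both sides edit the same definition differently, or when one side edits something the other side's edit transitively depends on.

```python
def changed(base, side):
    """Names added, removed, or altered on one side relative to the common base."""
    return {k for k in base.keys() | side.keys() if base.get(k) != side.get(k)}


def upstream(name, deps):
    """Everything `name` transitively depends on, given name -> direct dependencies."""
    seen, stack = set(), [name]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def find_conflicts(base, main, variant, deps):
    main_changes = changed(base, main)
    variant_changes = changed(base, variant)
    conflicts = set()
    # Case 1: both sides edited the same definition, and disagree.
    for name in main_changes & variant_changes:
        if main.get(name) != variant.get(name):
            conflicts.add(name)
    # Case 2: one side edited a definition whose inputs the other side edited.
    for ours, theirs in ((main_changes, variant_changes),
                         (variant_changes, main_changes)):
        for name in ours:
            if upstream(name, deps) & theirs:
                conflicts.add(name)
    return sorted(conflicts)


base = {"cost_of_debt": 0.06, "interest_expense": "debt * cost_of_debt", "units": 100}
main = {**base, "interest_expense": "avg_debt * cost_of_debt"}
variant = {**base, "cost_of_debt": 0.08, "units": 120}
deps = {"interest_expense": {"debt", "cost_of_debt"}}

conflicts = find_conflicts(base, main, variant, deps)
```

Here only `interest_expense` is flagged: its formula changed on main while one of its inputs changed on the variant. The `units` edit touches nothing main changed, so it would merge without review.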

The same primitives that let one agent hand a model to another are the ones that let two humans actually collaborate on it. Versioning is git-native — log, diff, restore, commit, merge, variant — because the problem of "two people changed the same thing" is older than spreadsheets and has had a good answer for twenty years.

Coexistence, not replacement

None of this is an argument against Excel. Excel is the best tool ever built for a single human doing quick analysis and sending the result to someone who will never install anything. That use case is not going away, and we don't want it to.

What we wanted was a format for models that live more than 24 hours — the ones that get passed around, extended, audited, revised, and quoted from in a memo six months later. For that, the file has to carry more than a grid.

We round-trip cleanly to and from xlsx — the round-tripping post covers the mechanics — because the stakeholder on the other end of the email still wants a workbook. The .deepcell is the source of truth; the xlsx is a view of it. The agent reads and writes the source. The human reads whichever view is most useful.

That's the trade. A new format, because the old one was designed for a single reader, and the world is no longer single-reader.


See it for yourself — open a sample .deepcell in the playground. Edit a value, watch the dependents recalculate, inspect the reasoning behind any number.