Why your favorite fund's data has the same negating-flag bug we found
XBRL has a small attribute called negating in the
PRE (presentation) section of every SEC filing. It tells you which cells to
visually flip when rendering an income statement or balance sheet — for
example, "operating expenses" might be stored as a positive number but is
displayed as (2,400) on the statement to look like a deduction. It's
purely a layout instruction.
Almost everyone gets it wrong. Including us, until last week.
What we found
We were running a 71,198-cell verification of our SEC replica against the SEC's own companyfacts API. After three rounds of debugging the verify tool itself (the bugs there were instructive too — see the bottom), we were left with a stubborn ~14% mismatch bucket that wouldn't resolve. Every mismatched cell had the same pattern: same magnitude, opposite sign.
| Company | Tag | FY | Our value | SEC value |
|---|---|---|---|---|
| AAPL | SellingGeneralAndAdministrativeExpense | 2023 | −$24.93B | +$24.93B |
| MSFT | CostOfRevenue | 2024 | −$74.11B | +$74.11B |
| GOOGL | OtherCostOfOperatingRevenue | 2023 | −$30.40B | +$30.40B |
Our values matched the rendered statement in Apple's 10-K. The SEC's API didn't. So which one is wrong?
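That "same magnitude, opposite sign" signature is mechanical to detect. A sketch of the bucketing logic, with `classify_mismatch` as our own (hypothetical) helper name:

```python
def classify_mismatch(ours: float, sec: float, rel_tol: float = 1e-6) -> str:
    """Bucket a cell comparison: exact match, pure sign flip, or real value difference."""
    if ours == sec:
        return "match"
    if abs(ours + sec) <= rel_tol * max(abs(ours), abs(sec)):
        return "sign_flip"  # same magnitude, opposite sign
    return "value_diff"

# The three cells from the table above, in raw USD
cells = [
    ("AAPL SG&A FY2023", -24_930_000_000, 24_930_000_000),
    ("MSFT CostOfRevenue FY2024", -74_110_000_000, 74_110_000_000),
    ("GOOGL OtherCostOfOperatingRevenue FY2023", -30_400_000_000, 30_400_000_000),
]
for name, ours, sec in cells:
    print(name, classify_mismatch(ours, sec))  # all three classify as sign_flip
```

Running this over the full 71,198-cell panel is what separated the sign-flip bucket from ordinary value disagreements.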
The XBRL spec, briefly
XBRL has three layers:
- NUM — the numerical fact: {tag, period, value, unit}
- PRE — the presentation tree: how to render the fact on the statement, including parent/child structure, ordering, and the negating hint
- CAL — the calculation tree: how facts roll up arithmetically
The 10-K's HTML statement of operations is rendered using PRE — that's where parentheses and indentation come from. NUM is the canonical store. The two are distinct on purpose: a fund using the data programmatically wants the raw number; a human reading the statement wants the rendered presentation.
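To make the PRE-versus-NUM split concrete, here's a minimal sketch of a render-time flip; the `negated` flag stands in for the PRE hint, and the names are ours, not the spec's:

```python
def render_cell(value: float, negated: bool) -> str:
    """Format a NUM fact for statement display. The PRE-layer negating hint
    only affects presentation: a flagged expense renders in parentheses,
    but the stored value is never mutated."""
    shown = -value if negated else value
    if shown < 0:
        return f"({abs(shown):,.0f})"
    return f"{shown:,.0f}"

sgna = 24_930_000_000                   # stored positive, as in NUM
print(render_cell(sgna, negated=True))  # → (24,930,000,000)
print(sgna)                             # still 24930000000 — storage unchanged
```

The flip lives entirely inside the formatting function; nothing downstream of storage ever sees a negated value.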
The XBRL spec is unambiguous on this. From XBRL Specification 2.1, §6.7.5: "preferredLabel and weight modify the presentation of facts but do not modify the underlying values."
SEC's companyfacts API returns NUM directly. Every commercial vendor we know of (we won't name names — they all do it) either parses the 10-K's PRE-rendered HTML or aggregates a downstream feed that did, and applies negating to the value before storing it. The result: signs flip on thousands of cells across a typical 15-year coverage panel.
How to test for it
Pull a known company's SellingGeneralAndAdministrativeExpense for
fiscal year 2023 from your data vendor:
# vendor.get_metric is a stand-in for your vendor's actual API
import json
from urllib.request import urlopen, Request

val = vendor.get_metric(cik=320193, tag='SellingGeneralAndAdministrativeExpense', fy=2023)
url = ('https://data.sec.gov/api/xbrl/companyconcept/CIK0000320193'
       '/us-gaap/SellingGeneralAndAdministrativeExpense.json')
# SEC's fair-access policy wants a descriptive User-Agent
sec = json.load(urlopen(Request(url, headers={'User-Agent': 'you@example.com'})))
sec_2023 = next(u['val'] for u in sec['units']['USD']
                if u['fy'] == 2023 and u['fp'] == 'FY' and u['form'] == '10-K')
print(val, sec_2023, val == sec_2023)
If your vendor returns a negative number for SG&A, accruals, or cost-of-revenue line items, you have the negating bug. The SEC API will return the raw positive value.
Why this matters for fundamental analysis
Sign-flipped cells aren't a cosmetic issue. They quietly poison downstream analysis:
- Ratios break asymmetrically. SG&A / Revenue computes correctly when both are positive. With sign-flipped SG&A it becomes a negative ratio. Most pipelines silently mask the negative by clamping to zero or taking the absolute value — both wrong.
- Year-over-year deltas invert. A 10% growth in SG&A from $100B to $110B is stored as −$100B → −$110B; a pipeline comparing signed values sees the expense falling, one comparing magnitudes sees it growing, so the reported direction depends on whichever convention each stage happens to assume.
- Calculation roll-ups misclose. Operating Income = Revenue − Cost of Revenue − Operating Expenses. If your storage layer has Cost of Revenue stored as negative (because it's a deduction on the rendered statement), the formula evaluates as Revenue − (−COGS) − Opex = Revenue + COGS − Opex, which adds COGS back instead of deducting it — or deducts it twice if a later stage flips the sign again.
- Backtests calibrate the wrong direction. A signal like "margin compression severe" looks for declining gross-margin trajectories. With sign-flipped Cost of Revenue, the trajectory reverses, the signal fires on expanding-margin companies instead, and the calibration shows nonsense returns.
None of these are show-stopper visible failures. They're the slow, silent kind — the kind that surfaces months later when a fund manager asks "why is this signal weird on retailers" and nobody can reproduce the bug because every other line item looks correct.
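The roll-up failure is the easiest one to reproduce with toy numbers (figures are illustrative, not from any filing):

```python
def operating_income(revenue: float, cogs: float, opex: float) -> float:
    # CAL-tree arithmetic assumes deductions arrive as raw positive magnitudes
    return revenue - cogs - opex

revenue, cogs, opex = 1000.0, 400.0, 250.0

correct = operating_income(revenue, cogs, opex)    # 350.0
flipped = operating_income(revenue, -cogs, opex)   # 1150.0 — COGS added back
print(correct, flipped)
```

A roll-up check (does stored Operating Income equal the recomputed value?) catches this immediately, which is exactly why sign-flipped stores tend to skip that check.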
How we fixed it
Two changes in our serving layer:
- Stop applying negating to stored analytical values. Pass it through to the frontend as a per-cell render hint, the way XBRL intends.
- Re-verify against SEC. Our match rate went from 92% to 99.99%. Of the residual 19 cells, 15 disappeared on refreshing companyfacts.zip (a stale 26-day-old bulk download was the real culprit) and 4 are real edge cases we're tracking.
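In schema terms, the first change amounts to carrying the hint next to the value instead of baking it in. A sketch of a per-cell payload, with field names that are ours rather than anything standardized:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Cell:
    tag: str
    fy: int
    value: float   # raw NUM value — never sign-adjusted by the serving layer
    negated: bool  # PRE render hint, passed through for the frontend to apply

cell = Cell("SellingGeneralAndAdministrativeExpense", 2023, 24_930_000_000, True)
print(asdict(cell))  # value stays positive; the frontend decides how to draw it
```

Freezing the dataclass is deliberate: no stage downstream of ingestion can "helpfully" mutate the value in place.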
The fix is in our open-source-style code path: our methodology page documents the full chain.
The other half — restated vs original
While we're on data integrity: the other big mismatch bucket (14% of flagged cells) was companies that restated prior periods. SEC's companyfacts API returns the most recent restated value by default; many vendors store the original-as-filed value because that's what was public on the original filing date.
Neither is "wrong" — they're different views. A backtest replaying historical decisions wants the as-filed value. A fundamental analysis of current state wants the restated. The bug is when a vendor silently mixes the two without labeling.
Our replica now stores both: value_original (as-of-filing) and
value_restated (latest available). The verify tool requires both before
calling a mismatch a bug.
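A sketch of how the two views stay separate at the call site (`get_value` is a hypothetical accessor; the keys mirror the column names above):

```python
def get_value(cell: dict, view: str) -> float:
    """Select the as-filed or latest-restated number explicitly.
    'original' suits point-in-time backtests; 'restated' suits
    current-state fundamental analysis."""
    if view == "original":
        return cell["value_original"]
    if view == "restated":
        return cell["value_restated"]
    raise ValueError(f"unknown view: {view!r}")

# Illustrative cell where a restatement lowered the figure
cell = {"value_original": 24_930_000_000, "value_restated": 24_800_000_000}
backtest_val = get_value(cell, "original")
current_val = get_value(cell, "restated")
```

Making `view` a required argument is the point: the caller has to say which history it wants, so the two can never be silently mixed.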
If you want to audit your data
The minimal test:
- Pick 10 large-cap tickers (AAPL, MSFT, GOOGL, AMZN, META, JPM, JNJ, XOM, KO, V).
- For each, pull SG&A, Cost of Revenue, R&D, and Depreciation for the last 5 fiscal years.
- Compare to https://data.sec.gov/api/xbrl/companyconcept/CIK[10-digit-padded]/us-gaap/[tag].json.
- If signs disagree on more than 10% of cells, you have the negating bug.
- If magnitudes disagree by more than ~1% on rounded values, you likely have the restated-vs-original mixup.
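The last two steps reduce to two rates over the panel of (vendor, SEC) value pairs, however you fetched them. A sketch, with an illustrative five-cell panel:

```python
def audit_panel(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """pairs: (vendor_value, sec_value) per cell.
    Returns (sign-flip rate, >1%-magnitude-difference rate)."""
    n = len(pairs)
    sign_flip_rate = sum((v < 0) != (s < 0) for v, s in pairs if v and s) / n
    mag_diff_rate = sum(
        abs(abs(v) - abs(s)) > 0.01 * abs(s) for v, s in pairs if s
    ) / n
    return sign_flip_rate, mag_diff_rate

# Illustrative panel: two sign flips, one magnitude drift
pairs = [(-100.0, 100.0), (-50.0, 50.0), (30.0, 30.0), (80.0, 82.0), (10.0, 10.0)]
sign_rate, mag_rate = audit_panel(pairs)
print(sign_rate > 0.10)  # negating-bug signature fires on this sample
```

The two rates are deliberately computed on magnitudes and signs separately, so a restated value doesn't masquerade as a sign flip or vice versa.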
For our public methodology and the open audit trail behind every
metric we serve: see /methodology. Every cell
in our system carries source_tag, source_adsh, and
citation_url back to the EDGAR filing — not because we want to be
clever, but because the alternative (vendor opacity) is exactly how
the negating-flag bug lived in production for years.
Bloomberg won't tell you their numbers reproduce. We will.
Citations
- XBRL Specification 2.1, §6.7.5 — preferredLabel does not modify underlying values
- SEC EDGAR companyfacts API — https://data.sec.gov/api/xbrl/companyconcept/
- Interactive Market Data internal: project_data_correctness_2026_04_22 (the original investigation)
- Interactive Market Data internal: project_data_foundation_2026_04_22 (data layer rebuild)