How a phantom signal almost shipped — and how the validation killed it
In late April 2026 we were ready to ship a new top-line signal: a triple-stack of capex_spike + positive_eps_streak + zombie_alert. The calibration looked elite — n=72 events, 20-day mean −5.89%, hit rate 5.6%, ret-per-stdev −1.72. By single-signal standards that's institutional-grade short alpha. The page draft was written. The Tweet thread was queued.
Then we ran the validation gate, and it killed the signal.
The validation gate
Every published Interactive Market Data signal must pass a structural validation: the events behind a claimed n must be spread across multiple tickers, fiscal periods, and sectors. The check is a single SQL query:
```sql
SELECT cik, COUNT(*) AS n_events
FROM signal_events
WHERE signal_set = 'capex_spike,positive_eps_streak,zombie_alert'
GROUP BY cik
ORDER BY n_events DESC
LIMIT 5;
```
The result was unambiguous: 64 of the 72 events were the same company — Liberty Broadband Class A (LBRDA) re-emitting the same combined signal year after year, fiscal period after fiscal period, share class after share class. The "diverse" n=72 cohort was effectively n=9.
| cik | name | events in claimed cohort |
|---|---|---|
| 1611983 | Liberty Broadband Corp (LBRDA) | 64 |
| various | (8 other tickers) | 1 each |
One ticker generating 89% of the events meant the −5.89% return was just LBRDA's price action over the sample window, not a generalizable pattern. We were minutes away from publishing a signal whose entire alpha came from one share-class quirk in one company.
The fix
Two changes:
- Dedupe by (cik, fy, signal_set) at event-construction time, so each signal set fires at most once per company per fiscal year. Liberty Broadband's multiple share-class listings (Class A, B, and C) each carried their own CIK-derivative entries that previously slipped through the join.
- Add a per-cohort concentration cap: any single ticker contributing more than 30% of a cohort's events flags the signal as suspicious and blocks publication. A sketch of both checks is shown below.
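A minimal sketch of both checks, assuming the event table is a pandas DataFrame with cik, ticker, fy, and signal_set columns (the column names and function are illustrative, not the production pipeline):

```python
import pandas as pd

def validate_cohort(events: pd.DataFrame, max_share: float = 0.30) -> pd.DataFrame:
    """Dedupe to one firing per (cik, fy, signal_set) and block publication
    if any single ticker dominates the cohort. Illustrative sketch only."""
    # One event per company, fiscal year, and signal set: share-class and
    # re-emission duplicates collapse to a single row.
    deduped = events.drop_duplicates(subset=["cik", "fy", "signal_set"])

    # Concentration cap: no single ticker may contribute more than
    # `max_share` of the cohort's events.
    share_by_ticker = deduped["ticker"].value_counts(normalize=True)
    if share_by_ticker.iloc[0] > max_share:
        top = share_by_ticker.index[0]
        raise ValueError(
            f"Cohort blocked: {top} contributes {share_by_ticker.iloc[0]:.0%} "
            f"of events (cap is {max_share:.0%})"
        )
    return deduped
```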
After the rebuild, the strongest validated triple at 20 days is capex_spike + fcf_ni_divergence + fcf_turn_negative: n=114, 20d mean −3.78%, ret-per-stdev −0.39, validated multi-period and multi-sector.
| | Phantom signal (pre-fix) | Real top triple (post-fix) |
|---|---|---|
| Triple | capex_spike + positive_eps_streak + zombie_alert | capex_spike + fcf_ni_divergence + fcf_turn_negative |
| n events | 72 (claimed) | 114 (validated) |
| n unique tickers | 9 (effectively 1) | 89 |
| 20d mean return | −5.89% | −3.78% |
| ret/σ | −1.72 | −0.39 |
| Pass validation | NO | YES |
The validated number is less spectacular but it's real. A fund running a $100M short book against the post-fix signal won't see their backtest collapse on first contact with reality.
The deeper finding — the data wanted to tell us something else
While re-running the calibration with the dedupe fix, we extended the analysis to longer horizons (60d and 252d) and added net-of-execution-cost columns (5 bps half-spread + impact + ADV-tiered borrow). The results across all 247 validated triples were consistent and surprising:
Multi-stress signal stacks calibrate as deep-value LONGS at 252-day horizons, not shorts.
The strongest 252d net-of-costs triple is capex_spike + dso_drift_severe + margin_compression_severe as a LONG: +70.32% mean PnL net of costs over 252 days, n=51, ret/σ +0.20.
The next four are all longs:
| Triple | Direction | n | 252d net PnL |
|---|---|---|---|
| accruals_quality_low + beneish_m_score_high + fcf_turn_negative | long | 90 | +37.09% |
| fcf_turn_negative + margin_compression_severe + profit_to_loss | long | 97 | +33.45% |
| beneish_m_score_high + capex_spike + dso_drift_severe | long | 39 | +33.33% |
| capex_spike + dso_drift_severe + profit_to_loss | long | 36 | +33.00% |
Every one of these triples is a stack of forensic short signals — Beneish manipulator score, accruals quality red flags, capex blowouts, margin compression, profit-to-loss inflection. Individually, the academic literature treats them as bearish flags. Stacked, the signed PnL says: buy the stocks.
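One note on the "net of costs" figures above before getting to the why: they subtract an additive haircut of the form described earlier (5 bps half-spread plus impact plus ADV-tiered borrow). Here is a minimal sketch of a cost model of that shape; the square-root impact term and the flat borrow rate are illustrative placeholders, not the production formulas.

```python
def net_pnl(gross_ret: float, trade_usd: float, adv_usd: float,
            hold_days: int, borrow_bps_annual: float) -> float:
    """Hedged sketch of an additive execution-cost haircut.

    Only the 5 bps half-spread comes from the text; the impact and
    borrow terms below are illustrative placeholders.
    """
    half_spread = 0.0005                                        # 5 bps per side
    spread_cost = 2 * half_spread                               # entry + exit
    impact_cost = 0.001 * (trade_usd / adv_usd) ** 0.5          # placeholder sqrt impact
    borrow_cost = (borrow_bps_annual / 1e4) * hold_days / 252   # pro-rated borrow fee
    return gross_ret - spread_cost - impact_cost - borrow_cost

# e.g. a 252-day hold, trading 5% of ADV, at a 300 bps annual borrow fee:
# net_pnl(0.7032, trade_usd=5e5, adv_usd=1e7, hold_days=252, borrow_bps_annual=300)
```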
Why this happens — the survivor-bias lens
The mechanism is straightforward once you stare at it long enough.
The companies that fire 3+ stress signals in the same fiscal year are deeply impaired. Some of them go bankrupt. Some get delisted. Some get acquired at a discount. Those companies disappear from the prices_daily.parquet panel — the price series ends, our forward-return calculation on those names returns NULL, and they drop out of the average entirely instead of registering the losses they actually delivered.
What's left is the survivors: companies that fired all the bad signals, then somehow didn't die. Maybe management changed. Maybe a strategic buyer materialized. Maybe demand returned. Whatever the path, the survivors are by definition the ones whose stock came back. Their forward 252-day return is the only signal that lives long enough to be measured.
This is exactly the failure mode that academic literature on accruals and Beneish flags warns about: shorting bad-fundamentals stocks works when you have a delisting-aware return panel that includes the zeros from the bankruptcies. We don't have that panel — yfinance's split-adjusted close ends when the ticker stops trading, with no marker that the company actually went to zero. So our dataset is inadvertently reporting only the bouncebacks, and the math says the bouncebacks are huge.
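A toy illustration of how the panel's silence on delistings flips the sign; the tickers, returns, and the treat-delisting-as-total-loss assumption are all illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical cohort: three names bounce back, two go to zero before the
# 252-day horizon, so their price series end and the forward return is NaN.
cohort = pd.DataFrame({
    "ticker":   ["SURV1", "SURV2", "SURV3", "DEAD1", "DEAD2"],
    "fwd_252d": [0.70,    0.33,    0.33,    np.nan,  np.nan],
})

# Naive mean: pandas silently drops the NaNs, so only survivors count.
naive_mean = cohort["fwd_252d"].mean()              # +45.3% -> "huge long alpha"

# Delisting-aware mean: treat a terminated series as a total loss.
aware_mean = cohort["fwd_252d"].fillna(-1.0).mean() # -12.8% -> the bounce is gone

print(f"naive: {naive_mean:+.1%}, delisting-aware: {aware_mean:+.1%}")
```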
This is why beneish_m_score_high as a 252-day short loses 48% net in our backtest while the original Beneish papers report it as a real short over multi-year holds. Same signal, different return panel, opposite sign.
What this changes
Three things, immediately:
- The marketing pages stop quoting "X% / 20d short" as the headline number for any forensic signal. The 20d horizon is the only one where some shorts work (capex_spike +2.05% net), and that's a tactical claim, not a thesis. The tradeable thesis at multi-month horizons is the long bounce.
- The calibration pages now show the gross-and-net columns side by side at all five horizons. Funds reading our site see exactly what we see: where the alpha is, what the haircut is, where the signal flips sign by horizon.
- The product roadmap pivots toward the multi-signal combo view. Single-signal calibration is a starting block; the actual edge is in the stacks, and the stacks tell a different story than the individual flags. The new pyflo_signal_combo(cik, fy) MCP tool surfaces the full signal-set + calibrated outcome for any company by 10-K filing — that's the primitive that quant teams actually want. A sketch of calling it follows this list.
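A minimal sketch of calling the tool over the public SSE endpoint with the MCP Python SDK; the tool name and endpoint come from this post, but the argument schema and the example CIK/year are assumptions:

```python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Connect to the Interactive Market Data MCP server over SSE.
    async with sse_client("https://interactivemarketdata.com/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Argument names assumed from the pyflo_signal_combo(cik, fy) signature.
            result = await session.call_tool(
                "pyflo_signal_combo",
                arguments={"cik": 1611983, "fy": 2024},
            )
            print(result.content)

asyncio.run(main())
```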
The methodology lesson
The validation gate that killed the phantom signal wasn't a bolt-on safety check. It's the product. Anyone running fundamental analysis at scale has the same problem we did: data joins go wrong, share classes duplicate, sample sizes inflate, and the resulting backtest looks great until someone tries to trade it.
What's different is whether you ship the validation framework as a visible part of the analysis. Bloomberg shows you a PEG ratio without telling you whether the underlying earnings number reconciles to the 10-K. FactSet returns a value without telling you what assumptions went into the screen. There is no audit trail.
Interactive Market Data's audit trail starts with the methodology pages: /methodology documents every signal — definition, trigger logic, source-file reference, academic citations, full calibration grid (gross and net of costs), sector breakdown, and known caveats. Including the painful caveats. The Beneish page literally says: "Short loses at every horizon net of costs (worst: −48.06% net at 252d). Use as watchlist flag, not tradeable signal."
That's the kind of statement Bloomberg will never write next to its data. We can only write it because the LBRDA duplication taught us that not writing it has a cost.
Try the data yourself
The full validated triples leaderboard is at /api/calibration/triples. The combo prediction for any company by CIK is at /methodology (see signal cross-links). The MCP server at https://interactivemarketdata.com/mcp/sse exposes pyflo_signal_combo, pyflo_signal_methodology, and 22 other tools so an LLM agent can ground answers in this data with citations back to EDGAR.
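A minimal sketch of pulling the leaderboard programmatically; the endpoint is the one named above, but the response handling assumes a JSON array of triple records:

```python
import requests

resp = requests.get(
    "https://interactivemarketdata.com/api/calibration/triples",
    timeout=30,
)
resp.raise_for_status()
# Assumed response shape: a JSON array of validated triple records.
for row in resp.json()[:5]:
    print(row)
```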
The phantom signal isn't shipping. The validation framework is.
Related signals
capex_spike · fcf_ni_divergence · fcf_turn_negative · beneish_m_score_high · dso_drift_severe · margin_compression_severe · accruals_quality_low
Citations
- Beneish, M.D. (1999). 'The Detection of Earnings Manipulation.' Financial Analysts Journal.
- Sloan, R.G. (1996). 'Do stock prices fully reflect information in accruals and cash flows about future earnings?' The Accounting Review.
- Cooper, M.J., Gulen, H., & Schill, M.J. (2008). 'Asset Growth and the Cross-Section of Stock Returns.' Journal of Finance.
- Interactive Market Data internal: project_calibration_v2_2026_04_27 (validation framework design).
- Interactive Market Data internal: project_extensions_2026_04_27 (LBRDA duplication discovery).
- Interactive Market Data internal: project_signal_combo_2026_04_29 (multi-horizon multi-signal calibration).