How retail-calendar fiscal years break every off-the-shelf data pipeline
A retail-calendar fiscal year ends on the closest Saturday to January 31, not December 31. Costco's FY2024 ended 2024-09-01 (a 52-week year). Best Buy's FY2024 ended 2024-02-03. Masonite's FY2016 ended 2017-01-01. The economic activity in each of those years is what management reports as that fiscal year's revenue, margins, and so on — that's the period the company runs its business on, and that's the period any meaningful financial analysis must align to.
The bug we found: SEC's companyfacts API returns fiscal-year-end
dates correctly, but the fy integer field on each entry
is the filing's fiscal year, not the value's period. So when
Masonite's FY2018 10-K includes a comparative for FY2016 (now expressed
as 2-years-ago context), the SEC API tags that comparative with
fy=2018, not fy=2016. Most data vendors join
on fy, then silently mis-align the value.
What this looks like in practice
Suppose you're building a screen for "companies with revenue growth
> 15% in FY2016." Your pipeline pulls Masonite's FY2016 entry from
the SEC API. The API returns multiple rows tagged
fy=2016:
- From Masonite's FY2016 10-K (the original filing): a value of $1,840M with end-date 2017-01-01.
- From Masonite's FY2017 10-K (where 2016 appears as the comparative
column): a value of $1,840M with end-date 2017-01-01,
fy=2017. - From Masonite's FY2018 10-K (where 2016 appears as the
2-years-ago column): a value of $1,840M with end-date 2017-01-01,
fy=2018.
The value is the same in all three. The end-date is the same. But
the fy field differs because it's the filing's fiscal
year, not the value's period. A naive join on fy picks
up only ONE of those entries — usually the latest one, because that's
how dictionaries work in Python — and treats Masonite's FY2018-tagged
row as if it represented FY2018 economic activity.
Now you compute Masonite's FY2018 revenue growth. The cohort
defines FY2018 by fy=2018. The "FY2018" value you pulled
is actually FY2016 numbers. Your growth calculation is wrong by two
years of compounding.
Multiply this across ~50 retail-calendar filers in the SEC universe and you have a systematic error that doesn't trigger any obvious "this number looks weird" alarm — every individual cell is a plausible number, just for the wrong period.
The fix is five lines
Don't trust fy. Compute fiscal year from the period's
end-date:
def _effective_fy(end_date):
# If end-date is in the first 7 days of January,
# the period belongs to the prior fiscal year.
if end_date.month == 1 and end_date.day <= 7:
return end_date.year - 1
return end_date.year
Five lines. We caught this when our verify pipeline flagged 50+ "mismatches" between our replica and the comparison reference; tracing each one led us to the same retail-calendar pattern. Once the helper was applied, the cohort distribution closed cleanly.
Which companies trigger this
From our coverage, retail-calendar filers with end-dates in the first 7 days of January include:
- Walmart (WMT) — fiscal year ends January 31
- Costco (COST) — fiscal year ends late August (NOT a retail calendar; included as a counter-example)
- Target (TGT) — fiscal year ends late January / early February
- Home Depot (HD) — fiscal year ends late January
- Lowe's (LOW) — fiscal year ends late January / early February
- Best Buy (BBY) — fiscal year ends late January / early February
- TJX Companies (TJX) — fiscal year ends late January
- Ross Stores (ROST) — fiscal year ends late January
- Burlington Stores (BURL) — fiscal year ends late January
- Williams-Sonoma (WSM) — fiscal year ends late January
- Tractor Supply (TSCO) — fiscal year ends late December (NOT retail; counter-example)
- Lennar (LEN) — fiscal year ends late November (NOT retail; counter-example)
- ...and ~40 more retailers, including most apparel and big-box names.
Note the counter-examples: Costco, Tractor Supply, Lennar end on
non-January boundaries. Don't assume "retailer = 52-week fiscal
calendar" — many retailers do, but a chunk don't. The
_effective_fy helper handles both correctly because it
keys on end-date, not company.
Why this matters for backtests
The off-by-one fiscal year corrupts every cohort-based screen that
uses fy as a join key. Concretely, before the fix:
- Walmart's FY2017 revenue (≈$485B per management) was joined against FY2018 fundamentals (≈$500B). Growth-rate calculations showed +3.1%. Actual growth was +0.8%.
- Best Buy's FY2017 same-store comp (-0.9% per management) was joined against FY2018 cohort. The cohort's average comp came in 0.3 percentage points off.
- Several signal triggers — including
dso_drift_severeandmargin_compression_severe— fired or didn't fire on the wrong year for every retail-calendar filer. The signals' published calibrations would be 5-10% noisier than they should be.
After applying _effective_fy, those drift cleanly. The
calibrations sharpen. The retail-cohort screens stop flagging the
wrong companies.
How to test your own pipeline
Pick three retail-calendar filers (Walmart, Home Depot, Best Buy). Pull their fiscal-year-tagged revenue values for FY2018 from your data vendor. Compare to:
- The 10-K's "Net sales" headline for the year your vendor labels FY2018. Should match within 1%.
- The same metric on Walmart's 10-K filings page — the "fiscal year ended" date should be January 31 or February 1, not December 31.
If your vendor's "FY2018 revenue" for Walmart matches Walmart's FY2017 10-K headline (~$485B) instead of FY2018 (~$500B), you have the bug. The fix is the five-line helper above; the cost is one afternoon of running it across your ratio cache and ETL.
For the open audit trail of how we caught and fixed this, see our validation framework post — fix #3 in the four-part series.
Related signals
Citations
- SEC EDGAR companyfacts API documentation
- Walmart 10-K fiscal year disclosure (January 31 fiscal year-end)
- Interactive Market Data internal: project_data_correctness_2026_04_22 (the full verify pass)