Accruals, out of sample, 2016 to 2026

The claim

Sloan’s 1996 paper in The Accounting Review argued that the market underweights the distinction between the cash and accruals components of earnings. Firms whose earnings are backed by operating cash flow generate persistent future earnings; firms whose earnings depend heavily on accruals (deferrals, accrued receivables, inventory buildup, and similar non-cash items) generate earnings that are less persistent and tend to reverse. Investors who ignore the distinction are surprised by the reversal, and the paper documents that a trading rule of long low-accruals firms, short high-accruals firms earns significant abnormal returns.

The canonical magnitude reported in Sloan’s original 1962-1991 sample on NYSE/AMEX/NASDAQ is approximately 10.4% per year on the decile long-short, or roughly 0.87% per month (10.4% / 12), right in the neighborhood of the Jegadeesh-Titman momentum canonical. Source: Sloan 1996, as summarized in Dechow-Khimich-Sloan’s SSRN review and Quantpedia’s accrual anomaly page. The primary JSTOR PDF returned HTTP 403 at the time of this verdict, so the headline magnitude is cited from secondary sources per Nullberg’s math-verification rule.

What we tested

The goal is to see whether the accruals anomaly still produces a significant long-short spread in the 2016-2026 US equity out-of-sample window, with the modern cash-flow-statement-based accruals measure rather than Sloan’s original balance-sheet formula.

Sample and data

Daily closes from 2016-01-05 to 2026-04-09; 123 usable formation months in the primary specification
Cash-flow and balance-sheet fundamentals from FMP (cash_flows.parquet 483K rows, balance_sheets.parquet 485K rows)
8,479-row FMP company profile snapshot (exchange, sector, marketCap, ETF/fund flags)
Stocks-only (NYSE + NASDAQ + AMEX; no ETFs, no funds, no OTC/CBOE/PNK)
Financials and real estate excluded (Fama-French SIC 6 convention)

Accruals definition — Hribar and Collins (2002) cash-flow-statement approach:

accruals_cfs = Net Income − Cash Flow from Operating Activities

Hribar and Collins showed that this cash-flow-statement definition is cleaner than Sloan’s original balance-sheet formula, (ΔCA − ΔCash − ΔCL + ΔSTDebt + ΔTaxesPayable − Dep) / average total assets, because balance-sheet changes include non-operating distortions (M&A, divestitures, discontinued operations) that corrupt the accruals measure. The two definitions are highly correlated in practice, and the modern factor literature uses the cash-flow-statement version. Nullberg uses it here as the primary.

Sort score — TTM cash-flow-statement accruals divided by same-period total assets:

gp_a_accruals = TTM_accruals / totalAssets

Where TTM_accruals = rolling 4-quarter sum of quarterly (NI − CFO) per symbol. Fundamentals are joined to monthly returns via pandas.merge_asof on filingDate (strict point-in-time, no look-ahead).
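The score construction and the point-in-time join can be sketched as follows. This is a minimal illustration, not the verdict script: the column names netIncome, operatingCashFlow, date and formation_date are assumptions, the fundamentals are assumed to be already combined into one frame, and "strict" is interpreted here as excluding filings dated on the formation date itself.

```python
import pandas as pd

def ttm_accrual_score(fund: pd.DataFrame) -> pd.DataFrame:
    """Add TTM cash-flow-statement accruals scaled by total assets."""
    fund = fund.sort_values(["symbol", "date"]).copy()
    # Quarterly accruals: net income minus operating cash flow.
    fund["accruals_q"] = fund["netIncome"] - fund["operatingCashFlow"]
    # Rolling 4-quarter sum per symbol; NaN until four quarters exist.
    fund["ttm_accruals"] = fund.groupby("symbol")["accruals_q"].transform(
        lambda s: s.rolling(4, min_periods=4).sum()
    )
    fund["score"] = fund["ttm_accruals"] / fund["totalAssets"]
    return fund

def join_point_in_time(monthly: pd.DataFrame, fund: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest score filed strictly before each formation date."""
    left = monthly.sort_values("formation_date")
    right = fund.dropna(subset=["score"]).sort_values("filingDate")
    return pd.merge_asof(
        left,
        right[["symbol", "filingDate", "score"]],
        left_on="formation_date",
        right_on="filingDate",
        by="symbol",
        direction="backward",          # only look at past filings
        allow_exact_matches=False,     # assumption: same-day filings excluded
    )
```

Rows with no earlier filing come back with a missing score and would be dropped before sorting into deciles.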

Sign convention — Sloan’s long-short is D1 (low accruals) minus D10 (high accruals). Low-accruals firms have cash-backed earnings and are the “long” side; high-accruals firms have less-persistent earnings and are the “short” side. A positive mean is the paper’s predicted direction.

Four specifications reported in full:

PRIMARY (VW, NYSE breakpoints, stocks-only, financials excluded, TTM accruals, winsorized): the canonical specification most comparable to the original Sloan 1996 methodology. Value-weighted within deciles using FMP snapshot marketCap.
Sensitivity A (EW, filtered, winsorized, TTM accruals): equal-weighted with price ≥ $5 and dollar-volume ≥ $1M filter.
Sensitivity B (EW, filtered, winsorized, single-Q accruals score): same filter but using only the most recent quarterly accruals instead of TTM.
Sensitivity C (EW, no filter, no winsorization, TTM): the rawest specification.
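For one formation month, the decile sort behind these specifications might look like the sketch below. It assumes a frame with hypothetical columns exchange, score, marketCap and next_ret (the following month's return), takes breakpoints from NYSE names only, and leaves out the winsorization and liquidity filters for brevity.

```python
import numpy as np
import pandas as pd

def decile_long_short(month: pd.DataFrame, value_weighted: bool = True) -> float:
    """D1 (lowest accruals) minus D10 (highest accruals) return for one month."""
    nyse_scores = month.loc[month["exchange"] == "NYSE", "score"]
    # Nine interior cut points estimated on NYSE stocks only.
    cuts = nyse_scores.quantile(np.linspace(0.1, 0.9, 9)).to_numpy()
    # Map every stock, whatever its exchange, to a decile 1..10.
    decile = np.searchsorted(cuts, month["score"].to_numpy(), side="right") + 1
    ret = month["next_ret"].to_numpy()
    if value_weighted:
        weights = month["marketCap"].to_numpy()
    else:
        weights = np.ones(len(month))

    def portfolio_return(d: int) -> float:
        mask = decile == d
        # Raises if a decile is empty or its weights sum to zero.
        return float(np.average(ret[mask], weights=weights[mask]))

    return portfolio_return(1) - portfolio_return(10)
```

Calling it with value_weighted=False gives the equal-weighted variant used in the sensitivities; the monthly series of these differences is what the t-statistics are computed on.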

Pre-registered verdict thresholds, paper-specific calibration for Sloan’s ~0.87% per month canonical:

Replicated: mean(D1 − D10) > 0.003 AND t > 2
Degraded: 0 < mean ≤ 0.003 AND t > 2
Failed: mean ≤ 0 OR t ≤ 2
Inconclusive: data quality issue

The 0.003 floor is approximately 35% of canonical, reflecting the well-documented post-publication decay in the accruals literature (Richardson-Sloan-Soliman-Tuna 2005, Green-Hand-Zhang 2017, Dechow-Khimich-Sloan 2011).
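The rubric is mechanical enough to write down as a small function. The sketch below is an illustration of the four rules as stated; the function name, the data_ok flag and the parameter names are hypothetical, and which t-statistic is passed in is left to the caller.

```python
def verdict(mean_ls: float, t_stat: float, data_ok: bool = True,
            floor: float = 0.003, t_min: float = 2.0) -> str:
    """Apply the pre-registered thresholds to a monthly D1 - D10 mean and its t."""
    if not data_ok:
        return "INCONCLUSIVE"          # data quality issue
    if mean_ls <= 0 or t_stat <= t_min:
        return "FAILED"                # wrong sign or not significant
    # Significant and positive: the floor separates the remaining two outcomes.
    return "REPLICATED" if mean_ls > floor else "DEGRADED"
```

Because significance is checked before magnitude, a mean well above the floor still returns FAILED when t does not exceed 2.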

The numbers

Primary specification

Value-weighted, NYSE breakpoints, stocks-only, financials excluded, TTM accruals, winsorized.

Metric	Value
Sample months	123
Median stocks / month	2,348
D1 (low accruals, high quality) mean monthly	+2.912%
D10 (high accruals, low quality) mean monthly	+2.029%
D1 minus D10 mean monthly	+0.883%
i.i.d. t-statistic	+1.85
Newey-West 12-lag t-statistic	+1.74
Annualized Sharpe	—
Verdict (pre-registered rubric)	FAILED

The primary point estimate matches the canonical 0.87% per month almost exactly (0.883% vs the Sloan/Dechow canonical of ~0.87%). The mean clears the REPLICATED floor by a wide margin (0.88% ≫ 0.30%). What fails the rubric is the t-statistic: +1.85 i.i.d., +1.74 Newey-West. Both fall just short of the t > 2 threshold.

This is the most unusual “failed” verdict in the Nullberg archive so far: the magnitude is right at the canonical, but 123 months is just barely insufficient to resolve it at conventional significance with a VW-on-large-caps construction. In the language of statistics, the point estimate says “replicated” while the standard error says “we cannot yet tell”. The pre-registered rubric cares only about t > 2, so the verdict is FAILED.

Sensitivity A: EW filtered winsorized, TTM

Metric	Value
Median stocks / month	1,792
D1 - D10 mean monthly	+1.058%
i.i.d. t-statistic	+3.56
Newey-West 12-lag t-statistic	+3.81

Sensitivity A REPLICATES under both i.i.d. and Newey-West. The Newey-West t actually improves over the i.i.d. t (the monthly series has mildly negative autocorrelation). Mean is +1.06% per month, exceeding even the original Sloan canonical of ~0.87%.

Sensitivity B: EW filtered winsorized, single-Q accruals

Metric	Value
D1 - D10 mean monthly	+0.555%
i.i.d. t-statistic	+2.34
Newey-West 12-lag t-statistic	+2.39

Using single-quarter accruals instead of TTM (so the score is more reactive to the most recent reported earnings) also REPLICATES under both i.i.d. and Newey-West, though with a smaller magnitude than TTM. Both t-statistics clear 2.

Sensitivity C: EW no filter no winsor, TTM

Metric	Value
D1 - D10 mean monthly	+2.071%
i.i.d. t-statistic	+1.83
Newey-West 12-lag t-statistic	+1.90

The rawest specification produces the largest mean but has elevated variance from the microcap tail, pulling the t-stat back below 2. FAILED per rubric.

What this means

The accruals verdict is genuinely different in shape from the first four verdicts. The VW NYSE primary is underpowered at the exact canonical magnitude, while the two filtered EW sensitivities (A and B) REPLICATE robustly under both i.i.d. and Newey-West tests; the unfiltered EW sensitivity C has the largest mean but falls short on significance. This is the opposite of the profitability and value patterns, where the large-cap VW specification carried the effect and EW mid-cap portfolios were flat.

The honest interpretation is that the accruals effect has migrated to smaller stocks in the 2016-2026 regime. In Sloan’s original 1962-1991 sample, accruals worked on value-weighted large-cap portfolios — it was a large-cap effect. In the post-2016 regime, it appears to be a smaller-stock earnings-quality effect, consistent with the Green-Hand-Zhang 2017 factor zoo literature that shows the accrual anomaly has shifted toward less liquid segments of the market. One plausible mechanism: large-cap stocks have dense analyst coverage that has learned to price the accruals distinction cleanly, while smaller stocks remain mispriced by the same heuristic that Sloan originally identified.

By the pre-registered rubric, the primary is FAILED. A reader who uses a decision rule of “any specification has to clear significance” would call this REPLICATED, because two of four do. Nullberg’s rubric is specifically the primary spec, so the headline is FAILED — but the page reports every sensitivity in full so the reader can apply their own standard.

Comparative picture across five verdicts

Paper	Factor class	Primary mean	i.i.d. t	NW t	Verdict	Shape
MAX, Bali et al. 2011	Lottery	-1.80%	-2.57	-2.18	Failed	Sign inverted
Momentum, JT 1993	Trend	-0.13%	-0.16	-0.26	Failed	Decayed to null
Profitability, Novy-Marx 2013	Quality	+0.50%	+0.93	+1.05	Failed	Underpowered, below canonical
Value, FF 1992	Value	+1.70%	+2.82	+1.94	Replicated	Regime-driven, 2022 rebound
Accruals, Sloan 1996	Earnings quality	+0.88%	+1.85	+1.74	Failed	Primary at canonical; EW sensitivities replicate

Five verdicts, five distinct shapes. The rubric has now produced four FAILED and one REPLICATED, and within the four FAILEDs there are four economically different failure modes.

Reproducibility

Script: scripts/verdicts/sloan_1996_accruals.py
Results JSON: scripts/verdicts/sloan_1996_accruals.results.json
Monthly LS CSVs: ..._primary.csv, ..._sensA.csv, ..._sensB.csv, ..._sensC.csv

What we will track from here

This verdict enters the archive as failed and is reviewed when at least one of the following happens:

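A forward extension of the sample pushes the primary i.i.d. t above 2 at the current point estimate. At +0.88% per month and the current std, t grows with the square root of the sample length, so about 21 additional monthly observations would do it (123 × (2 / 1.85)² ≈ 144 months in total); getting the Newey-West t above 2 would take about 40.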
A small-cap-tilted or size-neutral specification shows whether the migration-to-smaller-stocks hypothesis is real. A 2×3 size × accruals sort is a natural next test.
The original Sloan balance-sheet accruals definition, computed by hand from balance-sheet changes, produces a materially different result.
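The sample-extension arithmetic in the first trigger can be checked with a back-of-envelope sketch. It assumes the mean and the standard deviation stay exactly at their current values, so that t scales with the square root of the number of months; the function name is hypothetical.

```python
import math

def months_to_reach(t_now: float, n_now: int, t_target: float = 2.0) -> int:
    """Extra months needed for t to reach t_target if mean and std stay fixed."""
    # t is proportional to sqrt(n), so n_needed = n_now * (t_target / t_now)^2.
    n_needed = math.ceil(n_now * (t_target / t_now) ** 2)
    return max(0, n_needed - n_now)

extra_iid = months_to_reach(1.85, 123)  # i.i.d. t from the primary table
extra_nw = months_to_reach(1.74, 123)   # Newey-West t from the primary table
print(extra_iid, extra_nw)
```

The estimate is only as good as the fixed-mean assumption: if the spread shrinks as the sample grows, no extension will clear the threshold.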

Bibliography

Sloan, Richard G. “Do Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future Earnings?” The Accounting Review 71(3), 1996, pp. 289-315.
Hribar, Paul, and Daniel W. Collins. “Errors in Estimating Accruals: Implications for Empirical Research.” Journal of Accounting Research 40(1), 2002, pp. 105-134. Source of the cash-flow-statement definition used for this verdict.
Newey, Whitney K., and Kenneth D. West. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55(3), 1987, pp. 703-708. HAC estimator used for the Newey-West t-statistics.