Failed Verdict

Momentum, out of sample, 2017 to 2026

The original abstract says past winners beat past losers by a significant margin over three- to twelve-month holding periods. On 111 months of US equity data, value-weighted with NYSE breakpoints and stocks-only filtering, the winners-minus-losers spread is -0.13% per month with t = -0.16. The effect has decayed to zero in the standard spec and only approaches significance in one shorter-lookback equal-weighted sensitivity, at t = +1.45. None of the four specifications reaches significance. Backfill update 2026-04-11: the Newey-West 12-lag t is -0.26 on the primary, and both sub-sample halves are near-null (-0.02% and -0.25% per month), confirming that the decay is consistent across time rather than regime-driven.

Nullberg verdict, replication, factor, momentum

Source paper

Jegadeesh and Titman (1993) "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"

The claim

Jegadeesh and Titman (1993) published one of the most cited papers in empirical asset pricing. The paper’s core claim, verbatim from the abstract:

“This paper documents that strategies that buy stocks that have performed well in the past and sell stocks that have performed poorly in the past generate significant positive returns over three- to twelve-month holding periods.”

Their sample was NYSE and AMEX, 1965 to 1989 (Jegadeesh and Titman 1993). Secondary sources summarizing Table I report the cross-sectional long-short spread at roughly 1 to 1.5% per month for the various J-month lookback and K-month holding period combinations covered in the paper.

What we tested

This is the second Nullberg verdict. The first, on the MAX factor of Bali, Cakici, Whitelaw 2011, found a statistically significant inversion of the published effect in the same 2016-2026 out-of-sample window. Momentum is a completely different factor: it uses a long (6 to 12 month) lookback to identify trending stocks, rather than looking at maximum daily returns over a single month. If the replication crisis is really a replication crisis, different factors should fail for different reasons, not all in the same way. That is the question this verdict is designed to address.

Sample

  • 2016-01-05 to 2026-04-09 daily closes (~10 years), 122 calendar months
  • 5,568 US equities in the raw price cache, merged with 8,479 rows of FMP company profile data for exchange listing, market cap, and ETF/fund flags. After filtering to stocks-only on main exchanges (NYSE, NASDAQ, AMEX; excluding OTC, CBOE, PNK, ETFs, and funds), the primary spec sorts a median of 2,963 stocks per month.
  • A note on the original universe: Jegadeesh and Titman used NYSE and AMEX only, explicitly excluding NASDAQ. Our inclusion of NASDAQ is a disclosed out-of-sample broadening. If anything, including NASDAQ should make the test easier to pass because post-2016 NASDAQ has been a momentum-friendly market.
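The universe filter described above can be sketched in a few lines. This is an illustration only, not the verdict script: the field names (`exchange`, `is_etf`, `is_fund`) and the toy symbols are hypothetical stand-ins for whatever the FMP profile data actually exposes.

```python
# Exchanges kept in the stocks-only universe, per the sample description.
MAIN_EXCHANGES = {"NYSE", "NASDAQ", "AMEX"}


def is_eligible(profile: dict) -> bool:
    """True for common stocks on a main exchange (hypothetical field names)."""
    return (
        profile.get("exchange") in MAIN_EXCHANGES
        and not profile.get("is_etf", False)
        and not profile.get("is_fund", False)
    )


# Made-up rows, for illustration only.
profiles = [
    {"symbol": "AAA", "exchange": "NYSE", "is_etf": False, "is_fund": False},
    {"symbol": "BBB", "exchange": "NASDAQ", "is_etf": False, "is_fund": False},
    {"symbol": "CCC", "exchange": "NYSE", "is_etf": True, "is_fund": False},
    {"symbol": "DDD", "exchange": "PNK", "is_etf": False, "is_fund": False},
]

universe = [p["symbol"] for p in profiles if is_eligible(p)]
print(universe)  # ['AAA', 'BBB']
```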

Momentum score

At the end of each formation month t, the score for each stock is the cumulative simple return over a J-month window ending `skip` months before t. The standard modern specification is J=11, skip=1 (also called “12-2 momentum” or “11 skipping 1”), which means the score uses months t-12 through t-2 inclusive. The 1-month skip avoids contamination from short-term reversal effects documented by Jegadeesh 1990. Dense monthly grids are built per symbol with forward-filled end-of-month closes so the positional shift handles any coverage gaps correctly. Earliest valid formation month for the primary spec is January 2017 (12 full months of prior history by that date).
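A minimal sketch of the score calculation for one symbol, assuming a pandas Series of month-end closes indexed by monthly `Period`. The function name and the indexing convention are our assumptions for illustration: here the row for month t holds the return from the close of month t-skip-J to the close of month t-skip, so the month labels may be offset by one relative to the wording above.

```python
import pandas as pd


def momentum_score(month_end_close: pd.Series, J: int = 11, skip: int = 1) -> pd.Series:
    """Cumulative simple return over a J-month window ending `skip` months back.

    month_end_close: month-end closes indexed by monthly pd.Period; months may be missing.
    """
    # Dense grid: one row per calendar month, so a positional shift of k rows
    # always means exactly k months, even when the raw data has gaps.
    grid = pd.period_range(
        month_end_close.index.min(), month_end_close.index.max(), freq="M"
    )
    dense = month_end_close.reindex(grid).ffill()

    window_end = dense.shift(skip)        # close at end of month t - skip
    window_start = dense.shift(skip + J)  # close at end of month t - skip - J
    return window_end / window_start - 1.0


# Toy input: 14 months of closes growing 1% per month, starting January 2016.
idx = pd.period_range("2016-01", periods=14, freq="M")
closes = pd.Series([100 * 1.01**i for i in range(14)], index=idx)
score = momentum_score(closes)
print(score.first_valid_index())  # 2017-01
```

With J=11 and skip=1 the first 12 rows are undefined, which lines up with January 2017 being the earliest valid formation month when the cache starts in January 2016.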

Methodology

  1. At the end of each month t, rank stocks by their momentum score
  2. Assign stocks to deciles (1 = losers, 10 = winners) using the NYSE-listed subset of the universe as the breakpoint sample
  3. Apply those NYSE breakpoints to the full NYSE+NASDAQ+AMEX universe
  4. Hold each decile portfolio for one month (K=1). Rebalance at the next month end.
  5. Compute D10 (winners) minus D1 (losers) as the long-short factor return
  6. Report mean, standard deviation, t-statistic, and Sharpe of the monthly long-short series

Note the sign convention: momentum’s long-short is D10 minus D1 (winners minus losers), the opposite of the MAX verdict. A positive mean is the paper’s predicted direction.
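The breakpoint logic in steps 2, 3, and 5 can be sketched with NumPy as follows. This is a simplified illustration under our own assumptions (function names are hypothetical, the long-short here is equal-weighted, and ties at a breakpoint go to the higher decile); the verdict script may differ in these details.

```python
import numpy as np


def nyse_decile(scores, is_nyse):
    """Assign deciles 1 (losers) to 10 (winners) using NYSE-only breakpoints."""
    scores = np.asarray(scores, dtype=float)
    is_nyse = np.asarray(is_nyse, dtype=bool)
    # Nine interior breakpoints (10th to 90th percentile) from NYSE names only.
    cuts = np.percentile(scores[is_nyse], np.arange(10, 100, 10))
    # Apply those breakpoints to every stock; searchsorted returns 0..9.
    return np.searchsorted(cuts, scores, side="right") + 1


def winners_minus_losers(next_month_ret, deciles):
    """Equal-weighted D10 minus D1 return for one holding month."""
    r = np.asarray(next_month_ret, dtype=float)
    d = np.asarray(deciles)
    return float(r[d == 10].mean() - r[d == 1].mean())


# Toy cross-section: 100 NYSE names plus two extreme non-NYSE names.
scores = np.concatenate([np.arange(100.0), [-50.0, 500.0]])
is_nyse = np.array([True] * 100 + [False] * 2)
deciles = nyse_decile(scores, is_nyse)
print(deciles[100], deciles[101])  # 1 10
```

Because the breakpoints come from the NYSE subset only, the deciles of the full universe need not be equally populated, which is the point of the convention: small non-NYSE names cannot dominate the extreme portfolios by count alone.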

Four specifications reported in full:

  • PRIMARY (value-weighted, NYSE breakpoints, stocks-only, J=11 skip-1, winsorized). Closest analog to the original methodology. Value-weighted within deciles using snapshot FMP marketCap. Stock-level monthly returns winsorized at [-0.90, +1.50].
  • Sensitivity A (equal-weighted, filtered, winsorized, J=11 skip-1). Same momentum score, equal-weighted with the standard price ≥ $5 and dollar volume ≥ $1M filter.
  • Sensitivity B (equal-weighted, filtered, winsorized, J=6 skip-1). Shorter 6-month lookback. One of the J,K combinations directly covered in the original Table I.
  • Sensitivity C (equal-weighted, filtered, winsorized, J=12 K=1, no skip). 12-month lookback without the 1-month skip, exposing the portfolio to the short-term reversal effect.
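To make the weighting and winsorization choices concrete, here is a small sketch of a single decile's monthly return under both schemes. The clip bounds are the [-0.90, +1.50] stated above; the function names are ours, and we assume fixed-bound clipping and a single market-cap snapshot as the value weight.

```python
import numpy as np

LOWER, UPPER = -0.90, 1.50  # stock-level monthly return bounds from the spec


def winsorize(returns):
    """Clip stock-level monthly returns to the fixed bounds."""
    return np.clip(np.asarray(returns, dtype=float), LOWER, UPPER)


def decile_return(returns, market_cap=None):
    """Equal-weighted if market_cap is None, otherwise value-weighted."""
    r = winsorize(returns)
    if market_cap is None:
        return float(r.mean())
    return float(np.average(r, weights=np.asarray(market_cap, dtype=float)))


# A +300% outlier is clipped to +150% before averaging.
print(decile_return([3.0, 0.0]))                 # 0.75
# Value weighting tilts toward the larger name.
print(decile_return([0.10, -0.10], [3.0, 1.0]))  # 0.05
```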

Pre-registered verdict thresholds (committed before the script was first run)

  • Replicated: mean(D10 - D1) > 0.007 AND t > 2
  • Degraded: 0 < mean ≤ 0.007 AND t > 2
  • Failed: mean ≤ 0 OR t ≤ 2
  • Inconclusive: a data quality issue prevents a clean call

The 0.007 floor is 70% of the canonical ~1% per month magnitude reported in secondary sources summarizing Table I. It is the same floor the MAX verdict used.
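The rubric is mechanical enough to write down as a function. The sketch below is our restatement of the four thresholds, with a hypothetical `data_ok` flag standing in for the data-quality judgment behind “Inconclusive”.

```python
def classify(mean: float, t_stat: float, floor: float = 0.007,
             t_bar: float = 2.0, data_ok: bool = True) -> str:
    """Map a monthly long-short mean (as a decimal) and its t-stat to a verdict."""
    if not data_ok:
        return "Inconclusive"
    if mean <= 0 or t_stat <= t_bar:
        return "Failed"
    if mean > floor:
        return "Replicated"
    return "Degraded"  # 0 < mean <= floor and t > t_bar


# Primary spec: mean -0.134% per month, t = -0.16.
print(classify(-0.00134, -0.16))  # Failed
# Sensitivity B: mean +0.615% per month, t = +1.45.
print(classify(0.00615, 1.45))    # Failed
```

Note that a positive mean alone is not enough: any spec with t ≤ 2 is Failed regardless of magnitude.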

The numbers

Primary specification

Value-weighted, NYSE breakpoints, stocks-only, J=11 skip-1, winsorized.

| Metric | Value |
| --- | --- |
| Sample months | 111 |
| Median stocks / month | 2,963 |
| D1 (losers) mean monthly return | +2.581% |
| D10 (winners) mean monthly return | +2.447% |
| D10 minus D1 mean monthly | -0.134% |
| D10 minus D1 t-statistic | -0.16 |
| Annualized Sharpe of D10 minus D1 | -0.05 |
| Worst month | -35.51% |
| Best month | +25.20% |

The primary is a null result. D10 and D1 have nearly identical means, the long-short spread is indistinguishable from zero, and the t-statistic is at the noise level. This is not an inversion, it is a flatline.
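For readers who want to recompute the summary statistics from the monthly long-short CSV, a minimal sketch follows. We assume the simple i.i.d. t-statistic with the sample standard deviation and a square-root-of-12 annualization for the Sharpe ratio; the input series here is a toy, not the verdict data.

```python
import math
import statistics


def summarize(long_short):
    """Mean, standard deviation, i.i.d. t-stat, and annualized Sharpe of a monthly series."""
    n = len(long_short)
    mean = statistics.fmean(long_short)
    sd = statistics.stdev(long_short)       # sample standard deviation (n - 1)
    t_stat = mean / (sd / math.sqrt(n))     # i.i.d. t-statistic of the mean
    sharpe = mean / sd * math.sqrt(12)      # annualized from monthly data
    return {"n": n, "mean": mean, "sd": sd, "t": t_stat, "sharpe": sharpe}


stats = summarize([0.01, -0.01, 0.02, 0.00])  # toy series
# Under these definitions the two ratios are linked: Sharpe = t * sqrt(12 / n).
print(math.isclose(stats["sharpe"], stats["t"] * math.sqrt(12 / stats["n"])))  # True
```

The identity gives a quick consistency check on the reported figures: with t = -0.16 and n = 111, the annualized Sharpe comes out at roughly -0.05.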

Pre-registered call: mean ≤ 0 and t ≤ 2 both hold, and either one alone meets the rubric's FAILED condition, so the verdict is FAILED.

Sensitivity A (equal-weighted, filtered, J=11 skip-1)

| Metric | Value |
| --- | --- |
| Median stocks / month | 2,490 |
| D1 mean monthly | +1.845% |
| D10 mean monthly | +1.893% |
| D10 minus D1 mean monthly | +0.048% |
| t-statistic | +0.10 |

Equal-weighting preserves the null. The spread goes from barely negative to barely positive when we switch from VW to EW, but both are economically and statistically indistinguishable from zero. This is not “the sign flips under EW”, it is “the spread is zero either way”.

Sensitivity B (equal-weighted, filtered, J=6 skip-1)

| Metric | Value |
| --- | --- |
| Median stocks / month | 2,531 |
| D1 mean monthly | +1.359% |
| D10 mean monthly | +1.974% |
| D10 minus D1 mean monthly | +0.615% |
| t-statistic | +1.45 |

This is the most interesting specification. Shortening the lookback from 11 months to 6 months pushes the winners-minus-losers spread to a meaningful positive number (~7.4% per year on the long-short) with a t-statistic that clears the one-tailed 10% bar but not the conventional two-tailed |2| bar. The direction is right, the magnitude is non-trivial, and the significance is within reach.

Pre-registered call: t ≤ 2, so still FAILED by the rubric. But the 6-month spec deserves its own flag in the archive because it is the closest of the four to surviving and it goes in the right direction.

Sensitivity C (equal-weighted, filtered, J=12 no skip)

| Metric | Value |
| --- | --- |
| Median stocks / month | 2,490 |
| D1 mean monthly | +2.043% |
| D10 mean monthly | +1.755% |
| D10 minus D1 mean monthly | -0.288% |
| t-statistic | -0.57 |

Removing the 1-month skip reintroduces short-term reversal contamination and pulls the spread slightly negative. This is consistent with the reason the skip convention became standard in later literature: the most recent month of returns tends to reverse, and including it in a trend score partially cancels the trend signal.

What this means

Momentum has not inverted. It has decayed.

The original Jegadeesh and Titman result was ~1% per month, highly significant, across a wide range of J,K combinations, in 1965-1989 NYSE+AMEX data. Our 2017-2026 run on the closest analog the available data supports shows:

  • The value-weighted NYSE-breakpoint primary is indistinguishable from zero. Both means are near 2.5% per month (reflecting the 2017-2026 bull market), and their difference is -0.13% with t = -0.16.
  • The equal-weighted version of the same spec is also zero.
  • Only the 6-month skip-1 equal-weighted variant shows a meaningfully positive spread (+0.62% per month), and even that is at t = +1.45, below the conventional significance bar.
  • The 12-month no-skip variant goes slightly negative due to short-term reversal contamination, as expected from the literature.

The honest read is that momentum has materially weakened rather than disappeared. Three of four specs are flat or positive (-0.13%, +0.05%, and +0.62% per month, the first two indistinguishable from zero), the magnitudes are far below the canonical ~1% per month, and no specification reaches the significance bar the original paper comfortably cleared.

Three contextual notes that the reader deserves:

  1. Different story from MAX. The first Nullberg verdict found MAX’s lottery-aversion effect had inverted with a highly significant t-stat. This second verdict finds momentum’s winners-beat-losers effect has decayed to near-zero, with no specification clearing significance in either direction. Two different factors, two different kinds of failure, one consistent conclusion: the 2017-2026 regime is not the 1965-1989 regime, and published anomalies cannot simply be copy-pasted across it.
  2. 6-month is the survivor candidate. The J=6 skip-1 equal-weighted specification is the only one that survives in the right direction at a non-trivial magnitude. Recent literature (the 30-years review by van Vliet et al. 2024) has pointed to shorter lookback momentum as the more robust variant post-2010. Our run is directionally consistent with that but without enough statistical power to land it cleanly.
  3. This is not a refutation of the original paper. The 1965-1989 result is not in dispute. What our run shows is that if a reader uses the 1993 Table I verbatim as a buy-the-winners-sell-the-losers trading rule on the 2017-2026 US equity universe, they do not get the 1% per month that Table I promised. Any of the weak variants would have been whip-sawed by the enormous gross means on both D1 and D10 during the bull market, with the long-short spread providing essentially zero net alpha.

By the pre-registered rubric, this is a failed replication across all four specifications. The verdict goes into the archive as failed and will be updated when the primary numbers move, when a finer test distinguishes 6-month from 12-month survivors, or when the market regime turns enough to retest the long side.

Reproducibility

The replication is a single Python file with no custom dependencies beyond pandas and numpy. It reads the operator’s 10-year daily OHLCV cache plus a parquet of FMP company profiles, computes monthly momentum scores in a single pass per symbol using a dense period grid with forward-filled closes, merges exchange and market cap, forms NYSE-breakpoint deciles, and runs all four specifications. Total runtime on a laptop is about 32 seconds.

  • Script: scripts/verdicts/jegadeesh_titman_1993_momentum.py
  • Results JSON: scripts/verdicts/jegadeesh_titman_1993_momentum.results.json
  • Monthly long-short series CSVs: ..._primary.csv, ..._sensA.csv, ..._sensB.csv, ..._sensC.csv

What we will track from here

This verdict enters the living archive as failed and stays there until at least one of the following happens. If it does, the entry is updated, a dated changelog is appended, and the old call is kept visible.

  1. The 6-month skip-1 specification’s t-statistic clears |2| in a forward extension of the sample or on a sub-universe where the effect is cleaner.
  2. A strict time-varying value-weighted run on CRSP-quality data with properly updated monthly market caps reverses the primary’s null result.
  3. A Newey-West or bootstrap-based standard error materially changes the t-stats from the simple i.i.d. estimate reported here.
  4. Additional robustness tests (sub-sample stability, industry-controlled, volatility-controlled, size-decile-controlled) materially change the conclusion.

Update — 2026-04-11 — Newey-West and sub-sample stability backfill

This update applies the backfill analysis committed in the “Four verdicts, four shapes” primer to the primary (VW NYSE-breakpoint, J=11 skip-1) monthly long-short series.

Newey-West robustness

| Metric | Value |
| --- | --- |
| Sample months | 111 |
| Mean monthly D10 - D1 | -0.134% |
| i.i.d. t-statistic | -0.16 |
| Newey-West 12-lag t-statistic | -0.26 |

The Newey-West adjusted t is very slightly more negative than the i.i.d. t, which in this direction means the long-run variance estimate is smaller than the i.i.d. variance estimate (the summed lag-autocovariances are mildly negative on net). Either way the number is indistinguishable from zero. The “decayed to null” characterization is unchanged under the HAC adjustment.
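For reference, a Newey-West t-statistic for the mean of a series can be sketched as follows. This is a minimal illustration and not necessarily what the backfill code does: we assume Bartlett kernel weights 1 - k/(L+1), autocovariances divided by n, and no small-sample correction.

```python
import numpy as np


def newey_west_t(x, lags: int = 12) -> float:
    """t-statistic of the mean using a Bartlett-kernel long-run variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    e = x - x.mean()
    long_run_var = e @ e / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        gamma_k = e[k:] @ e[:-k] / n          # lag-k autocovariance
        weight = 1.0 - k / (lags + 1)         # Bartlett weight
        long_run_var += 2.0 * weight * gamma_k
    return float(x.mean() / np.sqrt(long_run_var / n))


# Toy series with negative autocorrelation: returns alternate around a 1% mean.
x = np.array([0.03, -0.01] * 30)
print(abs(newey_west_t(x, lags=12)) > abs(newey_west_t(x, lags=0)))  # True
```

The toy case shows the same mechanism described above: when the weighted autocovariances are negative on net, the long-run variance shrinks and the adjusted t moves further from zero than the unadjusted one.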

Sub-sample stability

| Half | Months | First month | Last month | Mean monthly | i.i.d. t |
| --- | --- | --- | --- | --- | --- |
| First half | 55 | 2017-01 | 2021-07 | -0.021% | -0.01 |
| Second half | 56 | 2021-08 | 2026-03 | -0.245% | -0.24 |

Both halves are null. The first half is essentially zero (mean -0.02% per month, t = -0.01). The second half is slightly more negative but still statistically indistinguishable from zero. Unlike the value factor, which has one flat half and one significant half, momentum’s decay is consistent across time. A researcher running this same specification in July 2021 would have returned the same FAILED verdict as a researcher running it in April 2026.
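A quick arithmetic check ties the two halves back to the full sample. The sketch assumes the split puts the first `n // 2` months in the first half, which reproduces the 55/56 split reported here; the helper names are ours.

```python
def split_halves(series):
    """Split a monthly series into first and second halves (odd month goes to the second)."""
    mid = len(series) // 2
    return series[:mid], series[mid:]


def pooled_mean(parts):
    """Observation-weighted mean from (n_months, mean) pairs."""
    total = sum(n for n, _ in parts)
    return sum(n * m for n, m in parts) / total


first, second = split_halves(list(range(111)))
print(len(first), len(second))  # 55 56

# Reported half-sample means, in percent per month.
halves = [(55, -0.021), (56, -0.245)]
print(round(pooled_mean(halves), 3))  # -0.134
```

The month-weighted average of the two half-sample means recovers the full-sample -0.134% per month, so the sub-sample figures are internally consistent with the primary result.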

What this sharpens

The synthesis primer characterized momentum as “decayed to null”. The backfill confirms that this characterization is not a full-sample artifact of averaging two opposite regimes: both halves of the sample are individually null. If momentum is coming back, it is not doing so in either half of the current 10-year cache.

Verdict impact

No change. The pre-registered rubric was mean ≤ 0 OR t ≤ 2 → FAILED. The primary mean is -0.134% (≤ 0), so the verdict is FAILED regardless of the t-statistic. The additional evidence from the backfill is that the failure is (a) robust to the Newey-West adjustment and (b) consistent across sub-samples, which is a stronger form of FAILED than one that depended on a single sample window.

Bibliography

  1. Jegadeesh, Narasimhan, and Sheridan Titman. “Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency.” Journal of Finance 48(1), 1993, pp. 65-91.
  2. Bali, Turan G., Nusret Cakici, and Robert F. Whitelaw. “Maxing Out: Stocks as Lotteries and the Cross-Section of Expected Returns.” Journal of Financial Economics 99(2), 2011, pp. 427-446. Subject of the prior Nullberg verdict.
  3. Newey, Whitney K., and Kenneth D. West. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55(3), 1987, pp. 703-708. HAC estimator used in the backfill update.