Why Backtests Decay: Regime Dependence and Crowding


Backtesting is an essential part of quantitative strategy development, and strategies are naturally often selected on the strength of their backtest performance. An important question when evaluating a backtested strategy, however, is how much of its results reflect skill rather than luck.

Reference [1] examines this issue by analyzing 1,726 commercially marketed strategies from ten global institutions over the period 2009 to 2025, covering equities, rates, foreign exchange, credit, and commodities. Each strategy is classified into one of seven categories: Carry, Hedging, Momentum, Multi Premia, Factor, Value, or Liquidity. The author summarizes the paper as follows:

This paper examines how institutional allocators should interpret marketed backtests of structured investment strategies. The analysis contributes in three ways. First, it quantifies the gap between pro-forma and live performance on a uniquely large commercial sample of 1,726 strategies from ten global institutions over 2009–2025. Second, it shows that once live performance is measured against a leave-one-out bucket-average peer benchmark, the residual information content of the marketed backtest is economically negligible: what looks like strategy-specific skill is predominantly the common factor regime prevailing at launch. Third, it identifies two structural channels—regime timing at launch and a horizon-dependent launch-density effect—that jointly explain the residual decay, and translates the result into an operational rule: the haircut applied to a marketed backtest should increase with the extremity of the pre-launch factor regime.


In summary, the results show that backtested strategies often experience significant performance decay once they go live, approximately 2% to 3% per year. Most of the backtested performance is driven by the factor regime prevailing at launch rather than by strategy-specific skill, and regime timing and crowding (the launch-density effect) are identified as the main drivers of the decay.
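To make the notion of decay concrete, here is a minimal sketch that measures it as the gap between annualized pro-forma and annualized live returns. The function names, the arithmetic annualization (mean monthly return times 12), and the sample figures are all illustrative assumptions; the paper's exact definition may differ.

```python
from statistics import mean


def annualized_return(monthly_returns):
    """Arithmetic annualization: mean simple monthly return times 12 (an assumption)."""
    return mean(monthly_returns) * 12


def performance_decay(backtest_monthly, live_monthly):
    """Gap between annualized backtest and annualized live return.

    A positive value means the strategy did worse live than in its backtest.
    """
    return annualized_return(backtest_monthly) - annualized_return(live_monthly)


# Made-up monthly returns for a single hypothetical strategy.
backtest = [0.010, 0.008, 0.012, 0.006]
live = [0.008, 0.006, 0.010, 0.004]

decay = performance_decay(backtest, live)
print(f"decay: {decay:.2%} per year")
```

With these made-up numbers the gap comes out at 2.4% per year, inside the range quoted above, but the inputs were chosen for illustration and are not taken from the paper.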

This has important implications for allocators and system developers: because backtests often reflect the environment rather than persistent alpha, strategies should be benchmarked against peers and adjusted for regime effects, with a larger haircut applied when the pre-launch factor regime was more extreme.
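Peer benchmarking of the kind described in [1] can be sketched as follows: each strategy's live return is compared with the average live return of the other strategies in the same category, excluding the strategy itself. This is only a rough illustration under stated assumptions. The function name `leave_one_out_excess`, the input layout, and the sample numbers are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def leave_one_out_excess(live_returns, buckets):
    """Return each strategy's live return minus its leave-one-out peer average.

    live_returns: dict mapping strategy name -> annualized live return
    buckets:      dict mapping strategy name -> category (e.g. "Carry")
    Strategies with no peers in their bucket are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, ret in live_returns.items():
        totals[buckets[name]] += ret
        counts[buckets[name]] += 1

    excess = {}
    for name, ret in live_returns.items():
        bucket = buckets[name]
        if counts[bucket] < 2:
            continue  # no peers, so no benchmark can be formed
        # Bucket average computed without the strategy itself.
        peer_mean = (totals[bucket] - ret) / (counts[bucket] - 1)
        excess[name] = ret - peer_mean
    return excess


# Made-up data: three Carry strategies and one Momentum strategy.
live_returns = {"A": 0.06, "B": 0.04, "C": 0.02, "D": 0.05}
buckets = {"A": "Carry", "B": "Carry", "C": "Carry", "D": "Momentum"}

excess = leave_one_out_excess(live_returns, buckets)
```

Excluding the strategy from its own benchmark avoids mechanically pulling its excess return toward zero, which matters most in small buckets. In this example, strategy D has no peers and is therefore left out of the result.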

Let us know what you think in the comments below or in the discussion forum.

References

[1] Chang Liu (2026), Evaluating Structured Strategy Backtests: Peer Benchmarks, Regime Timing, and Live Performance, arXiv:2604.18821
