Definition
Data Snooping Bias
Data snooping bias is the statistical distortion that arises when many strategies or parameters are tested on the same dataset, making it likely that some appear profitable purely by chance.
If an Indian researcher tries hundreds of moving-average combinations on the Nifty, a few will look spectacular by luck alone. Reporting only the best one, without accounting for all the failed attempts, overstates the true edge, a problem closely tied to overfitting.
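The effect is easy to reproduce with a small simulation. The sketch below is illustrative only: it invents 200 "strategies" whose daily returns are pure random noise with zero true edge (the strategy count, one-year window, 1% daily volatility and seed are all assumptions chosen for the example, not real Nifty data), then compares the best Sharpe ratio in the batch with the median.

```python
import math
import random
import statistics

def annualised_sharpe(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio of a daily return series (risk-free rate assumed zero)."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(periods_per_year)

def simulate_best_of_n(n_strategies=200, n_days=252, daily_vol=0.01, seed=42):
    """Simulate strategies with NO true edge; return (best, median) annualised Sharpe."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_strategies):
        # Each hypothetical strategy is just zero-mean Gaussian noise.
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        sharpes.append(annualised_sharpe(returns))
    return max(sharpes), statistics.median(sharpes)

best, median = simulate_best_of_n()
print(f"Best Sharpe among 200 zero-edge strategies: {best:.2f}")
print(f"Median Sharpe: {median:.2f}")
```

The median sits near zero, as it should for strategies with no edge, while the best of the batch looks like a strong result. A report that shows only that winner would be describing luck, not skill.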
Guarding against data snooping requires honest record-keeping of every test, statistical corrections for multiple comparisons, and validation on truly fresh data. The deflated Sharpe ratio and similar adjustments penalise results discovered after extensive searching, helping separate skill from selection effects.
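As a rough illustration of how such an adjustment works, the sketch below follows the deflated Sharpe ratio formula as it is commonly stated after Bailey and López de Prado. It is a simplified reading under stated assumptions, not a reference implementation: Sharpe ratios are per period (not annualised), returns are treated as normal by default (skewness 0, kurtosis 3), and the function names, the example Sharpe of 0.12 per day and the variance of 1/252 across trials are all hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate best Sharpe expected from n_trials strategies with no true edge."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist()
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )

def deflated_sharpe(observed_sr, n_obs, n_trials, sharpe_variance,
                    skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe exceeds the luck-only benchmark.

    observed_sr and sharpe_variance are in per-period (non-annualised) units.
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * observed_sr
                      + (kurtosis - 1) / 4 * observed_sr ** 2)
    return NormalDist().cdf((observed_sr - benchmark) * math.sqrt(n_obs - 1) / denom)

# Same observed result, judged against 2 trials versus 200 trials.
dsr_few = deflated_sharpe(0.12, n_obs=252, n_trials=2, sharpe_variance=1 / 252)
dsr_many = deflated_sharpe(0.12, n_obs=252, n_trials=200, sharpe_variance=1 / 252)
print(f"Deflated Sharpe after 2 trials:   {dsr_few:.2f}")
print(f"Deflated Sharpe after 200 trials: {dsr_many:.2f}")
```

The same backtest result earns much less confidence once it is known to be the best of 200 attempts rather than one of two. This is why the number of trials has to be recorded honestly: without it, the penalty cannot be computed.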
Related terms
- Backtesting: Backtesting is the process of simulating a trading strategy on historical data to estimate how it would have performed, including returns, drawdowns and risk, before committing real capital.
- Out-of-Sample Testing: Out-of-sample testing evaluates a strategy on data that was deliberately withheld during model development, providing an unbiased check of whether the discovered edge generalises beyond the fitting period.
- Overfitting: Overfitting, or curve-fitting, occurs when a strategy is tuned so closely to historical data that it captures random noise rather than a genuine pattern, and consequently fails on new data.
- Sharpe Ratio Optimisation: Sharpe ratio optimisation is the process of constructing or tuning a portfolio or strategy to maximise return per unit of risk, measured as excess return divided by volatility.
Plain-English explainer from The Dispatch Investors Encyclopedia. General information, not financial advice.