A common belief in strategy design is that ‘more data is better.’ But is this always true? Reference [1] examined the impact of the quantity of data on predicting realized volatility. Specifically, it focused on the accuracy of volatility forecasts as a function of data sampling frequency. The study was conducted on crude oil and used a GARCH model to forecast volatility. The author pointed out,
The cause-and-effect aspect of the relationship between sampling frequency and forecasting accuracy was assessed in-sample and out-of-sample. Regarding the in-sample assessment, I was able to find evidence that sampling frequency affected how well the model fit. The relationship in this case was that the higher the sampling frequency, the better the model fit. Regarding the out-of-sample assessment, evidence was found that sampling frequency had an effect on forecasting accuracy, albeit in a surprising way. The relationship found in this study is that increasing sampling frequency negatively affects modelling accuracy…
The results of the regression analysis showed that sampling frequency accounted for around 20-25% of the variability in the error metrics. From the illustration of the data research method in Figure 1, it is also clear that there is an opening for the inclusion of other research fields.
In short, increasing the data sampling frequency improves in-sample prediction accuracy. However, higher sampling frequency actually decreases out-of-sample prediction accuracy.
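To make "sampling frequency" concrete, here is a minimal sketch of how a realized-volatility estimate depends on it: the same trading day can be summed up from 1-minute or 5-minute returns. The synthetic data and parameter values below are purely illustrative, not from the paper.

```python
import math
import random

def realized_vol(returns):
    """Realized volatility: square root of the sum of squared returns."""
    return math.sqrt(sum(r * r for r in returns))

# Illustrative synthetic data: 390 one-minute returns for one trading day.
random.seed(42)
minute_returns = [random.gauss(0.0, 0.0005) for _ in range(390)]

# High-frequency estimate: sum squared 1-minute returns directly.
rv_1min = realized_vol(minute_returns)

# Lower-frequency estimate: aggregate to 5-minute returns first.
five_min_returns = [sum(minute_returns[i:i + 5]) for i in range(0, 390, 5)]
rv_5min = realized_vol(five_min_returns)

print(rv_1min, rv_5min)
```

With clean, noise-free data the two estimates agree on average; the paper's point is that with real market data the higher-frequency estimate is not automatically the better input for forecasting.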
This result is surprising, and the author provided some explanation for this counterintuitive outcome. In my opinion, financial time series are usually noisy, so more data isn’t necessarily better, because sampling more frequently can amplify that noise.
Another important insight from the article is the importance of performing out-of-sample testing, as the results can differ from, and sometimes even contradict, the in-sample outcomes.
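Out-of-sample testing can be sketched as a rolling evaluation: at each step, the forecast is fit only on past data and scored against data the model never saw. The sketch below uses a naive historical-standard-deviation forecast rather than the paper's GARCH model, and the return series is hypothetical.

```python
import statistics

# Hypothetical daily returns, purely illustrative.
returns = [0.010, -0.020, 0.015, -0.005, 0.030, -0.010, 0.020,
           -0.025, 0.005, 0.010, -0.015, 0.020, -0.030, 0.010, 0.005]

window = 5
oos_errors = []
for t in range(window, len(returns)):
    # Fit only on data available before time t...
    forecast = statistics.stdev(returns[t - window:t])
    # ...and score it against the next observation, which the "model" never saw.
    realized = abs(returns[t])  # crude one-day realized-vol proxy
    oos_errors.append((forecast - realized) ** 2)

oos_mse = sum(oos_errors) / len(oos_errors)
print(oos_mse)
```

An in-sample error, by contrast, would be computed on the same window used for fitting, which is why it tends to flatter the model.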
Let us know what you think in the comments below or in the discussion forum.
References
[1] Hervé N. Mugemana, Evaluating the impact of sampling frequency on volatility forecast accuracy, 2024, Inland Norway University of Applied Sciences
Further questions
What's your question? Ask it in the discussion forum
Have an answer to the questions below? Post it here or in the forum
Can you comment as to the mechanics and / or implications of this?
Upon reading the paper (and hopefully more than partially understanding it), it makes sense that out-of-sample accuracy decreases as n increases. Volatility “clusters,” and more recent data has a stronger effect on what volatility will be tomorrow. By increasing n, you are bringing in periods where the volatility is not as applicable.
Volatility begets volatility…right up until it doesn’t. The recent yen carry-trade blow-up affecting US equities, with its surge, spike, and fall in volatility, demonstrates this. The volatility changes market participants’ psychology, risk limits, etc., and forces selling until the process is overblown and it starts to revert.
By modelling volatility on three months of data without weighting recent observations more heavily, your early-August vol predictions will be too low. One could actually see vol surging slowly and then exploding from July 18 to July 31.
What I would like to know, and I believe other readers would as well, is how do you apply this to real-world investing?
You’re correct; there’s a flaw in using equally weighted historical volatility because older data is given the same weight as more recent data, which can distort the volatility estimate. The solution is to use exponentially weighted historical volatility. I posted about this here:
https://harbourfronts.com/exponentially-weighted-historical-volatility-in-excel-volatility-analysis-in-excel/
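For reference, a minimal sketch of exponentially weighted volatility, assuming the classic RiskMetrics-style recursion with decay factor λ = 0.94; the sample returns are made up for illustration.

```python
import math

def ewma_vol(returns, lam=0.94):
    """Exponentially weighted volatility: the decay factor lam gives
    recent squared returns more influence than older ones."""
    var = returns[0] ** 2  # seed the recursion with the first squared return
    for r in returns[1:]:
        var = lam * var + (1 - lam) * r * r
    return math.sqrt(var)

# Illustrative daily returns.
daily_returns = [0.005, -0.004, 0.006, -0.005, 0.004, -0.006, 0.030, -0.025]
print(ewma_vol(daily_returns))
```

Unlike an equally weighted estimate, each older observation's contribution decays geometrically, so the estimate adapts faster to a recent shift in the volatility regime.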
That said, the article discusses a different aspect of forecasting volatility: whether we should use daily data or more granular intraday data (e.g., 1-minute, 5-minute, or hourly) to estimate 1-month volatility. The out-of-sample results indicate that more granular data is not necessarily better.
Thanks for asking.