4.7. Time series forecasting

4.7.1. Autocorrelation in time series with statsmodels

I have always found time series analysis fascinating, especially autocorrelation.

Autocorrelation is the ordinary correlation coefficient, except that it is calculated between a series and a lagged version of itself. Here, lagging means shifting the series back by a few periods, so that present values can be compared with past ones.
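As a minimal sketch of that definition, the snippet below computes a lag-3 autocorrelation twice with pandas: once by correlating the series with a shifted copy of itself, and once with the built-in `Series.autocorr`. The toy series (a noisy upward trend) and the choice of lag are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: a noisy upward trend (illustrative values only)
rng = np.random.default_rng(0)
series = pd.Series(np.arange(100) + rng.normal(scale=5, size=100))

lag = 3

# Manual version: correlate the series with a copy of itself shifted by `lag` periods
manual = series.corr(series.shift(lag))

# Built-in shortcut for the same calculation
builtin = series.autocorr(lag=lag)

print(f"lag-{lag} autocorrelation: {manual:.3f} (manual), {builtin:.3f} (built-in)")
```

Both numbers should agree, since `autocorr` is just a convenience wrapper around the shift-and-correlate step.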

Autocorrelation can reveal several useful properties of a time series, such as:

  1. Trend - when the series has a clear trend, autocorrelation stays high at short lags and falls off only slowly as the lag increases.

  2. Seasonality - if autocorrelation rises and falls at a fixed interval, the series is seasonal.

  3. Predictability - high autocorrelation suggests that past values carry information about future ones, so you can train on past samples to predict the future.

To make autocorrelation analysis easier, you can plot it using statsmodels. In an autocorrelation plot of temperature in Celsius, for example, there is strong seasonality, as you would expect, recurring every 12 lags.

Advanced time series analysis article: https://bit.ly/3Pmt2qM

4.7.2. Cross validation in time series

Cross-validating time series data is tricky. You can’t use the traditional KFold because you would end up training on “future” samples and predicting the “past”. Instead, use TimeSeriesSplit from scikit-learn.

The syntax is the same as for other scikit-learn CV splitters, with one major difference:

In each fold, the training indices always come before the test indices. With the default settings, each successive training set is also a superset of the previous ones.
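The ordering is easiest to see on a tiny example. The sketch below splits twelve time-ordered samples into three folds with `TimeSeriesSplit`; the toy array `X` and the number of splits are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered samples (toy data; row order is the time order)
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))

for fold, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")

# Fold 0: train=[0, 1, 2] test=[3, 4, 5]
# Fold 1: train=[0, 1, 2, 3, 4, 5] test=[6, 7, 8]
# Fold 2: train=[0, 1, 2, 3, 4, 5, 6, 7, 8] test=[9, 10, 11]
```

The same `tscv` object can be passed as the `cv` argument to helpers such as `cross_val_score`, just like any other splitter.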

4.7.3. Cyclic and seasonal time series patterns

How do you spot cyclic and seasonal time series patterns from a mile away?

If the pattern repeats at a fixed frequency or is tied to the calendar in some way, it is seasonal (periodic). Examples include temperature across seasons, retail sales, and economic data.

If the ups and downs of the series are irregular and resemble random fluctuations, the pattern is cyclic. These fluctuations usually last at least 2 years, and you can’t reasonably predict when the next spike will occur based on the previous ones.

Cyclic patterns are usually associated with the four phases of the business cycle: prosperity (boom), recession, depression, and recovery.