ARIMA Forecasting

ARIMA Summary

Summary: ARIMA which stands for Autoregressive Integrated Moving Average helps you forecast data for seasonal and nonseasonal data

STEP 1: TIME SERIES DECOMPOSITION PLOT A time series decomposition plot allows you to observe the seasonality, trend, and error/remainder terms of a time series.

Useful Alteryx tool: TS Plot

STEP 2: DETERMINE THE ARIMA TERMS Nonseasonal ARIMA models are displayed in the terms (p,d,q) which stand for p - periods to lag for, d - number of transformations used to make the data stationary, q - lags of the error component

Stationary - mean and variance are constant over time vs Non-Stationary - mean and variance change over time

Differencing - take the value in the current period and subtract it by the value from the previous period. You might have to do this several times to make the data stationary. This is the Integrated component which is d in the model terms.

Autocorrelation - How correlated a time series is with its past values, if positive at Lag-1 then AR if negative then MA

Partial Autocorrelation - The correlation between 2 variables controlling for the values of another set of variables. If the partial autocorrelation drops of quickly then AR terms, if it slowly decays then MA Seasonal ARIMA models are denoted (p,d,q)(P,D,Q)m

These models may require seasonal differencing in addition to non-seasonal differencing. Seasonal differencing is when you subtract the value from a year previous of the current value.

Useful Alteryx tool: TS Plot

STEP 3: BUILD AND VALIDATE THE ARIMA MODEL Build the ARIMA model using the terms determined in step 2. You can use internal and external validation to validate the quality of the model.

Internal validation: Look at in-sample error measures, particularly RMSE (Root-Mean-Square Error) and MASE (Mean Absolute Scaled Error).

External validation: Determine the accuracy measures by comparing the forecasted values with the holdout sample. This is especially important for comparing ARIMA models to other types of models, such as ETS.

Pick the ARIMA model with lowest AIC value. If the AIC values are comparable, use calculated errors to pick one that minimizes error the most. Many software tools will automate the selection of the model by minimizing AIC.

Useful Alteryx tools: ARIMA, TS Compare

STEP 4: FORECAST! Use the best ARIMA model to forecast for the desired time period. Make sure to add the holdout sample back into the model. Plot the results along with 80% and 95% confidence intervals.

Useful Alteryx tool: TS Forecast

Visualize

For this example, I’ll load a .csv file to a dataframe. The dataframe will need converted to a time series. Then we can plot the time series to get a first look. This will give us an immediate and intuitive look at potential seasonality and trend.

library(xts)
library(dygraphs)

step_data <- read.csv("fb_steps.csv", header=TRUE)
step_ts <- xts(step_data[,-1], order.by=as.Date(step_data[,1], "%m/%d/%Y"), frequency=7)

dygraph(step_ts, main = "Daily Steps") %>%
  dyRangeSelector()

Stationarize

Ensure a constant mean, variance, and covariance of nth term. - Detrending / Differencing - Dickey Fuller test of Stationarity - This data has a weekly trend, so we will difference vs. the 7 day lag

Run

library(tseries)
library(forecast)

single_diff <- diff(log(step_ts),lag= 7)

dygraph(single_diff, main = "Daily Steps - differenced") %>%
  dyRangeSelector()

Plot ACF/PACF

Determine optimal parameters. AR or MA series?

Plot of non-stationary:

Acf(step_ts)

Pacf(step_ts)

Plot of single diff:

ggAcf(single_diff)

ggPacf(single_diff)

Build model

Positive correlation at lag1 = AR 1 difference gradual decay in pacf = MA

fit <- arima(step_ts, c(1, 1, 0),seasonal = list(order = c(1, 1, 0), period = 7))

Make predictions

pred <- predict(fit, n.ahead = 30)

Using Auto-Arima

Returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible model within the order constraints provided.

auto_pred <- auto.arima(step_ts)

#pred = predict(auto_pred,n.ahead=14)

#auto_pred
#plot(forecast(auto_pred))


#use summary to get the model fit by forecast package
pred_df <- summary(forecast(auto_pred, h=10))
## 
## Forecast method: ARIMA(1,0,2) with non-zero mean
## 
## Model Information:
## Series: step_ts 
## ARIMA(1,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ma1      ma2       mean
##       0.9471  -0.7398  -0.0584  8502.4318
## s.e.  0.0141   0.0294   0.0261   258.5981
## 
## sigma^2 estimated as 7037288:  log likelihood=-13933.07
## AIC=27876.14   AICc=27876.18   BIC=27902.7
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -2.307862 2649.245 1956.252 -9.294898 25.11213 0.8266592
##                     ACF1
## Training set 0.001190209
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1499       7671.147 4271.462 11070.83 2471.777 12870.52
## 1500       7909.556 4437.645 11381.47 2599.727 13219.38
## 1501       7940.941 4437.522 11444.36 2582.925 13298.96
## 1502       7970.665 4439.225 11502.11 2569.793 13371.54
## 1503       7998.816 4442.430 11555.20 2559.793 13437.84
## 1504       8025.477 4446.864 11604.09 2552.462 13498.49
## 1505       8050.726 4452.295 11649.16 2547.401 13554.05
## 1506       8074.638 4458.524 11690.75 2544.269 13605.01
## 1507       8097.285 4465.383 11729.19 2542.771 13651.80
## 1508       8118.732 4472.729 11764.74 2542.651 13694.81
pred_df <- data.frame(cbind(c('2016-06-25','2016-06-26','2016-06-27','2016-06-28',
                   '2016-06-29','2016-06-30','2016-07-01','2016-07-02',
                   '2016-07-03','2016-07-04','2016-07-05','2016-07-06',
                   '2016-07-07','2016-07-08'), pred_df$`Point Forecast`))

#convert into a Time-Series class
pred_dfTs <- xts(pred_df[,-1], order.by=as.Date(pred_df[,1]), frequency = 7)

#plot

dygraph(pred_dfTs, main = "Daily Steps - AutoArima") %>%
  dyRangeSelector()
combined <- cbind(pred_dfTs, actual=step_ts)


dygraph(combined, main = "Daily Steps - AutoArima") %>%
  dyRangeSelector()