# 5 Autoregressive Integrated Moving Average (ARIMA) Models

The goal of this lecture is to implement the Box–Jenkins methodology for nonstationary time series by applying differencing and fitting an ARMA model to the stationary differenced series. You should be able to recognize when differencing is needed based on the time series and ACF plots.

**Objectives**

- List the steps of Box–Jenkins methodology.
- Identify the order of differences \(d\) and order of seasonal differences \(D\) from the plots of original and differenced time series.
- Recall how the orders \(p\) and \(q\) are identified using ACF and PACF plots for non-seasonal time series.
- Identify the orders \(P\) and \(Q\) for seasonal time series.

**Reading materials**

- Chapters 6.1–6.5 in Brockwell and Davis (2002)

## 5.1 Introduction to ARIMA

We have already discussed the class of ARMA models for representing stationary series. A generalization of this class, which incorporates a wide range of nonstationary series, is provided by the *autoregressive integrated moving average* (ARIMA) processes, i.e., processes which, after differencing finitely many times, reduce to ARMA processes.

If \(d\) is a nonnegative integer, then \(X_t\) is an ARIMA(\(p, d, q\)) process if \(Y_t=(1 - B)^d X_t\) is a causal ARMA(\(p, q\)) process. The process is stationary if \(d = 0\), in which case it reduces to an ARMA(\(p, q\)) process.

For example, if \(X_t\) is an ARIMA(1,1,0) process, then \(Y_t\), the series of its first-order differences (because \(d = 1\)), is an ARMA(1,0) process \[ Y_t = (1 - B)X_t = \phi_1 Y_{t-1} + Z_t, \] where \(|\phi_1| < 1\) and \(Z_t\) is white noise.
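This relationship can be checked numerically: summing (integrating) a simulated AR(1) series gives an ARIMA(1,1,0) path whose first differences recover the AR(1) series exactly. A minimal sketch, where the coefficient, seed, and series length are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed for reproducibility
phi1 = 0.6                        # hypothetical AR(1) coefficient, |phi1| < 1
n = 500

# Simulate the stationary ARMA(1, 0) series Y_t = phi1 * Y_{t-1} + Z_t
z = rng.normal(0.0, 1.0, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi1 * y[t - 1] + z[t]

# "Integrate" by summation: X_t is then an ARIMA(1, 1, 0) process
x = np.cumsum(y)

# First differences (1 - B)X_t recover Y_t exactly
assert np.allclose(np.diff(x), y[1:])
```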

An equation for ARIMA(\(p, d, q\)) is \[ (1 - B)^d (1 - \phi_1B - \dots - \phi_pB^p)X_t = (1 + \theta_1B + \dots + \theta_qB^q)Z_t, \tag{5.1}\] where \(B\) is the backshift operator, \(d\) is the order of differences, \(\phi_1, \dots, \phi_p\) are the autoregression coefficients, \(p\) is the autoregressive order, \(\theta_1, \dots, \theta_q\) are the moving average coefficients, \(q\) is the moving average order, and \(Z_t\sim \mathrm{WN}(0,\sigma^2)\). The left-hand side of Equation 5.1 consists of the differencing and AR parts; the right-hand side represents the MA part.

**Why is the process called ‘integrated’?**

Recall the geometric interpretation of the integral of a curve \(y = f(x)\) defined on continuous \(x\). The integral of \(y\) corresponds to the area under the curve. For example, if \(y = f(x)\) is a function describing income \(y\) based on working time \(x\), the integral corresponds to the total income in a certain period.

In our lectures, we deal with time series defined using discrete times \(t\) (for example, years), so yearly income is \(Y_t\), and the total income over several years can be obtained by summing the individual annual incomes, \(\sum Y_t\). Hence, here we integrate by summation.

Recall the definition of a random walk series, which is a cumulative sum of i.i.d. noise: \[ X_t = \sum_{i=1}^t Y_i, \] where \(Y_t \sim\) i.i.d.(\(0,\sigma^2\)). This \(X_t\) is the simplest example of an integrated series. The notation \(X_t \sim\) I(1) means \(X_t\) is a first-order integrated series. The differences of \(X_t\) \[ \begin{align} (1 - B)X_t &= X_t - BX_t \\ &= X_t - X_{t-1} \\ &= Y_t \end{align} \] give us back the uncorrelated series \(Y_t\), hence the process \(X_t\) is an ARIMA(0,1,0) process.
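A quick numerical illustration: the lag-1 sample autocorrelation of a simulated random walk is close to 1 (slow ACF decay), while that of its first differences is close to 0. The `acf1` helper and the seed below are illustrative choices, not a standard API:

```python
import numpy as np

def acf1(x):
    """Sample autocorrelation at lag 1 (illustrative helper)."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)

rng = np.random.default_rng(0)    # arbitrary seed
y = rng.normal(size=2000)         # i.i.d. noise Y_t
x = np.cumsum(y)                  # random walk X_t, an I(1) series

r_walk = acf1(x)           # close to 1: nonstationary behavior
r_diff = acf1(np.diff(x))  # close to 0: differencing recovers uncorrelated noise
```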

## 5.2 Box–Jenkins methodology

The approach developed by Box and Jenkins (1976) applies differencing to the original time series repeatedly, until a stationary time series is obtained. We will first learn how to identify the number of differences (i.e., the order of differences \(d\)) by analyzing plots of the time series and ACF at each iteration of the method. In a later lecture on detecting trends in time series, we will also introduce formal tests for the integration order.

Below is a general algorithm used to fit ARIMA(\(p, d, q\)) models:

- Start assuming \(d = 0\).
- Plot the time series and ACF.
- If the plots suggest nonstationarity, iterate differencing and plotting to update \(d\):
    - Apply differences to the current time series and set \(d = d + 1\).
    - Plot the differenced series and its ACF.
    - If nonstationarity is still evident, repeat the differencing and plotting.
- Identify the orders \(p\) and \(q\) from the ACF and PACF plots of the latest (differenced) time series.
- Estimate the model.
- Apply model diagnostics, particularly for homogeneity, uncorrelatedness, and normality of residuals. Address any violations by respecifying the model.
- Forecast with the resultant model.
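The differencing loop at the start of this algorithm can be sketched in code. In practice, the decision to difference again is made by inspecting the time series and ACF plots; the `choose_d` helper below replaces that visual check with a crude lag-1 ACF threshold, so both the function and the 0.8 cutoff are hypothetical illustrations:

```python
import numpy as np

def acf1(x):
    """Sample autocorrelation at lag 1 (illustrative helper)."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc * xc)

def choose_d(x, max_d=2, threshold=0.8):
    """Crude stand-in for the visual nonstationarity check:
    keep differencing while the lag-1 sample ACF stays near 1.
    The 0.8 threshold is a hypothetical rule of thumb."""
    d = 0
    while d < max_d and acf1(x) > threshold:
        x = np.diff(x)   # apply differences, set d = d + 1
        d += 1
    return d, x

rng = np.random.default_rng(1)             # arbitrary seed
walk = np.cumsum(rng.normal(size=1000))    # I(1) series
d, stationary = choose_d(walk)             # d comes out as 1 for a random walk
```

After this loop, the stationary series would be passed on to ACF/PACF inspection for \(p\) and \(q\), estimation, and diagnostics.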

Under the Box–Jenkins methodology, the linear predictor approximately follows a normal distribution, i.e., \[ \hat{X}_{n+h} \sim N\left( X_{n+h}, \mathrm{var}(\hat{X}_{n+h}) \right). \]

Therefore, a \((1 - \alpha)100\)% prediction interval is \[ \hat{X}_{n+h} \pm z_{1-\alpha/2} \sqrt{\mathrm{var}(\hat{X}_{n+h})}. \]
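For example, with a hypothetical point forecast and forecast variance, a 95% interval (\(\alpha = 0.05\), so \(z_{0.975} \approx 1.96\)) can be computed as:

```python
import math

# Hypothetical numbers for illustration only
x_hat = 10.3        # point forecast, hat{X}_{n+h}
var_hat = 2.25      # forecast variance, var(hat{X}_{n+h})
z = 1.959964        # z_{1 - alpha/2} for alpha = 0.05

half_width = z * math.sqrt(var_hat)
lower, upper = x_hat - half_width, x_hat + half_width
```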

## 5.3 Seasonal ARIMA (SARIMA)

Recall that time series with seasonal variability or another strictly periodic component (e.g., daily cycles) can be deseasonalized by differencing not consecutive values but values a full period apart, i.e., with a lag equal to the period of the cyclical variability. We will use such differences to remove *strong* periodicity, as an additional step in the Box–Jenkins algorithm.

As with the regular differences, we apply seasonal differences not to eliminate autocorrelations at particular lags, but to achieve a fast decay of the ACF at seasonal lags. Even after the seasonal differences, there might be significant spikes at the seasonal lags in the ACF and PACF, which can be addressed by selecting proper seasonal autoregressive and moving average orders. Therefore, we can define orders of integration, AR, and MA for the seasonal part of the time series in the same way we define these orders for the non-seasonal part.

Based on the definitions in Brockwell and Davis (2002), \(X_t\) is a *seasonal autoregressive integrated moving average*, SARIMA(\(p,d,q\))(\(P,D,Q\))\(_s\) process if the differenced series \(Y_t=(1 - B)^d (1 - B^s)^D X_t\) is a causal ARMA process. Here \(d\) and \(D\) are nonnegative integers, and \(s\) is the period.

An equation for SARIMA(\(p, d, q\))(\(P, D, Q\))\(_s\) is \[ \begin{split} (1 - B)^d (1 - \phi_1B - \dots - \phi_pB^p) (1 - B^s)^D (1 - \Phi_1B^s - \dots - \Phi_PB^{sP}) X_t \\ = (1 + \theta_1B + \dots + \theta_qB^q) (1 + \Theta_1B^s + \dots + \Theta_QB^{sQ})Z_t, \end{split} \tag{5.2}\] where \(D\) is the order of seasonal differences, \(\Phi_1, \dots, \Phi_P\) are the seasonal autoregression coefficients, \(P\) is the seasonal autoregressive order, \(\Theta_1, \dots, \Theta_Q\) are the seasonal moving average coefficients, \(Q\) is the seasonal moving average order, and the rest of the terms are the same as in Equation 5.1.
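The combined differencing operator \((1 - B)(1 - B^s)\) (i.e., \(d = D = 1\)) can be applied with simple array slicing. In the sketch below, a hypothetical monthly series (\(s = 12\)) with a linear trend and a fixed seasonal pattern is reduced to differenced noise; the series and seed are arbitrary illustrations:

```python
import numpy as np

def seasonal_difference(x, s):
    """Apply (1 - B^s): returns x_t - x_{t-s}."""
    return x[s:] - x[:-s]

rng = np.random.default_rng(7)    # arbitrary seed
s = 12                            # period of a hypothetical monthly series
t = np.arange(240)
x = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / s) + rng.normal(size=t.size)

# (1 - B)(1 - B^12) x_t: one regular and one seasonal difference (d = D = 1)
y = np.diff(seasonal_difference(x, s))

# The seasonal sinusoid cancels exactly under (1 - B^12), and the linear
# trend becomes a constant that the regular difference then removes,
# leaving only (correlated) differenced noise with mean near zero.
```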

## 5.4 Conclusion

In this lecture, we introduced ARIMA as an extension of ARMA modeling to nonstationary data and SARIMA as an extension of ARIMA models to time series with periodicity (seasonality).

We learned the Box–Jenkins iterative procedure for identifying such models, including the orders of differences \(d\) and \(D\) and the orders \(p\), \(q\), \(P\), and \(Q\).

Please remember to specify the criterion for selecting a model beforehand and consider such options as cross-validation and using a testing set.

## 5.5 Appendix

**Equivalences**

Mathematically, some models are equivalent to one another. Below are some examples.

- Forecasts of ARIMA(0,1,1) are equivalent to simple exponential smoothing.

An ARIMA(0,1,1) model can be written as \[ Y_t = Y_{t-1} + \theta_1 Z_{t-1} + Z_t, \] from which the one-step-ahead forecast is \[ \hat{Y}_t = Y_{t-1} + \theta_1 (Y_{t-1} - \hat{Y}_{t-1}). \] If we define \(\theta_1 = \alpha - 1\), then the above equation transforms to \[ \begin{split} \hat{Y}_t &= Y_{t-1} + (\alpha - 1) (Y_{t-1} - \hat{Y}_{t-1})\\ &= Y_{t-1} + (\alpha - 1) Y_{t-1} - (\alpha - 1) \hat{Y}_{t-1}\\ &= \alpha Y_{t-1} + (1 - \alpha) \hat{Y}_{t-1}, \end{split} \] which is the simple exponential smoothing recursion with smoothing parameter \(\alpha\).
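This equivalence is easy to verify numerically: running the ARIMA(0,1,1) forecast recursion and the simple exponential smoothing recursion side by side, with \(\theta_1 = \alpha - 1\) and the same initialization, produces identical one-step-ahead forecasts. The series, \(\alpha\), and initialization below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)          # arbitrary seed
y = np.cumsum(rng.normal(size=100))     # an arbitrary observed series
alpha = 0.3                             # hypothetical smoothing parameter
theta1 = alpha - 1                      # the substitution used above

ses = np.zeros(len(y))      # SES: yhat_t = alpha*y_{t-1} + (1-alpha)*yhat_{t-1}
arima = np.zeros(len(y))    # ARIMA(0,1,1): yhat_t = y_{t-1} + theta1*(y_{t-1} - yhat_{t-1})
ses[0] = arima[0] = y[0]    # common initialization
for t in range(1, len(y)):
    ses[t] = alpha * y[t - 1] + (1 - alpha) * ses[t - 1]
    arima[t] = y[t - 1] + theta1 * (y[t - 1] - arima[t - 1])

assert np.allclose(ses, arima)   # identical forecasts at every step
```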

- Forecasts of ARIMA(0,2,2) are equivalent to Holt’s method.

- Forecasts of SARIMA(0,1,\(s+1\))(0,1,0)\(_s\) are equivalent to the Holt–Winters additive method.