8  Time Series Regression with Trends

The goal of this lecture is to introduce methods of handling nonstationary time series in regression models. You will become familiar with the problem of spurious correlation (regression) and with approaches that help avoid it.

Objectives

  1. Learn three alternative ways of handling trends and/or seasonality to avoid spurious results: incorporate time effects into a regression model, use deviations from trends, or use differenced series.
  2. Introduce the concept of cointegration, learn how to detect it, and how to model it using an error correction model.

Reading materials

8.1 Spurious correlation

Results of statistical analysis (correlation, regression, etc.) are called spurious when they are likely driven not by underlying mechanisms, such as physical relationships between variables, but by matching patterns (often temporal patterns) of the variables that lead to statistically significant test results. Such matching patterns include trends, periodic fluctuations (seasonality), and more random patterns, such as spikes that happen to match across several time series, for example, found by searching over a large database of time series (a.k.a. cherry-picking).

Code
library(ggplot2)
library(patchwork)   # for combining ggplots with `+` and plot_annotation()
set.seed(123)
T <- 300
Xt <- ts(rnorm(T))   # i.i.d. N(0, 1) series
Yt <- ts(rnorm(T))   # independent of Xt
p1 <- forecast::autoplot(Xt)
p2 <- forecast::autoplot(Yt)
p1 + p2 +
    plot_annotation(tag_levels = 'A')

Figure 8.1: Original stationary and independent time series.

Figure 8.1 shows two series of length \(T = 300\) simulated from the standard normal distribution, \(N(0,1)\). These are independent and identically distributed (i.i.d.) random variables: each of the \(T\) values in \(X_t\) and \(Y_t\) was drawn independently of the other values, from the same distribution. Two random variables are independent if the realization of one does not affect the probability distribution of the other. Independence is a strong condition; in particular, it implies that the values are uncorrelated. This is true both for the values within each series \(X_t\) and \(Y_t\) and across \(X_t\) and \(Y_t\) (i.e., \(X_t\) and \(Y_t\) are neither autocorrelated nor correlated with each other). These are properties of the underlying process; the sample correlations of such series converge to zero asymptotically (as the sample size increases).

In finite samples, we may observe that a point estimate of the correlation coefficient even for such an ideal series is not exactly zero, but it will usually not be statistically significant. See the results (confidence interval and \(p\)-value) from the correlation \(t\)-test below:

cor.test(Xt, Yt)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  Xt and Yt
#> t = -1, df = 298, p-value = 0.3
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.1727  0.0529
#> sample estimates:
#>     cor 
#> -0.0607

Not many time series behave like that in real life – often we observe some trends. Let’s add linear trends to our simulated data, for example, trends going in the same direction but with slopes of different magnitudes. Here we use increasing linear trends, i.e., trends with positive slopes (Figure 8.2).

Note

It is probably a good idea to refrain from writing ‘positive trends’ (or ‘negative trends’) because they can be confused with ‘good’ or ‘beneficial’ trends. For example, a decrease in the unemployment rate is a positive (good) trend for a country, but it is a trend with a negative slope. Conversely, a linear trend in pollutant concentrations with a positive slope (going upward) shows a negative (worsening) tendency for the ecosystem.

Code
Xt <- Xt + c(1:T)/95   # add a linear trend with a smaller slope
Yt <- Yt + c(1:T)/50   # add a linear trend with a larger slope
p1 <- forecast::autoplot(Xt) +
    ylim(-2, 8)
p2 <- forecast::autoplot(Yt) +
    ylim(-2, 8)
p1 + p2 +
    plot_annotation(tag_levels = 'A')

Figure 8.2: Time series with trends.

After adding trends to each of the time series, the correlation of \(X_t\) and \(Y_t\) is strong and statistically significant, but this is not necessarily because these series are so strongly related to each other.

cor.test(Xt, Yt)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  Xt and Yt
#> t = 13, df = 298, p-value <2e-16
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.519 0.665
#> sample estimates:
#>   cor 
#> 0.597

Some other factors may be driving the dynamics (trends) of \(X_t\) and \(Y_t\) in the same direction. Also, recall the general formula for computing the correlation (or autocorrelation) of time series, e.g., Equation 2.5: we need to subtract time-dependent means, not a single mean calculated over the whole period under the assumption that the mean does not change over time (one quick check of this idea is sketched below).
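As a quick illustration (a sketch added here, not part of the original example; the object names Xt_detr and Yt_detr are illustrative), we can remove the fitted linear trends from \(X_t\) and \(Y_t\) and recompute the correlation. Once the deterministic trends are taken out, the ‘significant’ correlation essentially disappears:

Time <- seq_along(Xt)
# Deviations from the fitted linear trends
Xt_detr <- residuals(lm(Xt ~ Time))
Yt_detr <- residuals(lm(Yt ~ Time))
cor.test(Xt_detr, Yt_detr)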

For example, a city’s growing population may result in more police officers and heavier pollution at the same time, partially because more people will drive their vehicles in the city. An attempt to correlate (regress) pollution with the number of police officers will produce a so-called spurious correlation (regression). The results of statistical tests will be likely statistically significant. However, is pollution directly related to the number of police officers? Will the dismissal of a police officer help to make the air cleaner?

Not only common increasing/decreasing trends but also other systematic changes (such as seasonality) may be responsible for spurious correlation effects. For example, both high ice cream sales and harmful algal blooms typically occur in warm weather conditions and may be ‘significantly’ correlated, suggesting banning ice cream for the sake of a safer environment. See more interesting examples of spurious correlation at http://www.tylervigen.com/spurious-correlations.
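Seasonality can produce the same effect. Below is a small sketch (with hypothetical simulated data, not from the text) in which two independent noise series share only a common seasonal cycle and yet show a ‘significant’ correlation:

set.seed(1)
n <- 120                            # e.g., 10 years of monthly data
season <- sin(2 * pi * (1:n) / 12)  # common 12-month cycle
IceCream <- 2 * season + rnorm(n)   # hypothetical ice cream sales index
Algae <- 3 * season + rnorm(n)      # hypothetical algal bloom index
cor.test(IceCream, Algae)           # strong correlation driven only by the shared seasonality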

Sometimes simple tricks may help to avoid spurious results. For example, analyze not the raw numbers but rates that remove the effect of population growth in a city: the crime rate per 100,000 inhabitants, the number of people older than 70 per 1,000 inhabitants, etc. (see the toy sketch below). For more general approaches, see the next section.
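For instance, here is a toy sketch (hypothetical numbers) of converting raw counts into rates so that the common driver, population growth, is removed:

population <- c(500000, 520000, 545000, 570000)  # hypothetical city population by year
crimes <- c(2600, 2690, 2780, 2850)              # hypothetical raw annual crime counts
crime_rate <- crimes / population * 100000       # crime rate per 100,000 inhabitants
crime_rate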

8.3 Cointegration

Generally, cointegration might be characterized by two or more I(1) variables indicating a common long-run development, i.e., the variables do not drift away from each other except for transitory fluctuations. This defines a statistical equilibrium that, in empirical analysis, can often be interpreted as a long-run [economic] relation (Engle and Granger 1987).

In other words, two I(1) series \(X_t\) and \(Y_t\) are cointegrated if their linear combination \(u_t\) is I(0): \[ Y_t - \beta X_t = u_t. \tag{8.3}\]

Cointegration means a common stochastic trend (see Appendix C on testing for a common parametric trend). The vector \((1, -\beta)^{\top}\) is called the cointegration vector.

For the development of methods of analyzing time series cointegration, in 2003, Clive W. J. Granger was awarded 1/2 of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (the other half was awarded to R. Engle, see Chapter 6).

8.3.1 Two-step Engle–Granger method

  1. Estimate long-run relationship, i.e., regression in levels as in Equation 8.3, and test residuals for I(0).
  2. If the residual series \(u_t\) is I(0), use it in error correction model (ECM) regression \[ \begin{split} \Delta Y_t &= a_0 -\gamma_Y(Y_{t-1}-\beta X_{t-1})+\sum_{j=1}^{n_X}a_{Xj}\Delta X_{t-j}+\sum_{j=1}^{n_Y}a_{Yj}\Delta Y_{t-j} + u_{Y,t},\\ \Delta X_t &= b_0 +\gamma_X(Y_{t-1}-\beta X_{t-1})+\sum_{j=1}^{k_X}b_{Xj}\Delta X_{t-j}+\sum_{j=1}^{k_Y}b_{Yj}\Delta Y_{t-j} + u_{X,t}, \end{split} \tag{8.4}\] where \(u_X\) and \(u_Y\) are pure random processes. If \(X_t\) and \(Y_t\) are cointegrated, at least one \(\gamma_i\), \(i = X, Y\), has to be different from zero.

The OLS estimator of the cointegration parameter \(\beta\) is super-consistent: it converges at rate \(T\) rather than the usual rate \(\sqrt{T}\). However, OLS can be biased in small samples.
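The following small Monte Carlo sketch (added here for illustration; the function name slope_sd, the data-generating process, and the settings are assumptions) hints at this super-consistency: in a cointegrating regression, the spread of the OLS slope estimates shrinks roughly in proportion to \(1/T\), so quadrupling the sample size reduces it by about a factor of 4 rather than 2.

slope_sd <- function(T, nsim = 500) {
    est <- replicate(nsim, {
        common <- cumsum(rnorm(T))  # shared stochastic trend, I(1)
        x <- common + rnorm(T)
        y <- 2 * x + rnorm(T)       # true cointegrating slope = 2
        coef(lm(y ~ x))[2]
    })
    sd(est)
}
set.seed(1)
slope_sd(100)   # spread of the slope estimates at T = 100
slope_sd(400)   # roughly 4 times smaller (rate T), not 2 times (rate sqrt(T))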

The representation in Equation 8.4 has the advantage that it only contains stationary variables, although the underlying relation is between nonstationary (I(1)) variables. Thus, if the variables are cointegrated and the cointegration vector is known, the traditional statistical procedures can be applied for estimation and testing.

Example: Error correction model for simulated data

Demonstrate the analysis using simulated time series (Figure 8.9).
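The exact simulation code is not shown in this section; a minimal sketch of one possible data-generating process (the object names match those used below, while the parameter values are assumptions) lets \(X_t\) and \(Y_t\) share a common random-walk trend, so they are cointegrated, while \(W_t\) is an independent random walk, not cointegrated with them:

set.seed(42)
T <- 250                          # assumed sample size
trend <- cumsum(rnorm(T))         # common stochastic trend, I(1)
Xt <- ts(trend + rnorm(T))        # I(1), contains the common trend
Yt <- ts(0.5 * trend + rnorm(T))  # I(1), cointegrated with Xt
Wt <- ts(cumsum(rnorm(T)))        # independent I(1) series, not cointegrated with Xt or Yt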

Code
p1 <- forecast::autoplot(Xt)
p2 <- forecast::autoplot(Yt)
p1 + p2 +
    plot_annotation(tag_levels = 'A')

Figure 8.9: Simulated I(1) time series.

Apply a unit-root test to check the integration order of each series, using the R package tseries:

tseries::adf.test(Xt)
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  Xt
#> Dickey-Fuller = -2, Lag order = 6, p-value = 0.5
#> alternative hypothesis: stationary
tseries::adf.test(diff(Xt))
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  diff(Xt)
#> Dickey-Fuller = -7, Lag order = 6, p-value = 0.01
#> alternative hypothesis: stationary
tseries::adf.test(Yt)
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  Yt
#> Dickey-Fuller = -2, Lag order = 6, p-value = 0.5
#> alternative hypothesis: stationary
tseries::adf.test(diff(Yt))
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  diff(Yt)
#> Dickey-Fuller = -8, Lag order = 6, p-value = 0.01
#> alternative hypothesis: stationary

At the 95% confidence level, the ADF test results show that each of the time series \(X_t\) and \(Y_t\) is I(1). (However, we have applied the test 4 times without controlling the overall Type I error.)

Fit the linear regression \[ Y_t = a + bX_t + u_t. \tag{8.5}\] The vector \((1, -b)^{\top}\) is then the estimated cointegration vector.

Ut <- lm(Yt ~ Xt)$residuals
tseries::adf.test(Ut)
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  Ut
#> Dickey-Fuller = -5, Lag order = 6, p-value = 0.01
#> alternative hypothesis: stationary

While each of the time series \(X_t\) and \(Y_t\) is I(1), the resulting residual series \(u_t \sim \mathrm{I}(0)\); thus, we conclude that \(X_t\) and \(Y_t\) are cointegrated.

Apply a simple error correction model (with \(n_X = n_Y = 1\)), using the R package dynlm or by specifying the lags with the package dplyr:

# Error correction term: residuals from the long-run regression,
# without the last value, to match the length of the differenced series
ect <- Ut[-length(Ut)]

# Differenced series
dy <- diff(Yt)
dx <- diff(Xt)

Model using dynlm::dynlm():

library(dynlm)
ecmdat1 <- cbind(dy, dx, ect)
ecm1 <- dynlm(dy ~ L(ect, 1) + L(dy, 1) + L(dx, 1), data = ecmdat1)
summary(ecm1)
#> 
#> Time series regression with "ts" data:
#> Start = 3, End = 250
#> 
#> Call:
#> dynlm(formula = dy ~ L(ect, 1) + L(dy, 1) + L(dx, 1), data = ecmdat1)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.8348 -0.3860 -0.0224  0.3696  1.5586 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.00919    0.03588    0.26   0.7980    
#> L(ect, 1)   -0.67417    0.07482   -9.01   <2e-16 ***
#> L(dy, 1)    -0.49772    0.07341   -6.78    9e-11 ***
#> L(dx, 1)     0.22842    0.07906    2.89   0.0042 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.565 on 244 degrees of freedom
#> Multiple R-squared:  0.265,  Adjusted R-squared:  0.256 
#> F-statistic: 29.3 on 3 and 244 DF,  p-value: 3.2e-16

Model using lm() and dplyr::lag():

ecm2 <- lm(dy ~ dplyr::lag(ect, 1) + 
               dplyr::lag(as.vector(dy), 1) + 
               dplyr::lag(as.vector(dx), 1))
summary(ecm2)
#> 
#> Call:
#> lm(formula = dy ~ dplyr::lag(ect, 1) + dplyr::lag(as.vector(dy), 
#>     1) + dplyr::lag(as.vector(dx), 1))
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.8348 -0.3860 -0.0224  0.3696  1.5586 
#> 
#> Coefficients:
#>                              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                   0.00919    0.03588    0.26   0.7980    
#> dplyr::lag(ect, 1)           -0.67417    0.07482   -9.01   <2e-16 ***
#> dplyr::lag(as.vector(dy), 1) -0.49772    0.07341   -6.78    9e-11 ***
#> dplyr::lag(as.vector(dx), 1)  0.22842    0.07906    2.89   0.0042 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.565 on 244 degrees of freedom
#>   (1 observation deleted due to missingness)
#> Multiple R-squared:  0.265,  Adjusted R-squared:  0.256 
#> F-statistic: 29.3 on 3 and 244 DF,  p-value: 3.2e-16

There is also the R package ecm, but it uses a modified formulation of the model, see details for the function ecm::ecm().

In the example above, the time series were simulated as cointegrated. Below is an example with another I(1) process \(W_t\), whose stochastic trend differs from that of \(X_t\). In this case, the linear combination of the individually integrated \(W_t\) and \(X_t\) does not produce a stationary time series; thus, \(W_t\) and \(X_t\) are not cointegrated.

U2 <- lm(Wt ~ Xt)$residuals
tseries::adf.test(U2)
#> 
#>  Augmented Dickey-Fuller Test
#> 
#> data:  U2
#> Dickey-Fuller = -2, Lag order = 6, p-value = 0.7
#> alternative hypothesis: stationary

8.3.2 Johansen test

The Johansen test allows for more than one cointegrating relationship among \(k\) series. The null hypothesis that the number of cointegration vectors is \(r = r^* < k\) is tested against the alternative \(r = k\) (trace test) or \(r = r^* + 1\) (maximal eigenvalue test; the latter is what ca.jo from the package urca reports by default, as in the output below). Testing proceeds sequentially for \(r^* = 0, 1, 2, \dots\), and the first non-rejection of the null hypothesis is taken as an estimate of \(r\).

Using the R package urca:

library(urca)
vecm <- ca.jo(cbind(Yt, Xt, Wt))
summary(vecm)
#> 
#> ###################### 
#> # Johansen-Procedure # 
#> ###################### 
#> 
#> Test type: maximal eigenvalue statistic (lambda max) , with linear trend 
#> 
#> Eigenvalues (lambda):
#> [1] 0.270915 0.025774 0.000838
#> 
#> Values of teststatistic and critical values of test:
#> 
#>           test 10pct  5pct 1pct
#> r <= 2 |  0.21   6.5  8.18 11.6
#> r <= 1 |  6.48  12.9 14.90 19.2
#> r = 0  | 78.36  18.9 21.07 25.8
#> 
#> Eigenvectors, normalised to first column:
#> (These are the cointegration relations)
#> 
#>          Yt.l2  Xt.l2 Wt.l2
#> Yt.l2  1.00000  1.000   1.0
#> Xt.l2 -0.49209 -5.234 -26.2
#> Wt.l2 -0.00497  0.383 -19.3
#> 
#> Weights W:
#> (This is the loading matrix)
#> 
#>       Yt.l2   Xt.l2     Wt.l2
#> Yt.d -0.686 0.00228  2.64e-06
#> Xt.d -0.179 0.00796 -1.54e-05
#> Wt.d  0.180 0.00253  1.59e-04

Here, the null hypothesis \(r = 0\) is rejected (the statistic 78.36 exceeds even the 1% critical value), while \(r \le 1\) is not rejected, so we estimate one cointegrating relation; its normalized vector (the first column of the eigenvector matrix) involves mainly \(Y_t\) and \(X_t\), with a near-zero weight on \(W_t\).

If two time series are cointegrated, then the usual regression in Equation 8.5 is the so-called long-run equilibrium relation or attractor, i.e., the relationship between \(X_t\) and \(Y_t\) can be explained by Equation 8.5 in the long run. Equation 8.5 is applied for estimation, not for testing (see Figure 6.1 in Kirchgässner and Wolters 2007 on the highly dispersed \(t\)-statistic). For testing, estimate the error correction model in Equation 8.4 (\(p\)-values from the ECM can be used for testing; also see Chapter 6 in Kirchgässner and Wolters 2007).

8.4 Conclusion

Now we can incorporate trend effects into our models using the three approaches considered, or by testing for cointegration and applying an error correction model. The next step would be to incorporate an autocorrelation structure in the residuals (the simulated examples considered here used independent normally distributed noise, an artificial ideal case of no autocorrelation, whereas in practice we usually encounter autocorrelation, e.g., see the residual diagnostics in the examples in Section 8.2.1).