Appendix D — Generalized autoregressive moving average (GARMA) models

Let \(Y_t\) be the observed time series and \(\boldsymbol{X_t}\) exogenous regressors. Then, we can model the conditional distribution of \(Y_t\), given \(Y_1,\dots,Y_{t-1}\), \(\boldsymbol{X_1,\dots,X_{t}}\) as \[ g(\mu_t)=\boldsymbol{X}'_t\beta+\sum_{j=1}^p\phi_j\{g(Y_{t-j})- \boldsymbol{X}'_{t-j}\beta\} +\sum_{j=1}^q\theta_j \{g(Y_{t-j})-g(\mu_{t-j})\}, \tag{D.1}\] where \(g(\cdot)\) is an appropriate link function; \(\mu_t\) is a conditional mean of the dependent variable; \(\boldsymbol{\beta}\) is a vector of regression coefficients; \(\phi_j\), \(j=1,\dots, p\), are the autoregressive coefficients; \(\theta_j\), \(j=1,\dots,q\), are the moving average coefficients, while \(p\) and \(q\) are the autoregressive and moving average orders, respectively.

In certain cases, the function \(g(\cdot)\) requires some transformation of the original series \(Y_{t-j}\) to avoid the non-existence of \(g(Y_{t-j})\) (Benjamin et al. 2003).

The generalized autoregressive moving average model (Equation D.1), GARMA(\(p,q\)), represents a flexible observation-driven modification of the classical Box–Jenkins methodology and GLMs for integer-valued time series. GARMA further advances the classical Gaussian ARMA model to a case where the distribution of the dependent variable is not only non-Gaussian but can be discrete. The dependent variable is assumed to belong to a conditional exponential family distribution given the past information of the process and thus the GARMA can be used to model a variety of discrete distributions (Benjamin et al. 2003). The GARMA model also extends the work of Zeger and Qaqish (1988), who proposed an autoregressive exponential family model, and Li (1994), who introduced its moving average counterpart.

Example: Insurance claims GARMA model

Here, we revisit the insurance claims example from Section 9.5.

Code

logconstant <- 1
Insurance <- read.csv("data/insurance_weekly.csv") %>%
    dplyr::select(Claims, Precipitation) %>% 
    mutate(Precipitation_lag1 = dplyr::lag(Precipitation, 1),
           Week = 1:nrow(.),
           Year = rep(2002:2011, each = 52),
           Claims_ln = log(Claims + logconstant))
summary(Insurance)

#>      Claims       Precipitation    Precipitation_lag1      Week    
#>  Min.   :  0.00   Min.   : 0.000   Min.   : 0.00      Min.   :  1  
#>  1st Qu.:  1.00   1st Qu.: 0.775   1st Qu.: 0.75      1st Qu.:131  
#>  Median :  3.00   Median : 3.800   Median : 3.80      Median :260  
#>  Mean   :  3.61   Mean   : 7.713   Mean   : 7.72      Mean   :260  
#>  3rd Qu.:  4.00   3rd Qu.:10.000   3rd Qu.:10.00      3rd Qu.:390  
#>  Max.   :170.00   Max.   :77.300   Max.   :77.30      Max.   :520  
#>                                    NA's   :1                       
#>       Year        Claims_ln    
#>  Min.   :2002   Min.   :0.000  
#>  1st Qu.:2004   1st Qu.:0.693  
#>  Median :2006   Median :1.386  
#>  Mean   :2006   Mean   :1.252  
#>  3rd Qu.:2009   3rd Qu.:1.609  
#>  Max.   :2011   Max.   :5.142  
#>

Fit a GARMA model relating the weekly number of insurance claims to the total precipitation during that and previous weeks. For fitting the model, you will need the gamlss.util package, which is archived on CRAN. Follow the link to download the latest version, then install it from the archive.

# The model function doesn't accept NAs, so remove them
Insurance_noNA <- na.omit(Insurance)

library(gamlss.util)
m00zip <- garmaFit(Claims ~ Precipitation + Week + Precipitation_lag1
                   ,family = ZIP
                   ,data = Insurance_noNA)

#> deviance of linear model=  3014

m00nbi <- garmaFit(Claims ~ Precipitation + Week + Precipitation_lag1
                   ,family = NBI
                   ,data = Insurance_noNA)

#> deviance of linear model=  2324

AIC(m00zip, m00nbi)

#>        df  AIC
#> m00nbi  5 2334
#> m00zip  5 3024

Based on the smallest AIC, select negative binomial distribution (NBI) for the GARMA model. Obtain ACF and PACF plots of the model residuals to select ARMA order (Figure D.1).

Code

p1 <- forecast::ggAcf(m00nbi$residuals) +
    ggtitle("") +
    xlab("Lag (weeks)")
p2 <- forecast::ggAcf(m00nbi$residuals, type = "partial") +
    ggtitle("") +
    xlab("Lag (weeks)")
p1 + p2 +
    plot_annotation(tag_levels = 'A')

Figure D.1: ACF and PACF plots of residuals of the base model GARMA(0,0) based on NBI distribution.

Based on the observed ACF and PACF patterns (Figure D.1), an appropriate model for the temporal dependence could be ARMA(1,0), ARMA(3,0), or ARMA(1,1), among the most parsimonious options. Alternatively, we can select the orders based on AIC:

(res_arma <- forecast::auto.arima(m00nbi$residuals, 
                                  d = 0, D = 0, 
                                  stepwise = FALSE))

#> Series: m00nbi$residuals 
#> ARIMA(4,0,1) with zero mean 
#> 
#> Coefficients:
#>         ar1     ar2    ar3    ar4     ma1
#>       0.870  -0.121  0.097  0.001  -0.708
#> s.e.  0.173   0.065  0.058  0.058   0.168
#> 
#> sigma^2 = 0.915:  log likelihood = -711
#> AIC=1434   AICc=1434   BIC=1460

# Extract the orders, see the value 'arma' in ?stats::arima
p <- res_arma$arma[1]
q <- res_arma$arma[2]

Refit the GARMA model specifying these orders. Then verify that the temporal dependence in residuals was removed (Figure D.2), and assess other assumptions (Figure D.3), including homogeneity and normality of the quantile residuals.

set.seed(12345)
m10nbi <- garmaFit(Claims ~ Precipitation + Week + Precipitation_lag1
                   ,order = c(1, 0)
                   ,family = NBI
                   ,data = Insurance_noNA)

#> deviance of linear model=  2324 
#> deviance of  garma model=  2306

summary(m10nbi)

#> 
#> Family:  c("NBI", "Negative Binomial type I") 
#> Fitting method: "nlminb" 
#> 
#> Call:  garmaFit(formula = Claims ~ Precipitation + Week + Precipitation_lag1,  
#>     order = c(1, 0), data = Insurance_noNA, family = NBI) 
#> 
#> 
#> Coefficient(s):
#>                            Estimate  Std. Error  t value   Pr(>|t|)    
#> beta.(Intercept)        0.491446753 0.104282235  4.71266 2.4450e-06 ***
#> beta.Precipitation      0.024214583 0.003281726  7.37861 1.5987e-13 ***
#> beta.Week               0.001578533 0.000297431  5.30722 1.1131e-07 ***
#> beta.Precipitation_lag1 0.012197659 0.003880431  3.14338  0.0016701 ** 
#> phi                     0.091454371 0.034440316  2.65545  0.0079204 ** 
#> sigma                   0.489354010 0.057665066  8.48614 < 2.22e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>  Degrees of Freedom for the fit: 6 Residual Deg. of Freedom   513 
#> Global Deviance:     2306 
#>             AIC:     2318 
#>             SBC:     2344

Code

p1 <- forecast::ggAcf(m10nbi$residuals) +
    ggtitle("") +
    xlab("Lag (weeks)")
p2 <- forecast::ggAcf(m10nbi$residuals, type = "partial") +
    ggtitle("") +
    xlab("Lag (weeks)")
p1 + p2 +
    plot_annotation(tag_levels = 'A')

Figure D.2: ACF and PACF plots of the GARMA(1,0) model residuals based on NBI distribution.

Code

plot(m10nbi, ts = FALSE)

#> ******************************************************************
#>   Summary of the Randomised Quantile Residuals
#>                            mean   =  0.0569 
#>                        variance   =  0.854 
#>                coef. of skewness  =  0.532 
#>                coef. of kurtosis  =  7.32 
#> Filliben correlation coefficient  =  0.98 
#> ******************************************************************

Figure D.3: Default diagnostics of the GARMA(1,0) model residuals based on NBI distribution.

The residual diagnostics look somewhat adequate. The issue is with the residuals showing high kurtosis (recall that kurtosis of normal distribution equals 3) and some outliers in the right tail. Also, consider Section 9.5 for another analysis of this dataset.