## Tuesday, June 30, 2015

### Tsay Ch2 - Linear Time Series Analysis and Its Applications

#### Ch2: Linear Time Series Analysis and its Applications

We treat the log returns $r_t$ as a collection of random variables over time, i.e. as a time series $\{r_t\}$.

Stationarity: The first step in time series analysis should always be checking stationarity.
1) Strict stationarity: a time series $\{r_t\}$ is strictly stationary if the joint distribution of $(r_{t_1},...,r_{t_k})$ is identical to that of $(r_{t_1+t},...,r_{t_k+t})$ for all $t$ and $k$, i.e. the joint distribution is invariant under time shift.
2) Weak stationarity: both the mean of $r_t$ and the covariance between $r_t$ and $r_{t-l}$ are time-invariant. In other words, the first two moments of $r_t$ are finite and constant over time. Weak stationarity is what makes inference about future observations (e.g. prediction) possible. A strictly stationary series with finite first two moments is also weakly stationary; the converse does not hold in general, but it does if $r_t$ is normally distributed.

Autocovariance: $\gamma_l=Cov(r_t,r_{t-l})$ with properties $\gamma_0=Var(r_t)$ and $\gamma_{-l}=\gamma_l$

Autocorrelation function (ACF): two random variables $x$ and $y$ are uncorrelated if $\rho_{x,y}=0$, which implies independence only under normality. The lag-$l$ autocorrelation of $r_t$ is $\rho_l=\gamma_l/ \gamma_0$, giving $\rho_0=1$ and $\rho_{-l}=\rho_l$. The sample autocorrelation is
$$\hat{\rho}_l=\frac{\sum^T_{t=l+1}(r_t-\bar{r})(r_{t-l}-\bar{r})}{\sum^T_{t=1}(r_t-\bar{r})^2}.$$
It is asymptotically normal with mean zero and variance $1/T$ if $\{r_t\}$ is an iid sequence with finite second moment (Brockwell and Davis 1991).
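A minimal sketch of the sample autocorrelation formula above, in plain Python. The function name `sample_acf` and the toy series are assumptions made for illustration only.

```python
def sample_acf(r, max_lag):
    """Return [rho_hat_1, ..., rho_hat_max_lag] for the series r."""
    T = len(r)
    mean = sum(r) / T
    dev = [x - mean for x in r]
    denom = sum(d * d for d in dev)  # sum over t = 1..T of (r_t - r_bar)^2
    return [
        sum(dev[t] * dev[t - l] for t in range(l, T)) / denom
        for l in range(1, max_lag + 1)
    ]

# Hypothetical toy series, used only to check the arithmetic.
r = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(r, 2)
```

Under the iid assumption, values outside roughly $\pm 2/\sqrt{T}$ would be flagged as significantly different from zero.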

Bartlett's formula: For a weakly stationary time series of the form $r_t=\mu+\sum^q_{i=0}\psi_i a_{t-i}$, where $\psi_0=1$, $q$ is a non-negative integer, and $\{a_j\}$ is a Gaussian white noise series, $\hat{\rho}_l$ is asymptotically normal with mean zero and variance $(1+2\sum^q_{i=1}\rho^2_i)/T$ for $l > q$ (Box, Jenkins, Reinsel 1994). Hence, to test $H_0:\rho_l=0$ versus $H_a:\rho_l \ne 0$ for a given positive integer $l$, we can construct the t-ratio
$$t=\frac{\hat{\rho}_l}{\sqrt{(1+2\sum_{i=1}^{l-1}\hat{\rho}_i^2)/T}},$$
which is compared with standard normal critical values.
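As a small illustration of this test, the sketch below computes the ratio $\hat{\rho}_l/\sqrt{(1+2\sum_{i=1}^{l-1}\hat{\rho}_i^2)/T}$ from a list of sample autocorrelations. The function name and the input values are hypothetical.

```python
import math

def bartlett_t_ratio(rho_hat, l, T):
    """t-ratio for H0: rho_l = 0, given rho_hat = [rho_hat_1, rho_hat_2, ...]."""
    # Bartlett variance uses the sample autocorrelations at lags 1..l-1.
    var = (1.0 + 2.0 * sum(x * x for x in rho_hat[:l - 1])) / T
    return rho_hat[l - 1] / math.sqrt(var)

# Hypothetical sample ACF values and sample size.
rho_hat = [0.4, -0.1]
t1 = bartlett_t_ratio(rho_hat, 1, 100)
t2 = bartlett_t_ratio(rho_hat, 2, 100)
```

With these inputs the lag-1 ratio is well beyond 2 in absolute value while the lag-2 ratio is not.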

The Ljung and Box (1978) test checks jointly whether the first $m$ autocorrelations are zero, $H_0:\rho_1=...=\rho_m=0$. It modifies the Box-Pierce portmanteau statistic to increase power in finite samples; under $H_0$ the statistic is asymptotically chi-squared with $m$ degrees of freedom:
$$Q(m)=T(T+2)\sum^m_{l=1}\frac{\hat{\rho}^2_l}{T-l}.$$
In practice, the choice of $m$ may affect the performance of the $Q(m)$ statistic. Choosing $m\approx \ln(T)$ provides better power performance.
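A minimal sketch of the $Q(m)$ statistic in plain Python. The helper names and the toy series are assumptions. The p-value line only works for $m=2$, because a chi-squared variable with 2 degrees of freedom has survival function $e^{-x/2}$; for other $m$ a statistics library would be needed.

```python
import math

def sample_acf(r, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    T = len(r)
    mean = sum(r) / T
    dev = [x - mean for x in r]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - l] for t in range(l, T)) / denom
            for l in range(1, max_lag + 1)]

def ljung_box(r, m):
    """Q(m) = T (T + 2) * sum_{l=1}^{m} rho_hat_l^2 / (T - l)."""
    T = len(r)
    rho = sample_acf(r, m)
    return T * (T + 2) * sum(rho[l - 1] ** 2 / (T - l) for l in range(1, m + 1))

# Hypothetical toy series; far too short for real inference.
r = [1.0, 2.0, 3.0, 4.0, 5.0]
q_stat = ljung_box(r, 2)
p_value = math.exp(-q_stat / 2.0)  # valid for m = 2 degrees of freedom only
```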

Monthly returns of the value-weighted index seem to have stronger serial dependence than individual stock returns. A version of CAPM theory suggests that asset returns are not predictable and should show no autocorrelation. One has to be careful about autocorrelation induced by the way index returns or stock prices are determined: such relationships are spurious.

White noise: a time series $\{r_t\}$ is white noise if the sequence is iid with finite mean and variance. If, in addition, $r_t$ is normally distributed with mean 0 and variance $\sigma^2$, it is called Gaussian white noise. For a white noise series all the ACFs are zero. For example, monthly returns of individual stocks seem to be white noise, but the value-weighted index return series does not.

Wold decomposition: A time series $\{r_t\}$ is said to be linear if it can be written in the form
$$r_t=\mu+\sum_{i=0}^{\infty}\psi_i a_{t-i},$$
where $\mu$ is the mean of $r_t$, $\psi_0=1$, and $\{a_t\}$ is a white noise series. If $r_t$ is weakly stationary, the mean, variance and autocovariances follow easily: $E(r_t)=\mu$, $Var(r_t)=\sigma_a^2\sum_{i=0}^{\infty}\psi_i^2$ and $\gamma_l=\sigma_a^2\sum_{i=0}^{\infty}\psi_i\psi_{i+l}$. For a weakly stationary series $\rho_l$ converges to zero as $l$ increases.

AR(p) model: the conditional expectation of the current return depends on the last $p$ returns:
$$r_t=\phi_0+\phi_1 r_{t-1}+...+\phi_p r_{t-p}+a_t$$
For an AR(1) process, the necessary and sufficient condition for weak stationarity is $|\phi_1|<1$. Its ACF decays exponentially, so it extends well beyond lag 1. For higher-order AR processes, the inverses of the solutions to the characteristic equation are called characteristic roots. Complex characteristic roots correspond to damped oscillations, which are related to business cycles. The stationarity condition is that all characteristic roots are less than one in modulus (equivalently, all solutions to the characteristic equation are greater than one in modulus).
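For the AR(2) case, the characteristic equation is $1-\phi_1 x-\phi_2 x^2=0$, so the characteristic roots $\omega=1/x$ solve $\omega^2-\phi_1\omega-\phi_2=0$. A small sketch of the stationarity check under that assumption follows; the function names and coefficient values are hypothetical.

```python
import cmath

def ar2_characteristic_roots(phi1, phi2):
    """Roots of w^2 - phi1*w - phi2 = 0 (inverses of the solutions of 1 - phi1*x - phi2*x^2 = 0)."""
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    return (phi1 + disc) / 2.0, (phi1 - disc) / 2.0

def is_stationary(phi1, phi2):
    """Weakly stationary if both characteristic roots are inside the unit circle."""
    return all(abs(w) < 1.0 for w in ar2_characteristic_roots(phi1, phi2))

# Hypothetical coefficients: a negative discriminant gives a complex pair (damped cycles).
roots = ar2_characteristic_roots(0.8, -0.4)
has_cycles = any(abs(w.imag) > 0.0 for w in roots)
stationary = is_stationary(0.8, -0.4)
unit_root = is_stationary(1.0, 0.0)  # AR(1) with phi1 = 1 is not stationary
```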

Identification of AR model: order determination of AR models. Two approaches:
1) Partial Autocorrelation Function (PACF): fit nested AR models of increasing order. For a true order of $p$, the estimates $\hat{\phi}_{l,l}$ converge to zero for all $l>p$, with asymptotic variance $1/T$. Plotting them against $l$ (the PACF plot) shows the cutoff.
2) Information criteria: likelihood-based criteria, e.g. the Akaike information criterion (AIC; Akaike 1973):
$$AIC = -\frac{2}{T}\ln(likelihood)+\frac{2}{T}P$$
where $T$ is the sample size and $P$ is the number of parameters. For a Gaussian AR($l$) model, AIC reduces to $\ln(\hat{\sigma}_l^2)+2l/T$, where $\hat{\sigma}_l^2$ is the maximum likelihood estimate of $\sigma^2_a$. Another common criterion is the Bayesian information criterion (BIC), which for a Gaussian AR($l$) model reduces to $\ln(\hat{\sigma}_l^2)+\ln(T)l/T$. BIC tends to select a lower AR order when the sample size is moderate to large.
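Both approaches can be sketched from autocorrelations alone using the Durbin-Levinson recursion. This is a swapped-in Yule-Walker shortcut rather than the nested regressions or maximum likelihood estimates described above: it assumes $\hat{\sigma}_l^2 \approx \gamma_0\prod_{k=1}^{l}(1-\phi_{k,k}^2)$. The function names and the example ACF (the theoretical ACF of an AR(1) with $\phi_1=0.6$) are illustrative.

```python
import math

def pacf_from_acf(rho):
    """Durbin-Levinson: rho = [rho_1, ..., rho_K] -> [phi_11, ..., phi_KK]."""
    pacf, prev = [], []  # prev holds phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho) + 1):
        num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def aic_bic(gamma0, pacf, T):
    """AIC and BIC of Gaussian AR(l) for l = 0..K, with sigma_l^2 from the recursion."""
    sigma2 = gamma0
    aic, bic = [math.log(sigma2)], [math.log(sigma2)]
    for l, phi_kk in enumerate(pacf, start=1):
        sigma2 *= 1.0 - phi_kk ** 2
        aic.append(math.log(sigma2) + 2.0 * l / T)
        bic.append(math.log(sigma2) + math.log(T) * l / T)
    return aic, bic

rho = [0.6 ** l for l in range(1, 4)]  # theoretical ACF of an AR(1), phi1 = 0.6
pacf = pacf_from_acf(rho)              # cuts off after lag 1
aic, bic = aic_bic(1.0, pacf, 100)     # gamma0 = 1 and T = 100 are assumed values
best_aic_order = aic.index(min(aic))
best_bic_order = bic.index(min(bic))
```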

Parameter estimation: the OLS method or the likelihood method. The likelihood estimates are very similar to the OLS ones, but give a slightly lower estimate of $\sigma_a^2$ in finite samples.

Model checking: if the model is adequate, the residual series should behave like white noise. Check the ACF of the residuals and the Ljung-Box statistic $Q(m)$, which is asymptotically chi-squared with $m-g$ degrees of freedom, where $m$ is approximately $\ln(T)$ and $g$ is the number of AR coefficients used in the model.

Goodness of fit: For a stationary series $R^2$ can be used as a goodness of fit. For unit-root non-stationary series $R^2$ converges to one regardless of the underlying model. Adjusted $R^2$ should be used to penalize the increased number of parameters.

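Forecasting: for a stationary AR model, multistep-ahead forecasts approach the unconditional mean as the horizon grows (mean reversion), while the variance of the forecast error increases toward the unconditional variance of $r_t$, so longer-term forecasts are less accurate. For an AR(1) model, the speed of mean reversion is measured by the half-life $k=\ln(0.5)/\ln(|\phi_1|)$.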
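A short sketch for the AR(1) case, assuming the standard results $\hat{r}_h(l)=\mu+\phi_1^l(r_h-\mu)$ with $\mu=\phi_0/(1-\phi_1)$, forecast error variance $\sigma_a^2(1-\phi_1^{2l})/(1-\phi_1^2)$, and half-life taken as the $k$ solving $|\phi_1|^k=0.5$. All function names and parameter values are hypothetical.

```python
import math

def ar1_forecast(phi0, phi1, r_h, steps):
    """l-step-ahead point forecasts from origin value r_h, for l = 1..steps."""
    mu = phi0 / (1.0 - phi1)  # unconditional mean
    return [mu + phi1 ** l * (r_h - mu) for l in range(1, steps + 1)]

def ar1_forecast_error_var(phi1, sigma2_a, steps):
    """Variance of the l-step-ahead forecast error, for l = 1..steps."""
    return [sigma2_a * (1.0 - phi1 ** (2 * l)) / (1.0 - phi1 ** 2)
            for l in range(1, steps + 1)]

def half_life(phi1):
    """Number of steps for the deviation from the mean to halve."""
    return math.log(0.5) / math.log(abs(phi1))

# Hypothetical parameters: phi0 = 0.01, phi1 = 0.5, last observed return 0.05.
forecasts = ar1_forecast(0.01, 0.5, 0.05, 3)
variances = ar1_forecast_error_var(0.5, 1.0, 3)
k = half_life(0.5)
```

The forecasts shrink toward $\mu=0.02$ while the error variances grow toward $\sigma_a^2/(1-\phi_1^2)$.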

MA(q) model: Moving average models. They can be seen either as a simple extension of a white noise series or as an infinite-order AR model with some parameter constraints. Bid-ask bounce in stock trading may introduce an MA(1) structure in a return series. The general form is
$$r_t=c_0+a_t-\theta_1 a_{t-1}-...-\theta_q a_{t-q}$$
It is a finite-memory model: the ACF is zero beyond lag $q$. An MA(1) model is invertible if $|\theta_1| <1$, otherwise it is noninvertible. An MA model is always weakly stationary. The coefficients of an MA model are also called the impulse response function.
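The cutoff of the ACF can be checked numerically. The sketch below writes the MA(q) model with weights $\psi_0=1$, $\psi_i=-\theta_i$ and uses $\gamma_l \propto \sum_{i=0}^{q-l}\psi_i\psi_{i+l}$; the function name and the value of $\theta_1$ are assumptions.

```python
def ma_acf(theta, max_lag):
    """Theoretical ACF (lags 1..max_lag) of r_t = c0 + a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}."""
    psi = [1.0] + [-th for th in theta]
    q = len(theta)
    gamma = [
        sum(psi[i] * psi[i + l] for i in range(q - l + 1)) if l <= q else 0.0
        for l in range(max_lag + 1)
    ]  # autocovariances in units of sigma_a^2
    return [g / gamma[0] for g in gamma[1:]]

# Hypothetical MA(1) with theta_1 = 0.5: rho_1 = -theta_1 / (1 + theta_1^2), zero afterwards.
rho = ma_acf([0.5], 3)
```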

Identification of MA model: the order is identified using the ACF, which provides information on the nonzero MA lags of the model.

Estimation: likelihood methods:
1) conditional likelihood method: assumes the initial shocks ($a_t$ for $t \le 0$) are zero.
2) exact likelihood method: the initial shocks are treated as additional parameters and estimated jointly with the other parameters.

Forecasting: because an MA(q) model has finite memory, its point forecasts go to the mean quickly: for horizons $l>q$ the forecast equals the mean of the series and stays there.
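For an MA(1) model this can be sketched as follows, assuming known parameters and the conditional convention $a_0=0$ to recover the shocks recursively. The function names and numbers are hypothetical.

```python
def ma1_shocks(r, c0, theta1):
    """Recover a_1..a_T from r_t = c0 + a_t - theta1 * a_{t-1}, assuming a_0 = 0."""
    a_prev, shocks = 0.0, []
    for x in r:
        a_t = x - c0 + theta1 * a_prev
        shocks.append(a_t)
        a_prev = a_t
    return shocks

def ma1_forecast(r, c0, theta1, steps):
    """1-step forecast uses the last shock; every later forecast is the mean c0."""
    a_h = ma1_shocks(r, c0, theta1)[-1]
    return [c0 - theta1 * a_h] + [c0] * (steps - 1)

# Hypothetical data and parameters.
forecasts = ma1_forecast([0.03, 0.01], 0.01, 0.5, 3)
```

Only the first forecast differs from the mean here, since $q=1$.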

ARMA(p,q) model: when a pure AR or MA model would need a high order, an ARMA model can describe the same dynamics with fewer parameters. It is generally not used for returns, but it is relevant for volatility models. ARMA(1,1) is
$$r_t-\phi_1 r_{t-1}=\phi_0+a_t-\theta_1 a_{t-1}$$
For the model to be meaningful we need $\phi_1 \ne \theta_1$. The ACF of an ARMA(1,1) behaves very much like that of an AR(1) model except that the exponential decay starts with lag 2 instead of lag 1. Consequently, the ACF of an ARMA(1,1) does not cut off at any finite lag. The PACF of an ARMA(1,1) does not cut off at any finite lag either; it behaves like that of an MA(1), except that the exponential decay starts with lag 2. The stationarity condition for an ARMA(1,1) is the same as that of an AR(1) model. A general ARMA(p,q) model is
$$r_t = \phi_0+\sum_{i=1}^{p}\phi_i r_{t-i}+a_t-\sum_{j=1}^{q}\theta_j a_{t-j}$$
There should be no common factors between the AR and MA polynomials, otherwise the order of the model can be reduced. For stationarity, all characteristic roots (the inverses of the solutions to the AR polynomial) should be less than 1 in absolute value.
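The ARMA(1,1) statements can be checked with a small sketch. It assumes the standard moments $\gamma_0=\sigma_a^2(1-2\phi_1\theta_1+\theta_1^2)/(1-\phi_1^2)$, $\rho_1=\phi_1-\theta_1\sigma_a^2/\gamma_0$ and $\rho_l=\phi_1\rho_{l-1}$ for $l>1$, with $\sigma_a^2=1$. The function name and parameter values are hypothetical.

```python
def arma11_acf(phi1, theta1, max_lag):
    """Theoretical ACF (lags 1..max_lag) of a stationary ARMA(1,1) with sigma_a^2 = 1."""
    gamma0 = (1.0 - 2.0 * phi1 * theta1 + theta1 ** 2) / (1.0 - phi1 ** 2)
    rho = [phi1 - theta1 / gamma0]  # lag 1 is shifted by the MA term
    for _ in range(max_lag - 1):
        rho.append(phi1 * rho[-1])  # AR(1)-style decay from lag 2 onward
    return rho

rho = arma11_acf(0.8, 0.3, 3)         # hypothetical phi1 = 0.8, theta1 = 0.3
degenerate = arma11_acf(0.5, 0.5, 3)  # phi1 == theta1: the common factor cancels
```

When $\phi_1=\theta_1$ every autocorrelation is zero, i.e. the series collapses to white noise, which is why the model requires $\phi_1 \ne \theta_1$.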

Identification: the extended autocorrelation function (EACF; Tsay and Tiao 1984) can be used. In the EACF table, the upper-left corner of the triangle of zeros identifies the order $(p,q)$. Estimation can then be done using conditional or exact likelihood. The Ljung-Box statistics of the residuals can also be checked to assess the adequacy of the model.

Unit root nonstationarity: price series, interest rates and FX rates are generally nonstationary. In the time series literature they are called unit-root nonstationary series. The best-known example is the random walk.

Random walk: A time series $\{p_t\}$ is a random walk if it satisfies
$$p_t=p_{t-1}+a_t$$
Viewed as an AR(1) process, it is nonstationary because the coefficient of $p_{t-1}$ is equal to 1. We call this a unit-root nonstationary time series. The model is not mean reverting: the point forecast of the price at any horizon equals the value at the forecast origin, and the variance of the forecast error is $l\sigma_a^2$, where $l$ is the number of steps ahead, which diverges to infinity as $l \to \infty$. The series has strong memory, as the sample ACFs all approach 1 as the sample size increases.

Random walk with drift: the log returns of a market index tend to have a small positive mean. The model for the log price becomes
$$p_t=\mu+p_{t-1}+a_t$$
Starting from $p_0=0$, both the expected value $t\mu$ and the variance $t\sigma_a^2$ increase with time $t$.

Trend-stationary time series: $p_t = \beta_0+\beta_1 t +r_t$, where $r_t$ is a stationary series.
The mean is $\beta_0+\beta_1 t$ and the variance is finite and equal to $Var(r_t)$, unlike the random walk with drift, whose variance grows with $t$. The series can be transformed into a stationary one by removing the time trend via a simple linear regression analysis.
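Removing the time trend can be sketched with a closed-form simple regression of $p_t$ on $t=1,...,T$. The function name and the data are made up for illustration.

```python
def detrend(p):
    """OLS fit of p_t = b0 + b1 * t (t = 1..T); returns (b0, b1, residuals)."""
    T = len(p)
    ts = range(1, T + 1)
    t_bar = (T + 1) / 2.0
    p_bar = sum(p) / T
    b1 = (sum((t - t_bar) * (x - p_bar) for t, x in zip(ts, p))
          / sum((t - t_bar) ** 2 for t in ts))
    b0 = p_bar - b1 * t_bar
    residuals = [x - b0 - b1 * t for t, x in zip(ts, p)]
    return b0, b1, residuals

# Hypothetical log prices: a linear trend plus small deviations.
p = [2.6, 2.9, 3.5, 4.0, 4.6]
b0, b1, resid = detrend(p)
```

The residuals play the role of the stationary component $r_t$ and can then be modeled with the ARMA tools above.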

ARIMA models: A conventional way to handle unit-root nonstationarity is differencing. A time series $y_t$ is said to be an ARIMA(p,1,q) process if the change series $c_t=y_t-y_{t-1}$ follows a stationary and invertible ARMA(p,q) process. For example, a log price series is commonly treated as an ARIMA process. Series with multiple unit roots may need to be differenced multiple times.

Dickey-Fuller unit root test: $H_0:\phi_1=1$ versus $H_a:\phi_1<1$. The test statistic is the t-ratio of the least squares estimate of $\phi_1$ from the model $p_t=\phi_1 p_{t-1}+e_t$ giving
$$\hat{\phi}_1 = \frac{\sum_1^T p_{t-1}p_t}{\sum_1^T p^2_{t-1}}$$
$$\hat{\sigma}_e^2=\frac{\sum_1^T (p_t-\hat{\phi}_1 p_{t-1})^2}{T-1}$$
where $p_0=0$ and $T$ is the sample size. The t-ratio is
$$\frac{\hat{\phi}_1-1}{std(\hat{\phi}_1)} = \frac{\sum_1^T p_{t-1}e_t}{\hat{\sigma}_e\sqrt{\sum_1^T p^2_{t-1}}}$$
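The three formulas above translate directly into code. In this sketch the function name and the price series are hypothetical. Note that under $H_0$ the t-ratio does not follow a Student-t distribution, so it has to be compared against Dickey-Fuller critical values, which are not computed here.

```python
import math

def dickey_fuller_t_ratio(p):
    """Return (phi_hat, t_ratio) for p_t = phi1 * p_{t-1} + e_t, with p_0 = 0."""
    T = len(p)
    lagged = [0.0] + list(p[:-1])  # p_0 = 0, p_1, ..., p_{T-1}
    sxx = sum(x * x for x in lagged)
    phi_hat = sum(x * y for x, y in zip(lagged, p)) / sxx
    sigma2_e = sum((y - phi_hat * x) ** 2 for x, y in zip(lagged, p)) / (T - 1)
    std_phi = math.sqrt(sigma2_e / sxx)  # standard error of phi_hat
    return phi_hat, (phi_hat - 1.0) / std_phi

# Hypothetical short price series p_1..p_5.
p = [1.0, 1.5, 1.2, 1.8, 1.6]
phi_hat, t_ratio = dickey_fuller_t_ratio(p)
```

A t-ratio far enough below zero would reject the unit root in favor of $\phi_1<1$.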

Left out sections: 2.8 Seasonal models, 2.9 Regression models with time series errors, 2.10 consistent covariance matrix estimation, 2.11 long-memory models