## Tuesday, June 30, 2015

### Tsay Ch8 - Multivariate Time Series Analysis and Its Applications

How markets are interrelated is important to understand the lead-lag relationship and under what circumstances they reverse or do not work. Various previous methods can be applied directly to the vector case, but some need attention.

#### Weak stationarity and cross-correlation matrices

k-dimensional time series $\pmb{r_t}=[r_{1t},...,r_{kt}]^T$ is weakly stationary if its first and second moments are time-invariant, $\pmb{\mu}=E(\pmb{r}_t)$ and $\pmb{\Gamma}_0=E[(\pmb{r}_t-\pmb{\mu})(\pmb{r}_t-\pmb{\mu})^T]$. Let $\pmb{D}$ be a $k\times k$ diagonal matrix consisting of the standard deviations of $r_{it}$. The lag-zero, cross-correlation matrix is defined as $\pmb{\rho}_0=\pmb{D}^{-1}\pmb{\Gamma}_0\pmb{D}^{-1}$, which is the correlation matrix. The lag-$l$ cross-covariance matrix $\pmb{r}_t$ is defined as $\pmb{\Gamma}_l=E[(\pmb{r}_t-\pmb{\mu})(\pmb{r}_{t-l}-\pmb{\mu})^T]$. For a weakly stationary series, the cross-covariance matrix $\pmb{\Gamma}_l$ is a function of $l$, not the time index $t$. The lag-$l$ cross-correlation matrix (CCM) is defined as $\pmb{\rho}_l=\pmb{D}^{-1}\pmb{\Gamma}_l\pmb{D}^{-1}$. To understand this better, notice
$$\rho_{ij}(l)=\frac{\Gamma_{ij}(l)}{\sqrt{\Gamma_{ii}(0)\Gamma_{jj}(0)}}=\frac{Cov(r_{it},r_{j,t-l})}{std(r_{it})std(r_{jt})}$$
which is the correlation between $r_{it}$ and $r_{j,t-l}$. If $\rho_{ij}(l)\ne 0$ and $l>0$, we say that the series $r_{jt}$ leads the series $r_{it}$ at lag $l$. Similarly, if $\rho_{ji}(l)\ne 0$ and $l>0$ we say that the series $r_{it}$ leads the series $r_{jt}$ at lag $l$. The diagonal element $\rho_{ii}(l)$ is simply the lag-$l$ autocorrelation coefficient of $r_{it}$.

Some remarks are ($l>0$):
1) In general $\rho_{ij}(l) \ne \rho_{ji}(l)$ for $i\ne j$, because they are measuring two different lag relationships, implying $\pmb{\Gamma}_l$ and $\pmb{\rho}_l$ are in general not symmetric.
2) It is easy to see that $\pmb{\Gamma}_l=\pmb{\Gamma}_{-l}^T$ and $\pmb{\rho}_l=\pmb{\rho}_{-l}^T$. Hence it suffices in practice to consider the cross-correlation matrices $\pmb{\rho}_l$ for $l \ge 0$.

For the cross-correlation matrices {$\pmb{\rho}_l|l=0,1,...$}, the diagonal elements {$\rho_{ii}(l)|l=0,1,...$} are the autocorrelation function of $r_{it}$, the off-diagonal element $\rho_{ij}(0)$ measures the concurrent linear relationship between $r_{it}$ and $r_{jt}$, and for $l>0$ the off-diagonal element $\rho_{ij}(l)$ measures the linear dependence of $r_{it}$ on past value of $r_{j,t-l}$. Depending on the value in these matrices one and identify -
1) no linear relationship ($\rho_{ij}(l)=\rho_{ji}(l)=0, \forall l\ge0$),
2) concurrent correlation ($\rho_{ij}(0)\ne0$),
3) no lead-lag relationship ($\rho_{ij}(l)=\rho_{ji}(l)=0, \forall l>0$),
4) unidirectional relationship ($\rho_{ij}(l)=0,\forall l>0,\rho_{ji}(v)$ for some $v\ge0$), or
5) feedback relationship ($\rho_{ij}(l)\ne0$, for some $l>0,\rho_{ji}(v)\ne0$ for some $v\ge0$) .

Sample Cross-correlation matrices can be estimate using $\hat{\pmb{\rho}}_l=\hat{\pmb{D}}^{-1}\hat{\pmb{\Gamma}}_l\hat{\pmb{D}}^{-1}$, where
$$\hat{\Gamma}_l=\frac{1}{T}\sum_{t=l+1}^{T}(\pmb{r}_t-\bar{\pmb{r}})(\pmb{r}_{t-l}-\bar{\pmb{r}})^T, \qquad l\ge 0.$$ Bootstrapping can be used to get confidence intervals on finite samples.

Multivariate  Portmanteau tests: or Multivariate Ljung-Box test with statistic $Q(m)$ have the null hypothesis $H_0:\pmb{\rho}_1=...=\pmb{\rho}_m=\pmb{0}$, and $H_a: \pmb{\rho}_i\ne0$ for some $i \epsilon {1,...,m}$. The test statistic assumes the form
$$Q_k(m)=T^2\sum_{l=1}^m\frac{1}{T-l} tr(\hat{\Gamma}^T_l\hat{\pmb{\Gamma}}^{-1}_0\hat{\pmb{\Gamma}}_l \hat{\pmb{\Gamma}}^{-1}_0)$$ and under some regularity conditions follows a chi-squared distribution with $k^2m$ degrees of freedom, asymptotically.

#### Vector autoregressive models (VAR)

A multivariate time series ${\pmb{r}_t}$ is a VAR process of order 1, or VAR(1) for short, if it follows the model $$\pmb{r}_t=\pmb{\phi}_0+\pmb{\Phi}\pmb{r}_{t-1}+\pmb{a}_t,$$ where $\pmb{\phi}_0$ is a k-dimensional vector, $\pmb{\Phi}$ is a $k\times k$ matrix, and ${\pmb{a}_t}$ is a sequence of uncorrelated random vectors with mean zero and covariance matrix $\pmb{\Sigma}$, which is positive definite (generally assumed to be multivariate normal).

Positive definite matrix is a symmetric matrix with all eigenvalues positive. Also for any vector $\pmb{b}$, we have $\pmb{b}^T\pmb{Ab}>0$. These types of matrices can be decomposed as $\pmb{A}=\pmb{P\Lambda P}^T$, where $\pmb{\Lambda}$ is a diagonal matrix consisting of eigenvalues of $\pmb{A}$ and $\pmb{P}$ is a square matrix consisting of eigenvectors of $\pmb{A}$. These eigenvectors are orthogonal to each other. Matrix $\pmb{P}$ is orthogonal and this decomposition is referred to as spectral decomposition. For a symmetric matrix $\pmb{A}$, there exists a lower triangular matrix $\pmb{L}$ with diagonal elements being 1 and a diagonal matrix $\pmb{G}$ such that $\pmb{A}=\pmb{LGL}^T$. If $\pmb{A}$ is positive definite, then the diagonal elements of $\pmb{G}$ are positive. In this case, we have $\pmb{A=L\sqrt{G}\sqrt{G}L}^T=\pmb{MM}^T$, where $\pmb{M=L\sqrt{G}}$ is a lower triangular matrix. This is called Cholesky decomposition. Notice that this implies $\pmb{L}^{-1}\pmb{A}(\pmb{L}^{-1})^{T}=\pmb{G}$.

Reduced and Structural form: In general the off diagonal elements of matrix $\pmb{\Sigma}$ show the concurrent relationship between $r_{1t}$ and $r_{2t}$, while the matrix $\pmb{\Phi}$ measures the dynamic dependence or $\pmb{r}_t$. This is called reduced-form model because it does not show explicitly the concurrent dependence between the component series. The explicit-form expression of concurrent relationship (for the last series and hence any series by rearrangement) can be deduced by a simple linear transformation. Using Cholesky decomposition (possible because $\pmb{\Sigma}$ is positive definite symmetric matrix) we can find a lower triangle matrix $\pmb{L}$ with unit diagonal elements such that $\pmb{\Sigma=LGL}^T$, where $\pmb{G}$ is a diagonal matrix. If we define $\pmb{b}_t = \pmb{L}^{-1}\pmb{a}_t$, then $E(\pmb{b}_t)=\pmb{L}^{-1}E(\pmb{a_t})=\pmb{0}$ and $Cov(\pmb{b}_t)=E(\pmb{b}_t\pmb{b}_t^T)=\pmb{L}^{-1}\pmb{\Sigma}(\pmb{L}^T)^{-1}=\pmb{G}$. Since $\pmb{G}$ is a diagonal matrix the components of $\pmb{b}_t$ are uncorrelated.

Pre-multiplying the reduced-form with $\pmb{L}^{-1}$, to uncouple the equations, we get
$$\pmb{L}^{-1}\pmb{r}_t=\pmb{L}^{-1}\pmb{\phi}_0+\pmb{L}^{-1}\pmb{\Phi}\pmb{r}_{t-1}+\pmb{L}^{-1}\pmb{a}_t=\pmb{\phi}_0^*+\pmb{\Phi}^*\pmb{r}_{t-1}+\pmb{b}_t.$$
The last row of $\pmb{L}^{-1}$ has 1 as the last element, let it be $(w_{k1},w_{k2},..,w_{k,k-1},1)$, and hence the structural equation for the last ($k^{th}$) time series becomes:
$$r_{kt}+\sum_{i=1}^{k-1}w_{ki}r_{it}=\phi_{k,0}^*+\sum_{i=1}^k\Phi_{ki}^* r_{i,t-1}+b_{kt}.$$
This is possible because $\pmb{b}_t$ is a diagonal matrix and uncoupled. Reduced-form is commonly used for two reasons - ease in estimation, and concurrent correlations cannot be used in forecasting.

Stationarity condition and moments of a VAR(1) model: All eigenvalues of $\pmb{\Phi}$ should be less than 1 in modulus for weak stationarity for $\pmb{r}_t$, provided the covariance matrix of $\pmb{a}_t$ exists. Further we have $\pmb{\Gamma}_l=\pmb{\Phi \Gamma}_{l-1}$, for $l>0$, where $\pmb{\Gamma}_j$ is the lag-j cross-covariance matrix of $\pmb{r}_t$. By repeated substitutions we get $\pmb{\Gamma}_l=\pmb{\Phi}^l \pmb{\Gamma}_0$. Further for $\pmb{\Upsilon}=\pmb{D}^{-1/2}\pmb{\Phi}\pmb{D}^{1/2}$, we get $\pmb{\rho}_l=\pmb{\Upsilon}^l\pmb{\rho}_0$. A VAR(p) model is generally converted to a VAR(1) model using companion matrix and then analyzed like a VAR(1) model.

To find the order of a VAR model one can generally use the multi-variant equivalent of PACF with hypothesis tests on the successive residuals. The $i^{th}$ equation in the PACF is given by $\pmb{r}_t=\pmb{\phi}_0+\pmb{\Phi}_1\pmb{r}_{t-1}+...+\pmb{\Phi}_i\pmb{r}_{t-i}+\pmb{a}_t$. Parameters of these equations can be estimated by OLS method. For the $i^{th}$ equation let the OLS estimates of coefficients be $\hat{\pmb{\Phi}}_j^{(i)}$ and $\hat{\pmb{\phi}}_0^{(i)}$, where the superscript $(i)$ is used to denote the VAR(i) model. Then the residual is $\hat{\pmb{a}}_t^{(i)}=\pmb{r}_t-\hat{\pmb{\phi}}_0^{(i)}-\hat{\pmb{\Phi}}_1^{(i)}\pmb{r}_{t-1}-...-\hat{\pmb{\Phi}}_i^{(i)}\pmb{r}_{t-i}$. We then test the hypothesis sequentially to identify the order of VAR model. For $i^{th}$ and $(i-1)^{th}$ equations we test $H_0:\pmb{\Phi}_i=0$ versus $H_a:\pmb{\Phi}_i\ne0$, the test statistic is
$$M(i)=-\Bigg(T-k-i-\frac{3}{2}\Bigg)ln\Bigg(\frac{|\hat{\Sigma}_i|}{|\hat{\Sigma}_{i-1}|}\Bigg).$$ Asymptotically, $M(i)$ is distributed as a chi-squared distribution with $k^2$ degrees of freedom, where $k$ is the dimensionality of the asset universe.

Equivalent AIC and BIC methods can also be employed.  OLS or ML is generally used to estimate the parameters, the two methods being asymptotically equivalent. Once a model is fit, residuals should be tested for inadequacy using $Q_k(m)$ statistic (with $k^2m-g$ degrees of freedom). Forecasting is similar to the uni-variate case. Impulse response function is the MA version and can be derived to look at the decay rate. The MA equation is premultiplied by $\pmb{L}^{-1}$ to get the impulse response function of $\pmb{r}_t$ with the orthogonal innovations $\pmb{b}_t$. But different ordering may lead to different response functions, which is a drawback.

Vector moving-average models (VMA) :A VMA(1) model is given by $$\pmb{r}_t=\pmb{\theta}_0+\pmb{a}_t-\pmb{\Theta}\pmb{a}_{t-1},$$ where $\pmb{\theta}_0$ is k-dimensional vector, $\pmb{\Theta}$ is $k\times k$ matrix. Like the uni-variate case the cross-correlations cuts at order 1 and can be used to identify the order. Estimation of VMA model is lot more involved. The conditional or exact MLE approaches can be used.
Vector ARMA models (VARMA): In generalizing from uni-variate to vectors ARMA encounters the issue of identifiability - it may not be uniquely defined. Some constraints need to be imposed - structural specification. These are hardly used.
Marginal models of components: Given the vector models the component models are called the marginal models. For a k-dimensional ARMA(p,q)  model, the marginal models are ARMA(kp,(k-1)p+q) models.

Unit root nonstationarity and cointegration: When modeling several unit-root nonstationary time series jointly, one may encounter the case of cointegration. This may be due to the common trend or unit root of one of the components. In other words one can find a linear combination which is stationary. Let $h$ be the number of unit roots (or common trends) in the $k$-dimensional series $\pmb{x}_t$. Cointegration exists if $0<h<k$, and the quantity $k-h$ is called the number of cointegrating factors, these are the different linear combinations that are unit-root stationary. The linear combinations resulting in these unit-root stationary processes are called the cointegrating vectors. Two price series, if cointegrated, will have common underlying trend and we will lose this information if we take the first difference of the price series. This is because one difference per unit-root preserves useful information. Under cointegration we have more non-stationary series than uni-roots hence losing information. Overdifferencing can lead to the problem of unit roots in the MA matrix polynomial, invertability and estimation. Also, cointegration can also exist after adjusting for transaction costs and exchange-rate risk, which is artificial.

Error correction form: To overcome the difficulty of noninvertible VARMA models one can use this form. A VARMA(p,q) model is $$\pmb{x}_t=\sum_{i=1}^{p}\pmb{\Phi}_i\pmb{x}_{t-i}+\pmb{a}_t-\sum_{j=1}^q\pmb{\Theta}_j\pmb{a}_{t-j}.$$
For $\Delta \pmb{x}_t=\pmb{x}_t-\pmb{x}_{t-1}$, we can subtract $\pmb{x}_{t-1}$ from both side of VARMA equation to get the error correction form. $\Delta\pmb{x}_t=\pmb{\alpha\beta}^T\pmb{x}_{t-1}+\sum_{i=1}^{p-1}\pmb{\Phi}_i^* \Delta\pmb{x}_{t-i}+\pmb{a}_t-\sum_{j=1}^q\pmb{\Theta}_j\pmb{a}_{t-j}$
where $\pmb{\alpha}$ and $\pmb{\beta}$ are $k\times m$ full-rank matrix, k is the total asset dimension, m is the cointegrating factors ($m<k$).  The term $\pmb{\alpha\beta}^T\pmb{x}_{t-1}$ is called the error-correction term, as it compensates for the over-differentiating. Further, $\pmb{\beta}^T\pmb{x}_{t-1}$ is stationary. Also, $$\pmb{\Phi}_j^*=-\sum_{i=j+1}^{p}\pmb{\Phi}_i, \qquad j=1,...,p-1$$ $$\pmb{\alpha\beta}^T=\pmb{\Phi}_p+...+\pmb{\Phi}_1-\pmb{I}$$
The time series $\pmb{\beta}^T\pmb{x_t}$ is unit-root stationary, and the columns of $\pmb{\beta}$ are the cointegrating vectors of $\pmb{x}_t$.

Co-integrated VAR models: We focus on VAR models for their simplicity in estimation, to better understand cointegration here. For a k-dimensional VAR(p) model
$\pmb{x}_t=\pmb{\mu}_t+\pmb{\Phi}_1\pmb{x}_{t-1}+...+\pmb{\Phi}_p\pmb{x}_{t-p}+\pmb{a}_t,$
where $\pmb{\mu}_0+\pmb{\mu}_1t$. Or equivalently, using backshift operator $B$
$(\pmb{I}-\pmb{\Phi}_1B-...-\pmb{\Phi}_pB^p)\pmb{x}_t=\pmb{\mu}_t+\pmb{a}_t.$
The characteristic polynomial in the above is represented as $\pmb{\Phi}(B)$. For a unit-root nonstationary process, 1 is a root making $|\pmb{\Phi}(1)|=0$ An error-correction form for this can be obtained by subtracting $\pmb{x}_{t-1}$ from both sides of the equation giving
$\Delta \pmb{x}_t=\pmb{\mu}_t+\pmb{\Pi}\pmb{x}_{t-1}+\pmb{\Phi}_1^*\Delta\pmb{x}_{t-1}+...+\pmb{\Phi}_{p-1}^*\Delta \pmb{x}_{t-p+1}+\pmb{a}_t$
Where $\pmb{\Pi}=\pmb{\Phi}_1+...+\pmb{\Phi}_p-\pmb{I}=-\pmb{\Phi}(1)$ and $\pmb{\Phi}_j^*=-\sum_{i=j+1}^p\pmb{\Phi}_i$ for $j=1,...,p-1$. Further if Rank($\pmb{\Pi}$)=0 implies that $\pmb{x}_t$ is not cointegrated, Rank($\pmb{x}_t$)=$k$ implies that ECM is not informative and one studies $\pmb{x}_t$ directly. Finally, if $0<$Rank($\pmb{\Pi}$)$=m<k$ then one can write $\pmb{\Pi}=\pmb{\alpha\beta}^T$, where $\pmb{\alpha}$ and $\pmb{\beta}$ are both of rank $m$. We have the case of cointegration with $m$ linearly independent cointegrated vectors $\pmb{w}_t=\pmb{\beta}^T\pmb{x}_t$,a nd has $k-m$ unit roots or common trends.

To obtain the $k-m$ common trends vector of size $(k-m)\times 1$, $\pmb{y}_t=\pmb{\alpha}^T_{\bot}\pmb{x}_t$, we calculate the orthogonal matrix of size $k \times (k-m)$ as $\pmb{\alpha}^T_{\bot}\pmb{\alpha}=\pmb{0}$.To uniquely identify $\pmb{\alpha}$ and $\pmb{\beta}$ we require that $\pmb{\beta}^T=[\pmb{I}_m,\pmb{\beta}^T_1]$, where $\pmb{T}_m$ is a $m\times m$ identity matrix and $\pmb{\beta}_1$ is a $(k-m)\times m$ matrix. There are a few more constraints for the process $\pmb{w}_t=\pmb{\beta}^T\pmb{x}_t$ to be unit-root stationary.

The rank of $\pmb{\Pi}$ in the ECM is the number of cointegrating vectors. Thus, to test for cointegration, once can examine the rank of $\pmb{\Pi}$, the approach taken in Johansen test.

Deterministic function: Limiting distributions of cointegration tests depend on the deterministic function $\pmb{\mu}_t$.
1) $\pmb{\mu}_t=\pmb{0}$: All components series of $\pmb{x}_t$ are $I(1)$ without drift and the stationary series $\pmb{w}_t=\pmb{\beta}^T\pmb{x}_t$ has mean zero.
2) $\pmb{\mu}_t=\pmb{\alpha}\pmb{c}_0$: Components of $\pmb{x}_t$ are $I(1)$ without drift, but $\pmb{w}_t$ have a nonzero mean $-\pmb{c}_0$, called restricted constant.
3) $\pmb{\mu}_t=\pmb{\mu}_0$: Component series are $I(1)$ with drift $\pmb{\mu}_0$ and $\pmb{w}_t$ may have a nonzero mean.
4)  $\pmb{\mu}_t=\pmb{\mu}_0+\pmb{\alpha c}_1 t$: Components of $\pmb{x}_t$ are $I(1)$ with drift $\pmb{\mu}_0$ and $\pmb{w}_t$ has a linear time trend, called restricted trend.
5) $\pmb{\mu}_t=\pmb{\mu}_0+\pmb{\mu}_1t$: Both the constant and trend are unrestricted. The components of $\pmb{x_t}$ are $I(1)$ and have a quadratic time trend and $\pmb{w}_t$ have a linear trend.

Maximum likelihood estimation: Estimation of VAR(p) is quite involved. The deterministic term ($\pmb{x}_t$) and stationary terms ($\Delta\pmb{x}_t$) are first bifurcated and estimated using linear regression and have error terms are $\pmb{u}_t$ and $\pmb{v}_t$ respectively. A relevant eigenvalue problem when solved leads to a likelihood which when maximized gives the estimates of the coefficients.

Johansen test for cointegration: This is esentially testing the rank of the matrix $\pmb{\Pi}$, for a specified deterministic term $\pmb{\mu}_t$. The number of non-zero eigenvalues of $\pmb{\Pi}$ can be obtained if a consistent estimate of $\pmb{\Pi}$ is available. Looking at the ECM equation it is clear that $\pmb{\Pi}$ is related to the covariance matrix between $\pmb{x}_{t-1}$ and $\Delta\pmb{x}_t$ after adjusting for the effects of deterministic trend term and $\Delta\pmb{x}_{t-i}$ for $i=1,...,p-1$. Using canonical correlation analysis between the two adjusted equation, the squared correlation between them are calculated to be $\hat{\lambda}_i$. There are two versions of Johansen test:

1) Trace cointegration test:- $H_0$: Rank($\pmb{\Pi}$) = $m$ versus $H_a$: Rank($\pmb{\Pi}$)>$m$. The Likelihood ratio (LR) statistic is
$$LK_{tr}(m) = -(T-p)\sum_{i=m+1}^{k}ln(1-\hat{\lambda}_i)$$
Due to the presence of unit-roots, the asymptotic distribution of statistic is not chi-squared, but a function of standard Brownian motions. Thus, the critical values must be obtained via simulation.

2) Sequential test:- $H_0$: Rank($\pmb{\Pi}$) = $m$ versus $H_a$: Rank($\pmb{\Pi}$)=$m+1$. The $LK$ ratio test statistic, called the maximum eigenvalue statistic, is
$$LK_{max}(m)=-(T-p)ln(1-\hat{\lambda}_{m+1})$$
Again, the critical values of the test statistics are nonstandard and must be evaluated via simulation.

Left out sections: 8.7, 8.8