The conditional covariance matrix of $\pmb{a}_t$ given $F_{t-1}$ is the $k\times k$ positive definite matrix $\pmb{\Sigma}_t=\mathrm{Cov}(\pmb{a}_t|F_{t-1})$. Multivariate volatility modeling is concerned with the time evolution of $\pmb{\Sigma}_t$. A model for this evolution is referred to as the volatility model equation of $\pmb{r}_t$.

#### Exponentially weighted estimate

The unconditional covariance matrix of the innovations can be estimated with equal weights by $$\hat{\Sigma}=\frac{1}{t-1}\sum_{j=1}^{t-1}a_j a_j^T.$$ To allow for a time-varying covariance matrix that emphasizes recent information, one can use exponential smoothing instead: $$\hat{\Sigma}_t=\frac{1-\lambda}{1-\lambda^{t-1}}\sum_{j=1}^{t-1}\lambda^{j-1}a_{t-j}a_{t-j}^T,$$ where $0<\lambda<1$. The factor $(1-\lambda)/(1-\lambda^{t-1})$ makes the weights sum to one. For $t$ large enough that $\lambda^{t-1}\approx 0$, this factor is approximately $1-\lambda$, and the estimate can be updated recursively: $$\hat{\Sigma}_t=(1-\lambda)a_{t-1}a_{t-1}^T+\lambda \hat{\Sigma}_{t-1}.$$ This is called the exponentially weighted moving-average (EWMA) estimate of the covariance matrix.

The log-likelihood function can be evaluated recursively. As a result, $\lambda$ can be estimated by maximum likelihood jointly with the other model parameters, such as those of the mean equation. For daily data, a value of $\lambda$ around 0.94 commonly comes out as optimal. With $\lambda=0.94$, about 84% of the total weight falls on the most recent 30 observations, since $1-0.94^{30}\approx 0.84$.
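The sketch below shows, in Python with NumPy, the weighted-sum and recursive forms of the EWMA estimate and the recursive evaluation of the likelihood. It is a minimal sketch built on several assumptions that are not in the text:

- The innovations are taken to be zero-mean, so no mean-equation parameters are estimated.
- The recursion is started from an equal-weight estimate over a hypothetical burn-in window.
- $\lambda$ is picked by a simple grid search over the Gaussian log-likelihood, which replaces full joint maximum likelihood.

The function names are illustrative, and the simulated data are placeholders rather than real returns.

```python
import numpy as np

def ewma_cov_direct(a, lam):
    """Weighted-sum EWMA estimate of Sigma_t from a_1, ..., a_{t-1}.

    a holds the innovations oldest-first with shape (t-1, k); row j-1 is a_j.
    """
    n = a.shape[0]                          # n = t - 1 observations
    newest_first = a[::-1]                  # row j-1 is now a_{t-j}
    w = lam ** np.arange(n)                 # weight lam**(j-1) on a_{t-j}
    s = (newest_first * w[:, None]).T @ newest_first
    return (1 - lam) / (1 - lam ** n) * s   # normalize so the weights sum to one

def ewma_cov_recursive(a, lam):
    """Large-t recursion Sigma_t = (1 - lam) a_{t-1} a_{t-1}^T + lam Sigma_{t-1}."""
    sigma = np.outer(a[0], a[0])            # start at Sigma_2 = a_1 a_1^T
    for x in a[1:]:
        sigma = (1 - lam) * np.outer(x, x) + lam * sigma
    return sigma

def ewma_loglik(a, lam, burn_in=20):
    """Gaussian log-likelihood of zero-mean innovations under the EWMA recursion.

    Assumption: the first burn_in observations only initialize Sigma.
    """
    n, k = a.shape
    sigma = a[:burn_in].T @ a[:burn_in] / burn_in   # equal-weight starting value
    ll = 0.0
    for t in range(burn_in, n):
        _, logdet = np.linalg.slogdet(sigma)        # sigma uses a[:t] only
        quad = a[t] @ np.linalg.solve(sigma, a[t])
        ll -= 0.5 * (k * np.log(2 * np.pi) + logdet + quad)
        sigma = (1 - lam) * np.outer(a[t], a[t]) + lam * sigma  # one-step update
    return ll

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 3))           # placeholder innovations

direct = ewma_cov_direct(a, 0.94)
recursive = ewma_cov_recursive(a, 0.94)
print(np.max(np.abs(direct - recursive)))   # tiny, since 0.94**499 is negligible

grid = np.linspace(0.80, 0.99, 20)          # hypothetical search grid for lambda
best = grid[np.argmax([ewma_loglik(a, lam) for lam in grid])]
print(best)
```

For a long sample the two forms agree to within rounding, because the neglected terms are of order $\lambda^{t-1}$.
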

#### Some multivariate GARCH models

- Diagonal vectorization (diagonal VEC) model: a generalization of the exponentially weighted moving-average approach in which each element of $\Sigma_t$ follows a GARCH(1,1)-type model. It may not produce a positive definite covariance matrix, and it does not model the dynamic dependence between the volatility series.
- BEKK model: the Baba-Engle-Kraft-Kroner model (Engle and Kroner, 1995), designed to guarantee that $\Sigma_t$ is positive definite. It models the dynamic dependence between the volatility series, but it needs many parameters. A $k$-dimensional BEKK(1,1) has $k(k+1)/2+2k^2$ parameters, which is 11 for $k=2$.
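The sketch below makes the contrast between the two models concrete by implementing one update step of each. It assumes the following common (1,1) forms:

- Diagonal VEC: $\Sigma_t=A_0+A_1\odot(a_{t-1}a_{t-1}^T)+B_1\odot\Sigma_{t-1}$, where $\odot$ is the elementwise product and $A_0, A_1, B_1$ are symmetric.
- BEKK: $\Sigma_t=CC^T+A_1a_{t-1}a_{t-1}^TA_1^T+B_1\Sigma_{t-1}B_1^T$, where $C$ is lower triangular.

The code checks two things:

- The diagonal VEC step reduces to the EWMA recursion when $A_0=0$, $A_1=(1-\lambda)J$ and $B_1=\lambda J$, where $J$ is the all-ones matrix.
- A simulated BEKK path stays positive definite.

The parameter values and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def dvec11_step(sigma_prev, a_prev, A0, A1, B1):
    """Diagonal VEC(1,1): Sigma_t = A0 + A1 o (a a^T) + B1 o Sigma_{t-1}.

    '*' is NumPy's elementwise (Hadamard) product, so sigma_ij,t depends only on
    a_i,t-1 * a_j,t-1 and sigma_ij,t-1: one GARCH(1,1)-type equation per element.
    """
    return A0 + A1 * np.outer(a_prev, a_prev) + B1 * sigma_prev

def bekk11_step(sigma_prev, a_prev, C, A1, B1):
    """BEKK(1,1): Sigma_t = C C^T + A1 (a a^T) A1^T + B1 Sigma_{t-1} B1^T.

    A1 and B1 are full k x k matrices, so each element of Sigma_t can depend on
    all elements of a_{t-1} a_{t-1}^T and Sigma_{t-1}.
    """
    aa = np.outer(a_prev, a_prev)
    return C @ C.T + A1 @ aa @ A1.T + B1 @ sigma_prev @ B1.T

k = 3
J = np.ones((k, k))

# Diagonal VEC with A0 = 0, A1 = (1 - lam) J, B1 = lam J is the EWMA recursion
lam = 0.94
sigma, x = np.eye(k), np.array([1.0, -2.0, 0.5])
ewma_step = dvec11_step(sigma, x, np.zeros((k, k)), (1 - lam) * J, lam * J)

# Simulate a BEKK(1,1) path with hypothetical parameter values
C = np.array([[0.10, 0.00, 0.00],
              [0.02, 0.10, 0.00],
              [0.01, 0.03, 0.10]])
A1 = 0.3 * np.eye(k) + 0.05 * J
B1 = 0.85 * np.eye(k)
rng = np.random.default_rng(1)
sigma_t = 0.1 * np.eye(k)
min_eigs = []
for _ in range(200):
    a_t = np.linalg.cholesky(sigma_t) @ rng.standard_normal(k)  # draw a_t with covariance Sigma_t
    sigma_t = bekk11_step(sigma_t, a_t, C, A1, B1)
    min_eigs.append(np.linalg.eigvalsh(sigma_t).min())

n_params = k * (k + 1) // 2 + 2 * k * k     # C (lower triangular) + A1 + B1
print(min(min_eigs) > 0, n_params)          # smallest eigenvalue along the path, parameter count
```

In BEKK, positive definiteness follows from the structure of the update:

- $CC^T$ is positive definite when $C$ has a nonzero diagonal.
- The other two terms are positive semidefinite congruence transforms.

The diagonal VEC offers no such guarantee unless further restrictions are imposed on $A_0$, $A_1$ and $B_1$.
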

#### Reparameterization

Because $\Sigma_{t}$ is symmetric, it has only $k(k+1)/2$ distinct elements, and it is useful to reparameterize it by making use of this symmetry.

**Use of correlations**: the covariance matrix can be represented by the conditional variances together with the correlations below the main diagonal, and these can be modeled jointly. Specifically, we write $\Sigma_t=D_t\rho_tD_t$. Here $\rho_t$ is the conditional correlation matrix of $a_t$, and $D_t=\mathrm{diag}(\sqrt{\sigma_{11,t}},...,\sqrt{\sigma_{kk,t}})$ is the $k \times k$ diagonal matrix of conditional standard deviations of the elements of $a_t$.

To model the volatility of $a_t$, it therefore suffices to model the conditional variances and correlation coefficients of the $a_{it}$. These are collected in the $k(k+1)/2$-dimensional vector $\Xi_t= (\sigma_{11,t},...,\sigma_{kk,t}, \varrho_t^T)^T$. The vector $\varrho_t$ is $k(k-1)/2$-dimensional. It is obtained by stacking the columns of $\rho_t$ and keeping only the elements below the main diagonal, i.e., $\varrho_t=(\rho_{21,t},...,\rho_{k1,t}|\rho_{32,t},...,\rho_{k2,t}|...|\rho_{k,k-1,t})^T$. For example, with $k=2$ we have $\varrho_t=\rho_{21,t}$ and $\Xi_t=(\sigma_{11,t},\sigma_{22,t},\rho_{21,t})^T$, a 3-dimensional vector.

The approach has two weaknesses. The likelihood function becomes complicated when $k>2$, and maximizing it requires constraints to keep $\Sigma_t$ positive definite.

**Cholesky decomposition**: this approach requires no constrained maximization. Because $\Sigma_t$ is positive definite, there exist a lower triangular matrix $L_t$ with unit diagonal elements and a diagonal matrix $G_t$ with positive diagonal elements such that $\Sigma_t=L_tG_tL_t^T$. The decomposition amounts to an orthogonalizing transformation of the shocks, which is why the resulting likelihood is simple. A further feature is that the lower off-diagonal elements of $L_t$ and the diagonal elements of $G_t$ are closely connected with linear regression.
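Both parameterizations can be computed from a given $\Sigma_t$. The sketch below is a minimal NumPy illustration using a hypothetical $3\times 3$ covariance matrix. It does two things:

- It forms $D_t$, $\rho_t$ and $\Xi_t$, stacking $\varrho_t$ column by column below the diagonal.
- It obtains $L_t$ and $G_t$ by rescaling the standard Cholesky factor $C$ of $\Sigma_t=CC^T$. The diagonal entries are $g_{jj,t}=c_{jj}^2$, and column $j$ of $L_t$ is column $j$ of $C$ divided by $c_{jj}$.

The function names are illustrative.

```python
import numpy as np

def correlation_params(sigma):
    """Return D, rho and Xi = (sigma_11, ..., sigma_kk, varrho^T)^T."""
    sd = np.sqrt(np.diag(sigma))            # conditional standard deviations
    D = np.diag(sd)
    rho = sigma / np.outer(sd, sd)          # correlation matrix
    k = sigma.shape[0]
    # stack the columns of rho, keeping only elements below the main diagonal
    varrho = np.array([rho[i, j] for j in range(k) for i in range(j + 1, k)])
    xi = np.concatenate([np.diag(sigma), varrho])
    return D, rho, xi

def unit_ldl(sigma):
    """Return (L, g) with L unit lower triangular and Sigma = L diag(g) L^T."""
    c = np.linalg.cholesky(sigma)           # Sigma = c c^T, c lower triangular
    d = np.diag(c)
    return c / d, d ** 2                    # divide column j by c_jj; g_jj = c_jj^2

# hypothetical positive definite covariance matrix (k = 3)
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 2.0, 0.5],
                  [0.3, 0.5, 1.5]])
D, rho, xi = correlation_params(sigma)
L, g = unit_ldl(sigma)
print(xi)                                   # k(k+1)/2 = 6 entries
print(L, g, sep="\n")
```

Both $\Xi_t$ vectors have $k(k+1)/2$ entries. Only the Cholesky version keeps $\Sigma_t$ positive definite for any real $q_{ij,t}$ and any positive $g_{ii,t}$.
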
Using the Cholesky decomposition amounts to an orthogonalizing transformation from $a_t$ to $b_t$. Here $b_{1t}=a_{1t}$, and for $1<i \le k$, $b_{it}$ is defined recursively by the least-squares regression $a_{it}=q_{i1,t}b_{1t}+q_{i2,t}b_{2t}+...+q_{i(i-1),t}b_{(i-1)t}+b_{it}$, where $q_{ij,t}$ is the $(i,j)$th element of $L_t$ for $1\le j <i$. We can write this transformation as $a_t=L_tb_t$, where $L_t$ is the lower triangular matrix with unit diagonal elements, and the covariance matrix of $b_t$ is $G_t$.

The parameter vector relevant to volatility modeling under this transformation is $\Xi_t=(g_{11,t},...,g_{kk,t},q_{21,t},q_{31,t},q_{32,t},...,q_{k1,t},...,q_{k(k-1),t})^T$, which is also $k(k+1)/2$-dimensional. The likelihood function simplifies drastically. Since $|\Sigma_t|=\prod_{i=1}^{k}g_{ii,t}$ and $a_t^T\Sigma_t^{-1}a_t=\sum_{i=1}^{k}b_{it}^2/g_{ii,t}$, the Gaussian log-likelihood splits into a sum of univariate terms.

The transformation has several further advantages:

- $\Sigma_t$ is kept positive definite simply by modeling $\ln(g_{ii,t})$, which guarantees $g_{ii,t}>0$.
- The elements of $\Xi_t$ are simply the coefficients and residual variances of multiple linear regressions that orthogonalize the shocks to the returns.
- The correlation coefficient between $a_{1t}$ and $a_{2t}$ is simply $q_{21,t}\sqrt{\sigma_{11,t}}/\sqrt{\sigma_{22,t}}$, which is time-varying.
- The covariances follow directly: $\sigma_{ij,t}=\sum_{v=1}^{j}q_{iv,t}q_{jv,t}g_{vv,t}$ for $i\ge j$, with $q_{vv,t}=1$.
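The regression interpretation, the simplified likelihood and the formula for $\sigma_{ij,t}$ can be checked numerically. This minimal NumPy sketch assumes zero-mean Gaussian shocks drawn from a hypothetical fixed $\Sigma$, so time subscripts are dropped, and it uses illustrative function names. It runs the recursive least-squares regressions on a sample without an intercept. The resulting coefficients reproduce exactly the unit-diagonal Cholesky factors of the sample second-moment matrix, because each residual $b_i$ is orthogonal to $b_1,\dots,b_{i-1}$.

```python
import numpy as np

def unit_ldl(sigma):
    """Unit lower triangular L and g = diag(G) with Sigma = L diag(g) L^T."""
    c = np.linalg.cholesky(sigma)
    d = np.diag(c)
    return c / d, d ** 2

def sequential_regressions(a):
    """Orthogonalize the columns of a (n x k) by recursive least-squares regressions.

    b_1 = a_1; b_i is the residual of regressing a_i on b_1, ..., b_{i-1}
    (no intercept, assuming zero-mean shocks). Returns L_hat and g_hat.
    """
    n, k = a.shape
    L, g, b = np.eye(k), np.empty(k), np.empty_like(a)
    b[:, 0] = a[:, 0]
    g[0] = b[:, 0] @ b[:, 0] / n
    for i in range(1, k):
        coef, *_ = np.linalg.lstsq(b[:, :i], a[:, i], rcond=None)
        L[i, :i] = coef                      # q_{i1}, ..., q_{i(i-1)}
        b[:, i] = a[:, i] - b[:, :i] @ coef  # residual b_i
        g[i] = b[:, i] @ b[:, i] / n         # residual variance g_ii
    return L, g

def loglik_direct(x, sigma):
    """Multivariate normal log-density of x with covariance sigma."""
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(sigma, x))

def loglik_cholesky(x, L, g):
    """Same log-density as a sum of k univariate normal terms in b = L^{-1} x."""
    b = np.linalg.solve(L, x)
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(g) + b ** 2 / g)

sigma = np.array([[1.0, 0.6, 0.3],           # hypothetical covariance matrix
                  [0.6, 2.0, 0.5],
                  [0.3, 0.5, 1.5]])
L, g = unit_ldl(sigma)
rng = np.random.default_rng(2)
a = rng.multivariate_normal(np.zeros(3), sigma, size=5000)

# 1) the regressions reproduce the unit LDL of the sample second-moment matrix
L_hat, g_hat = sequential_regressions(a)
L_S, g_S = unit_ldl(a.T @ a / a.shape[0])

# 2) the Gaussian log-likelihood splits into univariate terms
ll_direct, ll_chol = loglik_direct(a[0], sigma), loglik_cholesky(a[0], L, g)

# 3) sigma_ij = sum_{v <= min(i, j)} q_iv q_jv g_vv, and rho_21 = q_21 sqrt(s11 / s22)
sigma_rebuilt = np.array([[sum(L[i, v] * L[j, v] * g[v] for v in range(min(i, j) + 1))
                           for j in range(3)] for i in range(3)])
rho21 = L[1, 0] * np.sqrt(sigma[0, 0]) / np.sqrt(sigma[1, 1])
print(L_hat, L, ll_direct, ll_chol, rho21, sep="\n")
```

The agreement between `L_hat` and the population `L` is only approximate and improves with the sample size. The other checks hold up to rounding error.
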