## Sunday, August 16, 2015

### Tsay Ch 11 - State-space models and kalman filter

#### Local trend model

For a univariate time series $y_t=\mu_t+\epsilon_t$ and $\mu_{t+1}=\mu_t+\eta_t$, both the error terms are assumed to be normally distributed to distinct variance $\sigma_e^2$ and $\sigma_{\eta}^2$ respectively. Notice the first equation is the observed version of the second trend model with added noise. This model can be used to analyze realized volatility of an asset price if $\mu_t$ is assumed to be the log volatility (which is not directly observable) and $y_t$ is the logarithm or realized volatility (which is observable constructed from high-frequency transaction data with microstructure noise).

If there is no measurement error term in the first equation ($\sigma_e=0$) this becomes a ARIMA(0,1,0) model. With the error term it is a ARIMA(0,1,1) model, which is also the simple exponential smoothing model. The form is $(1-B)y_t=(1-\theta B)a_t$, $\theta$ and $\sigma_{a}^2$ are related to $\sigma^2_e$ and $\sigma^2_{\eta}$ as follows: $(1+\theta^2)\sigma^2_a=2\sigma^2_e+\sigma^2_{\eta}$ and $\theta \sigma^2_a=\sigma^2_e.$ The quadratic equation for $\theta$ will give two solutions with $|\theta|<1$ chosen. The reverse is also possible for positive $\theta$. Both representations have pros and cons and the objective of data analysis, substantive issues and experience decide which to use.

#### Statistical Inference

Three types (using reading handwritten note example)
1. Filtering - recover state variable $\mu_t$ given $F_t$ to remove the measurement errors from the data. (figuring out the word you are reading based on knowledge accumulated from the beginning to the note).
2. Prediction - forecast $\mu_{t+h}$ or $y_{t+h}$ for $h>0$ given $F_t$, where $t$ is the forecast origin. (guess the next word).
3. Smoothing - estimate $\mu_t$ given $F_T$, where $T>t$. (deciphering a particular word once you have read through the note).

#### The Kalman Filter

Let $\mu_{t|j}=E(\mu_t|F_j)$ and $\Sigma_{t|j}=Var(\mu_t|F_j)$ be, respectively, the conditional mean and variance of $\mu_t$ given information $F_j$. Similarly $y_{t|j}$ denotes the conditional mean of $y_t$ given $F_j$. Furthermore let $v_t=y_t-y_{t|j}$ and $V_t=Var(v_t|F_{t-1})$ be 1-step ahead forecast error and its variance of $y_t$ given $F_{t-1}$. Note that $Var(v_t|F_{t-1})=Var(v_t)$, since the forecast error $v_t$ is independent of $F_{t-1}$. Further, $y_{t|t-1}=\mu_{t|t-1}$ giving $v_t=y_t-\mu_{t|t-1}$ and $V_t=\Sigma_{t|t-1}+\sigma^2_e$. Also, $E(v_t)=0$ and $Cov(v_t,y_t)=0$ for $j<t$. The information $F_t \equiv \{F_{t-1},y_t\} \equiv \{F_{t-1},v_t\}$, hence $\mu_{t|t}=E(\mu_t|F_{t-1},v_t)$ and $\Sigma_{t|t}=Var(\mu_t|F_{t-1},v_t)$.

One can show that $Cov(\mu_t,v_t|F_{t-1})=\Sigma_{t|t-1}$ giving, $$\begin{bmatrix} \mu_t \\ v_t \end{bmatrix}_{F_{t-1}} \sim N\left( \begin{bmatrix} \mu_{t|t-1} \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_{t|t-1} & \Sigma_{t|t-1} \\ \Sigma_{t|t-1} & V_t\end{bmatrix} \right).$$ Applying the multivariate normal theorem we get $$\mu_t|t = \mu_{t|t-1}+(V_t^{-1}\Sigma_{t|t-1})v_t=\mu_{t|t-1}+K_tv_t,$$ $$\Sigma_{t|t}=\Sigma_{t|t-1}-\Sigma_{t|t-1}V_t^{-1}\Sigma_{t|t-1} = \Sigma_{t|t-1}(1-K_t),$$ where $K_t=V_t^{-1}\Sigma_{t|t-1}$ is referred to as the Kalman gain, which is the regression coefficient of $\mu_t$ on $v_t$, governing the contribution of th enew shock $v_t$ to the state variable $\mu_t$. To predict $\mu_{t+1}$ given $F_t$ we have $$\mu_{t+1|t} \sim N(\mu_{t|t}, \Sigma_{t|t}+\sigma^2_{\eta}).$$ once the new data $y_{t+1}$ is observed, the above procedure can be repeated (obviously once $\sigma_e$ and $\sigma_{\eta}$ are estimated, generally using maximum likelihood method). This is the famous Kalman filter algorithm (1960). The choice of priors $\mu_{1|0}$ and $\Sigma_{1|0}$ requires some attention.

Properties of forecast error -
State error recursion -
State smoothing -
Missing Values -
Effect of Initialization -
Estimation -