## Tuesday, June 30, 2015

### Tsay Ch5 - High Frequency Data Analysis and Market Microstructure

For daily stock returns, non-synchronous trading can introduce
1) cross correlations between stock returns
2) serial correlation in a portfolio return
3) sometimes negative serial correlations in the return series of a stock.

The bid-ask spread introduces negative lag-1 serial correlation in an asset return, an effect called bid-ask bounce.
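
The bid-ask bounce can be illustrated with a small simulation. This is a minimal sketch, assuming the simple setup usually attributed to Roll (1984): the fundamental value is constant and each trade hits the bid or the ask with equal probability. The function names, the spread of 0.02 and the seed are illustrative choices, not something from the chapter notes.

```python
import random

def simulate_roll_price_changes(n, spread, seed=0):
    """Price changes when the fundamental value is constant and every
    trade is executed at the bid or the ask with probability 1/2 each."""
    rng = random.Random(seed)
    # +1 means the trade was at the ask, -1 means it was at the bid.
    indicators = [rng.choice((-1, 1)) for _ in range(n + 1)]
    half = spread / 2.0
    return [half * (indicators[t] - indicators[t - 1]) for t in range(1, n + 1)]

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return cov / var

changes = simulate_roll_price_changes(n=100_000, spread=0.02, seed=42)
rho1 = lag1_autocorrelation(changes)
print(f"lag-1 autocorrelation of price changes: {rho1:.3f}")  # theory: -0.5
```

Under these assumptions the price changes are negatively correlated at lag 1 even though no information arrives, which is the bounce effect.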

#### Empirical characteristics

Aggregated data do not show some of the characteristics of transactions data:
1) unequally spaced time intervals - the duration between trades might contain useful information about market microstructure, e.g. trading intensity
2) discrete-valued prices - prices move in multiples of a tick size, and exchanges may impose limits on price changes
3) existence of a daily periodic pattern - e.g. trading is thinner during the lunch hour
4) multiple transactions within a single second

Overnight stock returns differ substantially from intraday returns (Stoll and Whaley 1990). Intraday trading has exploded, with multiple transactions occurring within a single second.

#### Models for price change

The discreteness and the concentration on 'no change' make it difficult to model intraday price changes. There are two models - the ordered probit model (Hausman, Lo and MacKinlay 1992) and a decomposition model (McCulloch and Tsay 2000). Prediction with these models is challenging; they are used more for understanding the price-change process.
Ordered probit model - Let $P_t^*$ be the fundamental value of the asset in a frictionless market and $P_t$ the observed price. Define $y_i^*=P^*_{t_i}-P^*_{t_{i-1}}$ and model $y^*_i$ as a continuous random variable, $y^*_i=\mathbf{x}_i\boldsymbol{\beta}+\epsilon_i$. The observed price change $y_i$ takes values in an ordered set $\{s_1,\ldots,s_k\}$, with $y_i=s_j$ when $y^*_i$ falls in the $j$-th of $k$ intervals partitioning the real line. Generally a normal distribution is assumed for $\epsilon_i$. The model can be estimated by maximum likelihood or MCMC methods. Explanatory variables $\mathbf{x}_i$ can be time duration, lagged prices, lagged SP500 price, bid-ask spread and direction, and lagged volume. Volatility can be explained using duration and bid-ask spread as well.
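
The category probabilities implied by an ordered probit model can be sketched as follows. This is a minimal sketch under stated assumptions: the cutoffs that partition the latent variable $y^*_i$ are passed in as `cutoffs`, the error is normal with constant standard deviation `sigma`, and all numeric values are hypothetical. It uses $P(y_i=s_j)=\Phi((\alpha_j-\mathbf{x}_i\boldsymbol{\beta})/\sigma)-\Phi((\alpha_{j-1}-\mathbf{x}_i\boldsymbol{\beta})/\sigma)$, where $\alpha_j$ are the cutoffs.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(x, beta, cutoffs, sigma=1.0):
    """Probabilities of the k = len(cutoffs) + 1 ordered categories.

    cutoffs must be increasing; category j is observed when the latent
    y* = x.beta + eps lies between cutoffs[j-1] and cutoffs[j].
    """
    mean = sum(xj * bj for xj, bj in zip(x, beta))
    # CDF evaluated at -inf, each cutoff, and +inf.
    cdf = [0.0] + [normal_cdf((c - mean) / sigma) for c in cutoffs] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

# Hypothetical example: three categories (down, no change, up).
x = [0.5, -1.0]       # e.g. a duration measure and a lagged price change
beta = [0.2, 0.1]     # illustrative coefficients, not estimates
probs = ordered_probit_probs(x, beta, cutoffs=[-1.0, 1.0])
print([round(p, 4) for p in probs])
```

The log likelihood for maximum likelihood estimation is then the sum of the logs of the probabilities of the observed categories.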
A decomposition model (ADS) - the price change is decomposed into an indicator for a price change $A_i$, the direction of the price change $D_i$, and the size of the price change $S_i$, so that $y_i = P_{t_i}-P_{t_{i-1}}=A_iD_iS_i$, where the ordering is important. Each of these terms is modeled as a logistic regression on explanatory variables and estimated by maximizing the log likelihood.
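
The ADS decomposition itself is easy to make concrete. This is a minimal sketch, assuming price changes are measured in integer ticks; setting $D_i$ and $S_i$ to 0 when there is no change is only a convenience here, since the model defines them conditional on $A_i=1$. The sample data are made up.

```python
def decompose_price_change(y_ticks):
    """Split a price change (in ticks) into (A, D, S).

    A = 1 if the price changed, else 0
    D = +1 / -1 direction of the change (0 used as a placeholder if A = 0)
    S = size of the change in ticks (0 used as a placeholder if A = 0)
    """
    if y_ticks == 0:
        return 0, 0, 0
    direction = 1 if y_ticks > 0 else -1
    return 1, direction, abs(y_ticks)

changes = [0, 0, 1, -2, 0, 3, -1, 0]   # hypothetical price changes in ticks
parts = [decompose_price_change(y) for y in changes]

# The product A * D * S recovers the original price change.
recomposed = [a * d * s for a, d, s in parts]

# Simple empirical counterparts of the modeled probabilities.
n_changes = sum(a for a, _, _ in parts)
p_change = n_changes / len(parts)                       # P(A = 1)
p_up_given_change = sum(1 for a, d, _ in parts if a and d == 1) / n_changes
print(parts, p_change, p_up_given_change)
```

In the model these probabilities are not constants but depend on explanatory variables through the logistic regressions.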

#### Duration models - ACD

Concerned with the time intervals between trades. Longer durations indicate a lack of trading activity, which in turn suggests that there is no new information. Before the durations can be modeled, the diurnal pattern has to be removed from the time series. This is done by calculating the adjusted time duration $\Delta t^*_i=\Delta t_i/f(t_i)$, where $f(t_i)$ is a deterministic function built from components that take care of the first 5 minutes, the last 30 minutes, and the mid period, depending on the asset and its profile. We can then fit the autoregressive conditional duration model to the adjusted durations. $f(t_i)$ is commonly estimated using smoothing splines. Another way is to use a combination of quadratic functions and indicator variables to take care of the deterministic components of daily trading activities:
$$f(t_i)=e^{d(t_i)} \qquad d(t_i)=\beta_0+\sum_{j=1}^7\beta_j f_j(t_i)$$
where $f_1, f_2, f_3, f_4$ are quadratic functions fitted for the specific data (Tsay pg 225), $f_5$ and $f_6$ are indicator variables for the first and second 5 minutes of market open, and $f_7$ is the indicator for the last 30 minutes of daily trading. The coefficients can be determined by the least squares method:
$$\ln(\Delta t_i)=\beta_0+\sum_{j=1}^7\beta_j f_j(t_i) +\epsilon_i$$
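
The least squares adjustment can be sketched in a few lines. This is a minimal sketch, not Tsay's specification: it uses only two hypothetical basis functions (one quadratic in time of day and one opening indicator) instead of the seven described above, assumes a 23400-second trading day, and fits synthetic noiseless data so that the recovered coefficients can be checked.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_diurnal(times, durations, basis):
    """Regress ln(duration) on a constant and the basis functions (OLS)."""
    rows = [[1.0] + [f(t) for f in basis] for t in times]
    targets = [math.log(d) for d in durations]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def adjust_durations(times, durations, basis, beta):
    """Divide each duration by the fitted diurnal factor f(t) = exp(d(t))."""
    adjusted = []
    for t, d in zip(times, durations):
        log_f = beta[0] + sum(b * f(t) for b, f in zip(beta[1:], basis))
        adjusted.append(d / math.exp(log_f))
    return adjusted

# Hypothetical basis: time t is seconds since the open.
def mid_day_quadratic(t):
    return -((t - 11700.0) / 11700.0) ** 2

def opening_indicator(t):
    return 1.0 if t < 300 else 0.0

basis = [mid_day_quadratic, opening_indicator]
times = list(range(0, 23400, 60))
true_beta = [1.0, 0.5, -0.3]          # made-up coefficients for the check
durations = [
    math.exp(true_beta[0] + true_beta[1] * mid_day_quadratic(t)
             + true_beta[2] * opening_indicator(t))
    for t in times
]

beta_hat = fit_diurnal(times, durations, basis)
adjusted = adjust_durations(times, durations, basis, beta_hat)
print([round(b, 6) for b in beta_hat])
```

With real data the adjusted durations would not be constant; they are the series passed on to the duration model.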
The autoregressive conditional duration (ACD) model uses the idea of GARCH models to study the dynamic structure of the adjusted duration $\Delta t^*_i$. For $x_i=\Delta t^*_i$ and $\psi_i=E(x_i|F_{i-1})$, the model is defined as $x_i=\psi_i\epsilon_i$, where $\epsilon_i$ follows a standard exponential (EACD) or a standardized Weibull (WACD) distribution. Further, similar to GARCH, we have the ACD(r,s) model
$$\psi_i=\omega+\sum^r_{j=1}\gamma_j x_{i-j}+\sum^s_{j=1}\omega_j \psi_{i-j}$$
with $\gamma_j=0$ for $j>r$ and $\omega_j=0$ for $j>s$. For stationarity $\omega>0$ and $1>\sum_j(\gamma_j+\omega_j)$.
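
The conditional mean recursion of an ACD(r,s) model can be sketched as below. This is a minimal sketch: starting the recursion by replacing unavailable pre-sample values with the unconditional mean $\omega/(1-\sum_j(\gamma_j+\omega_j))$ is an assumption made here for convenience, and the parameter values are hypothetical.

```python
def acd_conditional_means(x, omega, gammas, omegas):
    """Compute psi_i = omega + sum_j gammas[j-1] * x[i-j]
                             + sum_j omegas[j-1] * psi[i-j].

    Pre-sample values of x and psi are replaced by the unconditional mean
    (a start-up assumption of this sketch).
    """
    persistence = sum(gammas) + sum(omegas)
    if omega <= 0 or persistence >= 1:
        raise ValueError("need omega > 0 and sum(gammas) + sum(omegas) < 1")
    mu = omega / (1.0 - persistence)
    psi = []
    for i in range(len(x)):
        value = omega
        for j, g in enumerate(gammas, start=1):
            value += g * (x[i - j] if i - j >= 0 else mu)
        for j, w in enumerate(omegas, start=1):
            value += w * (psi[i - j] if i - j >= 0 else mu)
        psi.append(value)
    return psi

# Hypothetical adjusted durations and ACD(1,1) parameters.
x = [1.0, 2.0, 0.5]
psi = acd_conditional_means(x, omega=0.2, gammas=[0.1], omegas=[0.7])

# Standardized durations x_i / psi_i play the role of epsilon_i.
eps = [xi / pi for xi, pi in zip(x, psi)]
print([round(p, 4) for p in psi], [round(e, 4) for e in eps])
```

The standardized durations are what a likelihood based on the exponential or Weibull distribution would be evaluated on.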

EACD(1,1) model: $x_i=\psi_i\epsilon_i$ with $\psi_i=\omega+\gamma_1x_{i-1}+\omega_1\psi_{i-1}$, where $\epsilon_i$ follows a standard exponential distribution. We have $E(\epsilon_i)=1$ and $Var(\epsilon_i)=1$, implying $E(\epsilon_i^2)=2$. Assuming weak stationarity,
$$E(x_i)=\frac{\omega}{1-\gamma_1-\omega_1}.$$
$$Var(x_i)=\mu_x^2\frac{1-\omega_1^2-2\gamma_1\omega_1}{1-\omega_1^2-2\gamma_1^2-2\gamma_1\omega_1}.$$
Here $\mu_x=E(x_i)$. Hence, $1>2\gamma_1^2+\omega_1^2+2\gamma_1\omega_1$ is needed for a finite, time-invariant variance, i.e. for weak stationarity.
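
The moment formulas can be checked numerically. This is a minimal sketch, assuming hypothetical parameter values $\omega=0.2$, $\gamma_1=0.1$, $\omega_1=0.7$; the burn-in length, sample size and seed are arbitrary choices.

```python
import random

def eacd11_moments(omega, gamma1, omega1):
    """Unconditional mean and variance of x_i for a weakly stationary EACD(1,1)."""
    denom = 1.0 - omega1 ** 2 - 2.0 * gamma1 ** 2 - 2.0 * gamma1 * omega1
    if omega <= 0 or gamma1 + omega1 >= 1 or denom <= 0:
        raise ValueError("parameters violate the stationarity conditions")
    mean = omega / (1.0 - gamma1 - omega1)
    var = mean ** 2 * (1.0 - omega1 ** 2 - 2.0 * gamma1 * omega1) / denom
    return mean, var

def simulate_eacd11(n, omega, gamma1, omega1, seed=0, burn_in=1000):
    """Simulate x_i = psi_i * eps_i with standard exponential eps_i."""
    rng = random.Random(seed)
    psi = omega / (1.0 - gamma1 - omega1)   # start at the unconditional mean
    x = psi * rng.expovariate(1.0)
    out = []
    for i in range(n + burn_in):
        psi = omega + gamma1 * x + omega1 * psi
        x = psi * rng.expovariate(1.0)
        if i >= burn_in:
            out.append(x)
    return out

mean_theory, var_theory = eacd11_moments(0.2, 0.1, 0.7)
sample = simulate_eacd11(100_000, 0.2, 0.1, 0.7, seed=1)
mean_sample = sum(sample) / len(sample)
var_sample = sum((v - mean_sample) ** 2 for v in sample) / (len(sample) - 1)
print(f"mean: theory {mean_theory:.4f}, sample {mean_sample:.4f}")
print(f"var:  theory {var_theory:.4f}, sample {var_sample:.4f}")
```

For these values the formulas give a mean of 1 and a variance of $0.37/0.35\approx 1.057$, and the sample moments should land close to them.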

#### Bivariate models for price change and duration - PCD

The PCD model jointly models the price change and the associated duration process. It focuses on transactions that result in a price change, $P_{t_i}=P_{t_{i-1}}+D_i S_i$, where $D_i$ is the direction of the price change and $S_i$ is its size. This reduces the number of data points, and there is no diurnal pattern in the time durations between price changes. The PCD model decomposes the joint distribution of $(\Delta t_i, N_i, D_i, S_i)$ given $F_{i-1}$ as
$$f(\Delta t_i, N_i, D_i, S_i | F_{i-1})=f(S_i|D_i, N_i, \Delta t_i, F_{i-1}) f(D_i|N_i, \Delta t_i, F_{i-1}) f(N_i|\Delta t_i, F_{i-1}) f(\Delta t_i|F_{i-1})$$
where the $i^{th}$ transaction data consists of the duration $\Delta t_i$, the number of trades $N_i$ in the period (trades without a price change), the direction of the price change $D_i$, and the size of the price change $S_i$ in ticks. There are many ways to specify the conditional distributions, depending on the asset under study. Following McCulloch and Tsay (2000), we use generalized linear models for the discrete-valued variables and a time series model for the continuous variable $\ln(\Delta t_i)$:
$$ln(\Delta t_i) = \beta_0 + \beta_1 ln(\Delta t_{i-1}) + \beta_2 S_{i-1} + \sigma \epsilon_i.$$
The log transformation ensures that the duration is positive. Due to the concentration of $N_i$ at 0, we partition the model for $N_i$ into two parts.
$$p(N_i=0|\Delta t_i, F_{i-1}) = logit[\alpha_0+\alpha_1 ln(\Delta t_i)]$$
where $logit(x)=e^x/(1+e^x)$, whereas the second part of the model is
$$N_i|(N_i>0, \Delta t_i, F_{i-1}) \sim 1+g(\lambda_i) \qquad \lambda_i=\frac{e^{\gamma_0+\gamma_1 ln(\Delta t_i)}}{1+e^{\gamma_0 + \gamma_1 ln(\Delta t_i)}}$$
where $g(\lambda)$ denotes a geometric distribution with parameter $\lambda$, which lies in the interval (0,1). The model for the direction $D_i$ is
$$D_i|(N_i, \Delta t_i, F_{i-1})=sign(\mu_i+\sigma_i\epsilon)$$
where $\epsilon$ is a $N(0,1)$ random variable, and
$$\mu_i=\omega_0+\omega_1 D_{i-1}+\omega_2 ln(\Delta t_i),$$
$$ln(\sigma_i)=\beta|D_{i-1}+D_{i-2}+D_{i-3}+D_{i-4}|$$
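
Since $\epsilon$ is standard normal, the direction model implies $P(D_i=1)=P(\mu_i+\sigma_i\epsilon>0)=\Phi(\mu_i/\sigma_i)$. This can be sketched as below; the coefficient values are hypothetical and chosen only to show how a run of same-direction moves inflates $\sigma_i$ and pulls the probability toward 0.5.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_up_move(prev_dirs, duration, w0, w1, w2, beta):
    """P(D_i = 1) given the last four directions and the current duration.

    prev_dirs = (D_{i-1}, D_{i-2}, D_{i-3}, D_{i-4}), each +1 or -1.
    """
    mu = w0 + w1 * prev_dirs[0] + w2 * math.log(duration)
    sigma = math.exp(beta * abs(sum(prev_dirs[:4])))
    return normal_cdf(mu / sigma)

# Hypothetical coefficients: negative w1 gives price reversal on average.
params = dict(w0=0.0, w1=-0.5, w2=0.0, beta=0.2)

# Alternating directions: the sum is 0, so sigma_i = 1.
p_alternating = prob_up_move((1, -1, 1, -1), duration=1.0, **params)

# A run of four up moves: sigma_i = exp(0.8), so the reversal is weaker.
p_run = prob_up_move((1, 1, 1, 1), duration=1.0, **params)

print(round(p_alternating, 4), round(p_run, 4))
```

With a positive $\beta$, the larger $\sigma_i$ after a run makes the next direction harder to predict from $\mu_i$ alone.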
To allow for different dynamics between positive and negative price movements, we use different models for the size of a price change.
$$S_i | (D_i=-1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{d,i})+1 \qquad \ln(\lambda_{d,i})=\eta_{d,0}+\eta_{d,1}N_i+\eta_{d,2}\ln(\Delta t_i)+\eta_{d,3}S_{i-1}$$
$$S_i | (D_i=1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{u,i})+1 \qquad \ln(\lambda_{u,i})=\eta_{u,0}+\eta_{u,1}N_i+\eta_{u,2}\ln(\Delta t_i)+\eta_{u,3}S_{i-1}$$
where $p(\lambda)$ denotes a Poisson distribution with parameter $\lambda$, and 1 is added to the size because the minimum size is 1 tick when there is a price change. Estimation can be done either by maximum likelihood or MCMC methods.

Left out sections: 5.6