Stata Codes

The neweytsiv command

neweytsiv estimates the two-step semiparametric sample selection model proposed by Newey (2009), using a series approximation to the correction term h(v(alpha,w)) and allowing for endogenous regressors in the second step.

As proposed by Newey (2009), the first step estimates the selection parameters (alpha) by a semiparametric procedure, the semiparametric maximum likelihood estimator of Klein and Spady (1993). This allows the calculation of the index v(alpha,w). The second step regresses depvar on varlist1, varlist2 and approximating functions of v(alpha,w) in the selected sample, where varlist2 contains the endogenous regressors, which are instrumented by varlist_iv. These estimators are analogous to Heckman's (1976) two-step procedure for the case of Gaussian disturbances. The difference is that alpha is estimated by a distribution-free method rather than by probit, and a nonparametric approximation to h(v) replaces the inverse Mills ratio in the second-step regression.
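The second step described above can be sketched in a few lines of Python. This is only an illustration, not the neweytsiv implementation: it assumes the index v has already been estimated in a first step, that the data are already restricted to the selected sample, and that h(v) is approximated by a simple power series. The function name series_2sls and its arguments are hypothetical.

```python
import numpy as np

def series_2sls(y, x_exog, x_endog, z, v, order=3):
    """Illustrative second step: 2SLS of y on exogenous regressors,
    endogenous regressors and a power series in the index v.

    Assumptions (not from the original command): v is given, the sample
    is already the selected one, and h(v) is approximated by
    1, v, ..., v**order. x_exog must not contain a constant, because the
    constant is absorbed by the series terms.
    """
    # Series terms approximating the correction function h(v)
    series = np.column_stack([v ** k for k in range(order + 1)])
    # Regressors: exogenous, endogenous, series terms
    X = np.column_stack([x_exog, x_endog, series])
    # Instruments: exogenous regressors, excluded instruments, series terms
    W = np.column_stack([x_exog, z, series])
    # First stage: project every regressor on the instrument set
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]
    # Second stage: regress y on the projected regressors
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]
```

The returned vector stacks the coefficients on the exogenous regressors, the endogenous regressors and the series terms, in that order. Standard errors are omitted; a proper implementation would also have to account for the first-step estimation of v.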

The xtARGLS command

xtARGLS implements the estimator of Guiteras, Moon and Sarzosa (in progress), which builds on the contributions of Hansen (2007), who provides a bias-corrected estimator of the autocorrelation parameters in fixed effects panel data models, and Baltagi and Wu (1999), who show how to allow for missing-at-random data in AR(1) models. The resulting estimator: (1) allows for an AR model of arbitrary order p; (2) improves efficiency by using the bias correction of Hansen (2007) in estimating the AR parameters; (3) allows for missing-at-random data; and (4) conducts inference that is robust to misspecification of the autocorrelation process.
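For orientation, a fixed effects panel model with AR(p) disturbances can be written as follows. The notation is generic and assumed here for illustration; it is not taken from the paper.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + \alpha_i + u_{it}, \\
u_{it} &= \sum_{j=1}^{p} \rho_j \, u_{i,t-j} + \varepsilon_{it},
\end{aligned}
```

Here \alpha_i is the unit fixed effect and \rho_1, ..., \rho_p are the autocorrelation parameters to which the bias correction applies.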

[Go to xtargls webpage] 

The heterofactor command

heterofactor uses test scores to identify the distribution of unobserved endowments (factors). The estimated distributions are used to identify the loadings of such endowments on a given set of outcome equations.

Because the endowments are not observed, heterofactor integrates them out using their estimated distributions within a maximum likelihood procedure. Consequently, heterofactor requires a model composed of either one or two output equations and three test equations per factor, and it allows for a binary choice equation. heterofactor uses a two-step procedure: it first runs an ML estimation to obtain the factors' distributions, and then uses these distributions to recreate the latent factors and obtain unbiased estimates in the output and choice equations.
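The idea of integrating out an unobserved factor can be illustrated with a minimal Python sketch. It is not the heterofactor implementation: it assumes a single factor with a standard normal distribution, three linear test equations with normal errors, and Gauss-Hermite quadrature for the integral. The function names and arguments are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def integrated_likelihood(scores, intercepts, loadings, sds, n_nodes=64):
    """Likelihood of one person's test scores with the factor integrated out.

    Assumed model (for illustration only):
        score_j = intercept_j + loading_j * theta + e_j,
        e_j ~ N(0, sd_j**2), theta ~ N(0, 1).
    """
    # Nodes and weights for integrals of the form int exp(-x^2) f(x) dx
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables so that theta follows a standard normal
    theta = np.sqrt(2.0) * nodes
    # Mean of each test score at each quadrature node: shape (tests, nodes)
    means = intercepts[:, None] + loadings[:, None] * theta[None, :]
    # Conditional on theta the tests are independent, so densities multiply
    joint = normal_pdf(scores[:, None], means, sds[:, None]).prod(axis=0)
    # Weighted sum approximates the expectation over theta
    return float(weights @ joint / np.sqrt(np.pi))
```

Summing the logarithm of this quantity over individuals would give a log-likelihood that can be maximized over the intercepts, loadings and error standard deviations. A more flexible factor distribution would change only the nodes and weights used for the integral.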

[Download PDF]