Publications by author | Yale Department of Economics

Abstract

We introduce computationally simple, data-driven procedures for estimation and inference on a structural function h0 and its derivatives in nonparametric models using instrumental variables. Our first procedure is a bootstrap-based, data-driven choice of sieve dimension for sieve nonparametric instrumental variables (NPIV) estimators. When implemented with this data-driven choice, sieve NPIV estimators of h0 and its derivatives are adaptive: they converge at the best possible (i.e., minimax) sup-norm rate, without having to know the smoothness of h0, degree of endogeneity of the regressors, or instrument strength. Our second procedure is a data-driven approach for constructing honest and adaptive uniform confidence bands (UCBs) for h0 and its derivatives. Our data-driven UCBs guarantee coverage for h0 and its derivatives uniformly over a generic class of data-generating processes (honesty) and contract at, or within a logarithmic factor of, the minimax sup-norm rate (adaptivity). As such, our data-driven UCBs deliver asymptotic efficiency gains relative to UCBs constructed via the usual approach of undersmoothing. In addition, both our procedures apply to nonparametric regression as a special case. We use our procedures to estimate and perform inference on a nonparametric gravity equation for the intensive margin of firm exports and find evidence against common parameterizations of the distribution of unobserved firm productivity.

Abstract

In complicated/nonlinear parametric models, it is hard to determine whether a parameter of interest is formally point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of parameters in econometric models defined through a likelihood or a vector of moments. The CSs for the identified set or for a function of the identified set (such as a subvector) are based on inverting an optimal sample criterion (such as likelihood or continuously updated GMM), where the cutoff values are computed directly from Markov Chain Monte Carlo (MCMC) simulations of a quasi-posterior distribution of the criterion. We establish new Bernstein-von Mises type theorems for the posterior distributions of the quasi-likelihood ratio (QLR) and profile QLR statistics in partially identified models, allowing for singularities. These results imply that the MCMC criterion-based CSs have correct frequentist coverage for the identified set as the sample size increases, and that they coincide with Bayesian credible sets based on inverting a LR statistic for point-identified likelihood models. We also show that our MCMC optimal criterion-based CSs are uniformly valid over a class of data-generating processes that includes both partially identified and point-identified models. We demonstrate good finite-sample coverage properties of our proposed methods in four non-trivial simulation experiments: missing data, entry game with correlated payoff shocks, Euler equation and finite mixture models.

Abstract

In complicated/nonlinear parametric models, it is generally hard to know whether the model parameters are point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full parameters and of subvectors in models defined through a likelihood or a vector of moment equalities or inequalities. These CSs are based on level sets of optimal sample criterion functions (such as likelihood or optimally-weighted or continuously-updated GMM criterions). The level sets are constructed using cutoffs that are computed via Monte Carlo (MC) simulations directly from the quasi-posterior distributions of the criterions. We establish new Bernstein-von Mises (or Bayesian Wilks) type theorems for the quasi-posterior distributions of the quasi-likelihood ratio (QLR) and profile QLR in partially-identified regular models and some non-regular models. These results imply that our MC CSs have exact asymptotic frequentist coverage for identified sets of full parameters and of subvectors in partially-identified regular models, and have valid but potentially conservative coverage in models with reduced-form parameters on the boundary. Our MC CSs for identified sets of subvectors are shown to have exact asymptotic coverage in models with singularities. We also provide results on uniform validity of our CSs over classes of DGPs that include point and partially identified models. We demonstrate good finite-sample coverage properties of our procedures in two simulation experiments. Finally, our procedures are applied to two non-trivial empirical examples: an airline entry game and a model of trade flows.
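
As a rough illustration of the idea of taking a cutoff from simulated draws of the criterion, the sketch below works through a deliberately simple toy case that is not from the paper: a point-identified normal mean with unit variance and a flat prior, where the quasi-posterior can be sampled directly. The function name `mc_confidence_set`, the grid search, and all parameter values are assumptions made for illustration only; the paper's procedures cover far more general, partially identified models.

```python
import numpy as np

def mc_confidence_set(y, grid, alpha=0.05, n_draws=20000, seed=0):
    """Toy criterion-based confidence set for a normal mean (unit variance assumed)."""
    rng = np.random.default_rng(seed)
    n = y.size
    theta_hat = y.mean()  # maximizer of the Gaussian log-likelihood

    def qlr(theta):
        # Likelihood ratio 2 * [L(theta_hat) - L(theta)], which equals
        # n * (theta - theta_hat)^2 in this unit-variance Gaussian model.
        return n * (np.asarray(theta) - theta_hat) ** 2

    # Assumption: with a flat prior the posterior is N(theta_hat, 1/n),
    # so we can draw from it directly instead of running a Markov chain.
    draws = rng.normal(theta_hat, 1.0 / np.sqrt(n), size=n_draws)

    # Cutoff: the (1 - alpha) quantile of the criterion evaluated at the draws.
    cutoff = np.quantile(qlr(draws), 1.0 - alpha)

    # Confidence set: all grid points whose criterion value is below the cutoff.
    inside = grid[qlr(grid) <= cutoff]
    return cutoff, inside

# Simulated data (hypothetical values, for illustration only).
y = np.random.default_rng(1).normal(0.3, 1.0, size=200)
grid = np.linspace(-1.0, 1.0, 2001)
cutoff, cs = mc_confidence_set(y, grid)
print(f"cutoff = {cutoff:.2f}, set = [{cs.min():.3f}, {cs.max():.3f}]")
```

In this toy case the simulated cutoff should land near the 95% chi-squared critical value with one degree of freedom (about 3.84), which is the point-identified benchmark that the Bayesian Wilks interpretation refers to.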

Abstract

We show that spline and wavelet series regression estimators for weakly dependent regressors attain the optimal uniform (i.e., sup-norm) convergence rate (n/log n)^{-p/(2p+d)} of Stone (1982), where d is the number of regressors and p is the smoothness of the regression function. The optimal rate is achieved even for heavy-tailed martingale difference errors with finite (2 + (d/p))th absolute moment for d/p < 2. We also establish the asymptotic normality of t statistics for possibly nonlinear, irregular functionals of the conditional mean function under weak conditions. The results are proved by deriving a new exponential inequality for sums of weakly dependent random matrices, which is of independent interest.

Abstract

This paper makes several contributions to the literature on the important yet difficult problem of estimating functions nonparametrically using instrumental variables. First, we derive the minimax optimal sup-norm convergence rates for nonparametric instrumental variables (NPIV) estimation of the structural function h0 and its derivatives. Second, we show that a computationally simple sieve NPIV estimator can attain the optimal sup-norm rates for h0 and its derivatives when h0 is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal L2-norm rates for severely ill-posed problems, and are only up to a [log(n)]^ε (with ε < 1/2) factor slower than the optimal L2-norm rates for mildly ill-posed problems. Third, we introduce a novel data-driven procedure for choosing the sieve dimension optimally. Our data-driven procedure is sup-norm rate-adaptive: the resulting estimators of h0 and its derivatives converge at their optimal sup-norm rates even though the smoothness of h0 and the degree of ill-posedness of the NPIV model are unknown. Finally, we present two non-trivial applications of the sup-norm rates to inference on nonlinear functionals of h0 under low-level conditions. The first is to derive the asymptotic normality of sieve t-statistics for exact consumer surplus and deadweight loss functionals in nonparametric demand estimation when prices, and possibly incomes, are endogenous. The second is to establish the validity of a sieve score bootstrap for constructing asymptotically exact uniform confidence bands for collections of nonlinear functionals of h0. Both applications provide new and useful tools for empirical research on nonparametric models with endogeneity.

Abstract

We study the problem of nonparametric regression when the regressor is endogenous, which is an important nonparametric instrumental variables (NPIV) regression problem in econometrics and a difficult ill-posed inverse problem with unknown operator in statistics. We first establish a general upper bound on the sup-norm (uniform) convergence rate of a sieve estimator, allowing for endogenous regressors and weakly dependent data. This result leads to the optimal sup-norm convergence rates for spline and wavelet least squares regression estimators under weakly dependent data and heavy-tailed error terms. This upper bound also yields the sup-norm convergence rates for sieve NPIV estimators under i.i.d. data: the rates coincide with the known optimal L2-norm rates for severely ill-posed problems, and are a power of log(n) slower than the optimal L2-norm rates for mildly ill-posed problems. We then establish the minimax risk lower bound in sup-norm loss, which coincides with our upper bounds on sup-norm rates for the spline and wavelet sieve NPIV estimators. This sup-norm rate optimality provides another justification for the wide application of sieve NPIV estimators. Useful results on weakly dependent random matrices are also provided.

Abstract

This paper makes several important contributions to the literature about nonparametric instrumental variables (NPIV) estimation and inference on a structural function h0 and its functionals. First, we derive sup-norm convergence rates for computationally simple sieve NPIV (series 2SLS) estimators of h0 and its derivatives. Second, we derive a lower bound that describes the best possible (minimax) sup-norm rates of estimating h0 and its derivatives, and show that the sieve NPIV estimator can attain the minimax rates when h0 is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal root-mean-squared rates for severely ill-posed problems, and are only a logarithmic factor slower than the optimal root-mean-squared rates for mildly ill-posed problems. Third, we use our sup-norm rates to establish the uniform Gaussian process strong approximations and the score bootstrap uniform confidence bands (UCBs) for collections of nonlinear functionals of h0 under primitive conditions, allowing for mildly and severely ill-posed problems. Fourth, as applications, we obtain the first asymptotic pointwise and uniform inference results for plug-in sieve t-statistics of exact consumer surplus (CS) and deadweight loss (DL) welfare functionals under low-level conditions when demand is estimated via sieve NPIV. Our real-data application of UCBs for exact CS and DL functionals of gasoline demand reveals interesting patterns and is applicable to other markets.
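
To make "sieve NPIV (series 2SLS)" concrete, here is a minimal sketch of a series two-stage least squares fit. It assumes a simple power-series basis as a stand-in for the spline or wavelet sieves discussed in the abstract. The names `poly_basis`, `sieve_npiv`, and `predict`, the dimensions J and K, and the simulated design are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def poly_basis(v, degree):
    """Power-series basis [1, v, ..., v^degree] (assumed stand-in for a spline sieve)."""
    return np.vander(np.asarray(v, dtype=float), degree + 1, increasing=True)

def sieve_npiv(y, x, w, J=3, K=5):
    """Series 2SLS: J basis functions of x, instrumented by K >= J basis functions of w."""
    psi = poly_basis(x, J - 1)  # n x J basis for the structural function
    b = poly_basis(w, K - 1)    # n x K basis for the instrument space
    # First stage: project each column of psi onto the span of b.
    gamma, *_ = np.linalg.lstsq(b, psi, rcond=None)
    psi_fit = b @ gamma
    # Second stage: least squares of y on the projected basis.
    coef, *_ = np.linalg.lstsq(psi_fit, y, rcond=None)
    return coef

def predict(coef, x_new):
    """Evaluate the fitted structural function at new points."""
    return poly_basis(x_new, coef.size - 1) @ coef

# Simulated design with an endogenous regressor (hypothetical values).
rng = np.random.default_rng(0)
n = 5000
w = rng.uniform(0.0, 1.0, n)           # instrument
v = rng.normal(0.0, 1.0, n)            # first-stage disturbance
x = 0.8 * w + 0.2 * v                  # regressor depends on w and v
u = 0.5 * v + rng.normal(0.0, 0.5, n)  # error correlated with x through v
true_coef = np.array([1.0, -2.0, 0.5])
y = poly_basis(x, 2) @ true_coef + u

coef = sieve_npiv(y, x, w)
print("series 2SLS coefficients:", coef)
print("fitted h at x = 0.5:", predict(coef, np.array([0.5]))[0])
```

The two least-squares calls are algebraically the same as the usual closed-form 2SLS expression; solving them with `lstsq` avoids forming and inverting the cross-product matrices explicitly. In practice the sieve dimension J would be chosen with care, which is the issue the data-driven procedures in the abstracts above address.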