# Liangjun Su Publications

## Abstract

This paper considers a linear panel model with interactive fixed effects and unobserved individual and time heterogeneities that are captured by latent group structures and an unknown structural break, respectively. To enhance realism, the model may have different numbers of groups and/or different group memberships before and after the break. Using preliminary nuclear-norm-regularized estimation followed by row- and column-wise linear regressions, we simultaneously estimate the break point, based on the idea of binary segmentation, and the latent group structures, together with the numbers of groups before and after the break, by a sequential-testing K-means algorithm. It is shown that the break point, the numbers of groups and the group memberships can each be estimated correctly with probability approaching one. Asymptotic distributions of the estimators of the slope coefficients are established. Monte Carlo simulations demonstrate excellent finite-sample performance of the proposed estimation algorithm. An empirical application to real house price data across 377 Metropolitan Statistical Areas in the US from 1975 to 2014 suggests the presence both of structural breaks and of changes in group membership.
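
Binary segmentation repeatedly splits a sample at the point that best separates it into two homogeneous pieces. The sketch below shows only the basic single-split step on a univariate series, assuming the break is a shift in mean and the split criterion is the total within-segment sum of squared residuals. The paper applies the idea to preliminary nuclear-norm-regularized estimates and combines it with K-means classification, neither of which is reproduced here; `locate_single_break` and `min_seg` are hypothetical names chosen for illustration.

```python
def locate_single_break(series, min_seg=2):
    """Return the index k that splits `series` into series[:k] and series[k:]
    with the smallest total within-segment sum of squared deviations.

    Illustrative assumption: the break is a shift in the mean of the series.
    """
    def ssr(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    n = len(series)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested minimum segment length")
    best_k, best_cost = None, float("inf")
    # Try every admissible split point and keep the one with the lowest cost.
    for k in range(min_seg, n - min_seg + 1):
        cost = ssr(series[:k]) + ssr(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Hypothetical example: the mean shifts after the 10th observation.
y = [0.0] * 10 + [5.0] * 10
print(locate_single_break(y))  # 10
```

In a full binary segmentation, the same step would be applied recursively to each resulting segment until a stopping rule indicates no further break.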

## Abstract

This paper studies high-dimensional vector autoregressions (VARs) augmented with common factors that allow for strong cross-sectional dependence. Models of this type provide a convenient mechanism for accommodating the interconnectedness and temporal co-variability that are often present in large dimensional systems. We propose an ℓ1-nuclear-norm regularized estimator and derive the non-asymptotic upper bounds for the estimation errors as well as large sample asymptotics for the estimates. A singular value thresholding procedure is used to determine the correct number of factors with probability approaching one. Both the LASSO estimator and the conservative LASSO estimator are employed to improve estimation precision. The conservative LASSO estimates of the non-zero coefficients are shown to be asymptotically equivalent to the oracle least squares estimates. Simulations demonstrate that our estimators perform reasonably well in finite samples given the complex high-dimensional nature of the model. In an empirical illustration we apply the methodology to explore dynamic connectedness in the volatilities of financial asset prices and the transmission of ‘investor fear’. The findings reveal that a large proportion of connectedness is due to the common factors. Conditional on the presence of these common factors, the results still document remarkable connectedness due to the interactions between the individual variables, thereby supporting a common factor augmented VAR specification.
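
A common way to handle ℓ1 and nuclear-norm penalties is through their proximal operators: entrywise soft-thresholding for the ℓ1 norm and singular value thresholding for the nuclear norm. The NumPy sketch below shows both operators plus a simple rule that counts singular values above a threshold. The threshold value and the simulated design are illustrative assumptions, not the tuning or algorithm used in the paper.

```python
import numpy as np


def soft_threshold(x, lam):
    """Entrywise soft-thresholding: the proximal operator of lam * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def singular_value_threshold(m, lam):
    """Shrink every singular value by lam (floored at zero): the proximal
    operator of lam * ||M||_* (nuclear norm)."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Scale the columns of u by the shrunken singular values, then recombine.
    return (u * np.maximum(s - lam, 0.0)) @ vt


def count_factors(m, threshold):
    """Count singular values exceeding a user-chosen (hypothetical) threshold."""
    s = np.linalg.svd(m, compute_uv=False)
    return int(np.sum(s > threshold))


# Hypothetical usage: a rank-2 common component observed with small noise.
rng = np.random.default_rng(0)
common = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 30))
panel = common + 0.1 * rng.standard_normal((200, 30))
print(count_factors(panel, threshold=10.0))
```

Soft-thresholding sets small coefficients exactly to zero, while singular value thresholding sets small singular values to zero and so lowers the rank of the estimated common component.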

## Abstract

This paper studies a linear panel data model with interactive fixed effects wherein regressors, factors and idiosyncratic error terms are all stationary but with potential long memory. The setup involves a new factor model formulation in which weakly dependent regressors, factors and innovations are embedded as a special case. Standard methods based on principal component decomposition and least squares estimation, as in Bai (2009), are found to suffer from bias correction failure because the order of magnitude of the bias is determined in a complex manner by the memory parameters. To cope with this failure and to provide a simple, implementable estimation procedure, frequency domain least squares estimation is proposed. The limit distribution of this frequency domain approach is established, and a hybrid selection method is developed to determine the number of factors. Simulations show that the frequency domain estimator is robust to short memory and outperforms the time domain estimator when long-range dependence is present. An empirical illustration of the approach is provided, examining the long-run relationship between stock returns and realized volatility.
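
The basic idea of frequency domain least squares can be illustrated with a single regressor: transform y and x with the discrete Fourier transform and run least squares on the transformed data over a chosen set of Fourier frequencies. This is a generic sketch, not the paper's estimator, which additionally deals with interactive fixed effects and long-memory components; `frequency_domain_ls` and its `band` parameter are hypothetical names.

```python
import numpy as np


def frequency_domain_ls(y, x, band=None):
    """Least squares slope (no intercept) computed from discrete Fourier transforms.

    `band` is an optional iterable of Fourier frequency indices j (frequency
    2*pi*j/n) to use; by default all n frequencies are used.
    """
    wy = np.fft.fft(y)
    wx = np.fft.fft(x)
    idx = np.arange(len(y)) if band is None else np.asarray(list(band))
    # Cross-periodogram over periodogram of x, summed over the chosen band.
    num = np.sum(wx[idx] * np.conj(wy[idx])).real
    den = np.sum(np.abs(wx[idx]) ** 2)
    return num / den


# Hypothetical usage on simulated data.
rng = np.random.default_rng(1)
x = rng.standard_normal(256)
y = 2.0 * x + rng.standard_normal(256)
full = frequency_domain_ls(y, x)
ols = np.sum(x * y) / np.sum(x * x)
low = frequency_domain_ls(y, x, band=range(1, 33))  # illustrative low-frequency band
print(np.isclose(full, ols))  # True
```

When every Fourier frequency is used, Parseval's identity makes the estimate identical to time-domain least squares through the origin, which serves as a sanity check; restricting `band` is where a frequency domain approach departs from the time domain one.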

## Abstract

This paper studies estimation of a panel data model with latent structures in which individuals can be classified into different groups, with slope parameters that are homogeneous within the same group but heterogeneous across groups. To identify the unknown group structure of vector parameters, we design an algorithm called Panel-CARDS, which is a systematic extension of the CARDS procedure proposed by Ke, Fan, and Wu (2015) in a cross-sectional framework. The extension addresses the problem of comparing vector coefficients in a panel model for homogeneity and introduces a new concept of controlled classification of multidimensional quantities called the segmentation net. We show that the Panel-CARDS method identifies the group structure asymptotically and consistently estimates model parameters at the same time. External information on the minimum number of elements within each group is not required but can be used to improve the accuracy of classification and estimation in finite samples. Simulations evaluate performance and corroborate the asymptotic theory in several practical design settings. Two empirical economic applications are considered: one explores the effect of income on democracy using cross-country data over the period 1961-2000; the other examines the effect of minimum wage legislation on unemployment in the 50 states of the United States over the period 1988-2014. Both applications reveal the presence of latent groupings in these panel data.
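
CARDS-type procedures broadly start from preliminary unit-level estimates, order them, and use that ordering to decide which units plausibly share a coefficient. The toy function below conveys only this ordering-and-segmentation intuition for scalar coefficients, using a hypothetical fixed `gap` rule. Panel-CARDS itself works with vector coefficients, penalized estimation and segmentation nets, none of which are reproduced here.

```python
def segment_sorted_estimates(estimates, gap):
    """Assign group labels by sorting scalar estimates and starting a new
    group wherever consecutive sorted values differ by more than `gap`.

    Hypothetical helper for illustration only.
    """
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    labels = [0] * len(estimates)
    group = 0
    # Walk through neighbouring pairs in sorted order.
    for prev, cur in zip(order, order[1:]):
        if estimates[cur] - estimates[prev] > gap:
            group += 1
        labels[cur] = group
    return labels


# Hypothetical preliminary slope estimates for five units.
print(segment_sorted_estimates([0.10, 1.02, 0.12, 0.95, 0.09], gap=0.5))  # [0, 1, 0, 1, 0]
```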

## Abstract

This paper provides a novel mechanism for identifying and estimating latent group structures in panel data using penalized regression techniques. We focus on linear models where the slope parameters are heterogeneous across groups but homogeneous within a group and the group membership is unknown. Two approaches are considered: penalized least squares (PLS) for models without endogenous regressors, and penalized GMM (PGMM) for models with endogeneity. In both cases we develop a new variant of Lasso called classifier-Lasso (C-Lasso) that serves to shrink individual coefficients to the unknown group-specific coefficients. C-Lasso achieves simultaneous classification and consistent estimation in a single step, and the classification exhibits the desirable property of uniform consistency. For PLS estimation C-Lasso also achieves the oracle property, so that group-specific parameter estimators are asymptotically equivalent to infeasible estimators that use individual group identity information. For PGMM estimation the oracle property of C-Lasso is preserved in some special cases. Simulations demonstrate good finite-sample performance of the approach in both classification and estimation. An empirical application investigating the determinants of cross-country savings rates finds two latent groups among 56 countries, providing empirical confirmation that higher savings rates go hand in hand with higher income growth.
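
To make the shrinkage mechanism concrete, the sketch below evaluates a PLS objective with a product-form penalty, (1/NT) Σ_i Σ_t (y_it − x_it'β_i)² + (λ/N) Σ_i Π_k ||β_i − α_k||. The product form is an assumption of this sketch, stated here rather than taken from the abstract. The penalty for unit i vanishes as soon as β_i coincides with any one of the K group coefficients α_k, which is what pulls individual slopes toward group-specific values. `c_lasso_objective` is a hypothetical name, and the regressors are assumed to be already demeaned.

```python
import numpy as np


def c_lasso_objective(y, x, beta, alpha, lam):
    """Penalized least squares objective with a classifier-Lasso-type penalty.

    y: (N, T) outcomes, x: (N, T, p) regressors (assumed already demeaned),
    beta: (N, p) unit-specific slopes, alpha: (K, p) group-specific slopes.
    """
    n, t = y.shape
    resid = y - np.einsum("itp,ip->it", x, beta)
    fit = np.sum(resid ** 2) / (n * t)
    # distances[i, k] = ||beta_i - alpha_k||
    distances = np.linalg.norm(beta[:, None, :] - alpha[None, :, :], axis=2)
    # Product over groups: zero whenever beta_i equals some alpha_k.
    penalty = lam / n * np.sum(np.prod(distances, axis=1))
    return fit + penalty


# Hypothetical usage: four units, two groups, one regressor.
rng = np.random.default_rng(2)
x = rng.standard_normal((4, 5, 1))
beta = np.array([[1.0], [1.0], [2.0], [2.0]])
alpha = np.array([[1.0], [2.0]])
y = np.einsum("itp,ip->it", x, beta)
print(c_lasso_objective(y, x, beta, alpha, lam=1.0))
```

Minimizing this objective jointly over β and α is the hard part, since the product penalty is nonconvex; the minimization algorithm used in the paper is not reproduced here.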

## Abstract

This paper proposes a nonparametric test for common trends in semiparametric panel data models with fixed effects based on a measure of nonparametric goodness-of-fit (R²). We first estimate the model under the null hypothesis of common trends by the method of profile least squares, and obtain the augmented residuals, which consistently estimate the sum of the fixed effect and the disturbance under the null. Then we run a local linear regression of the augmented residuals on a time trend and calculate the nonparametric R² for each cross-sectional unit. The proposed test statistic is obtained by averaging the nonparametric R² values across all cross-sectional units; it is close to zero under the null and deviates from zero under the alternative. We show that, after appropriate standardization, the test statistic is asymptotically normally distributed under both the null hypothesis and a sequence of Pitman local alternatives. We prove test consistency and propose a bootstrap procedure to obtain p-values. Monte Carlo simulations indicate that the test performs well in finite samples. Empirical applications are conducted exploring the commonality of spatial trends in UK climate change data and of idiosyncratic trends in OECD real GDP growth data. Both applications reveal the fragility of the widely adopted common trends assumption.
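
The unit-level ingredient of the statistic is a local linear regression of each unit's augmented residuals on a rescaled time trend t/T, summarized by a goodness-of-fit measure and then averaged across units. The sketch below assumes a Gaussian kernel, a user-chosen bandwidth `h`, and the definition R² = 1 − RSS/TSS; the paper's exact R² definition may differ, and the profile least squares first step, the standardization and the bootstrap p-values are omitted. All function names are hypothetical.

```python
import math


def local_linear_fit(t, y, h):
    """Local linear regression of y on t with a Gaussian kernel and bandwidth h;
    returns the fitted value at each design point."""
    fitted = []
    for t0 in t:
        w = [math.exp(-0.5 * ((ti - t0) / h) ** 2) for ti in t]
        s0 = sum(w)
        s1 = sum(wi * (ti - t0) for wi, ti in zip(w, t))
        s2 = sum(wi * (ti - t0) ** 2 for wi, ti in zip(w, t))
        r0 = sum(wi * yi for wi, yi in zip(w, y))
        r1 = sum(wi * (ti - t0) * yi for wi, ti, yi in zip(w, t, y))
        # Intercept of the weighted least squares line centred at t0.
        fitted.append((s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2))
    return fitted


def nonparametric_r2(y, h):
    """R² = 1 - RSS/TSS from a local linear fit of y on the trend t/T."""
    n = len(y)
    t = [i / n for i in range(1, n + 1)]
    fitted = local_linear_fit(t, y, h)
    mean = sum(y) / n
    tss = sum((yi - mean) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - rss / tss


def average_r2(residual_panel, h):
    """Average the unit-level R² values across cross-sectional units."""
    return sum(nonparametric_r2(y, h) for y in residual_panel) / len(residual_panel)


# Hypothetical usage: one trending unit and one trendless unit.
panel = [[0.2 * i for i in range(20)], [(-1.0) ** i for i in range(20)]]
print(average_r2(panel, h=0.1))
```

Because a local linear smoother reproduces straight lines exactly, a unit with a purely linear trend gets R² equal to one, whereas residuals with no trend, as under the null, should yield values near zero.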

## Abstract

This paper explores a paradox discovered in recent work by Phillips and Su (2009). That paper gave an example in which nonparametric regression is consistent whereas parametric regression is inconsistent, even when the true regression functional form is known and used in regression. This appears to be a paradox, as knowing the true functional form should not in general be detrimental in regression. In the present case, local regression methods turn out to have a distinct advantage because of endogeneity in the regressor. The paradox arises because additional correct information is not necessarily advantageous when information is incomplete. Here, endogeneity in the regressor introduces bias when the true functional form is known, but interestingly does not do so in local nonparametric regression. We examine this example in detail and propose two new consistent estimators for the parametric regression, which address the endogeneity in the regressor by means of spatial bounding and bias correction using nonparametric estimation. Simulations illustrating the paradox and the new procedures are reported.

## Abstract

Recent work by Wang and Phillips (2009b, c) has shown that ill-posed inverse problems do not arise in nonstationary nonparametric regression and that there is no need for nonparametric instrumental variable estimation. Instead, simple Nadaraya–Watson nonparametric estimation of a (possibly nonlinear) cointegrating regression equation is consistent, with a limiting (mixed) normal distribution, irrespective of endogeneity in the regressor, near integration as well as integration in the regressor, and serial dependence in the regression equation. The present paper shows that some closely related results apply in the case of structural nonparametric regression with independent data when there are continuous location shifts in the regressor. In such cases, location shifts serve as an instrumental variable in tracing out the regression line, similar to the random wandering nature of the regressor in a cointegrating regression. Asymptotic theory is given for local level and local linear nonparametric estimators, links with nonstationary cointegrating regression theory and nonparametric IV regression are explored, and extensions to the stationary strong mixing case are given. In contrast to standard nonparametric limit theory, local level and local linear estimators have identical limit distributions, so the local linear approach has no apparent advantage in the present context. Some interesting cases, which appear to be new in the literature, are discovered in which nonparametric estimation is consistent whereas parametric regression is inconsistent even when the true (parametric) regression function is known. The methods are further applied to establish a limit theory for nonparametric estimation of structural panel data models with endogenous regressors and individual effects. Some simulation evidence is reported.
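
For reference, the local level (Nadaraya–Watson) estimator is a kernel-weighted average of the responses near the evaluation point. The minimal sketch below uses a Gaussian kernel and a user-supplied bandwidth `h`, both of which are illustrative choices; it does not model the location shifts, endogeneity or panel structure studied in the paper, and `nadaraya_watson` is a hypothetical name.

```python
import math


def nadaraya_watson(x, y, x0, h):
    """Local level (Nadaraya-Watson) estimate of E[y | x = x0] using a
    Gaussian kernel with bandwidth h."""
    weights = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    # Kernel-weighted average of the responses.
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical usage: estimate a regression function at x0 = 0.5.
xs = [i / 10 for i in range(11)]
ys = [xi ** 2 for xi in xs]
print(nadaraya_watson(xs, ys, x0=0.5, h=0.1))
```

A local linear estimator would instead fit a weighted line around x0 and report its intercept; the abstract's point is that in this setting both estimators share the same limit distribution.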