Analysis of Panel and Clustered Data

Paper Session

Saturday, Jan. 4, 2020 8:00 AM - 10:00 AM (PDT)

Marriott Marquis, Mission Hills
Hosted By: Econometric Society
  • Chair: Douglas Steigerwald, University of California-Santa Barbara

Detection of Units with Pervasive Effects in Large Panel Data Models

George Kapetanios, King's College London
M. Hashem Pesaran, University of Southern California and Trinity College Cambridge
Simon Reese, University of Southern California

Abstract

The importance of units with pervasive impacts on a large number of other units in a network has become increasingly recognized in the literature. In this paper we propose a new method to detect such pervasive units by basing our analysis on unit-specific residual error variances in the context of a standard factor model, subject to suitable adjustments for multiple testing. Our proposed method allows us to estimate and identify pervasive units without a priori knowledge of the interconnections amongst cross-section units and without a short list of candidate units. It is applicable even if the cross-section dimension exceeds the time dimension and, most importantly, it can select none of the units as pervasive when this is in fact the case. The proposed sequential multiple testing procedure exhibits satisfactory small-sample performance in Monte Carlo simulations and compares well with existing approaches. We apply the proposed detection method to sectoral indices of US industrial production, US house price changes by state, and the rates of change of real GDP and real equity prices across the world's largest economies.

Shift-Share Designs: Theory and Inference

Rodrigo Adao, University of Chicago
Michal Kolesar, Princeton University
Eduardo Morales, Princeton University

Abstract

We study inference in shift-share regression designs, such as when a regional outcome is regressed on a weighted average of observed sectoral shocks, using regional sector shares as weights. We conduct a placebo exercise in which we estimate the effect of a shift-share regressor constructed with randomly generated sectoral shocks on actual labor market outcomes across U.S. Commuting Zones. Tests based on commonly used standard errors with 5% nominal significance level reject the null of no effect in up to 55% of the placebo samples. We use a stylized economic model to show that this overrejection problem arises because regression residuals are correlated across regions with similar sectoral shares, independently of their geographic location. We derive novel inference methods that are valid under arbitrary cross-regional correlation in the regression residuals. We show that our methods yield substantially wider confidence intervals in popular applications of shift-share regression designs.
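The mechanics of the shift-share regressor and of a placebo exercise can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' code or data: the function names `shift_share_regressor` and `placebo_rejection_rate` are hypothetical, the shares and outcomes are simulated rather than taken from U.S. Commuting Zones, and the standard error is a plain heteroskedasticity-robust (HC0) one standing in for "commonly used standard errors".

```python
import numpy as np

def shift_share_regressor(shares, shocks):
    """Weighted average of sectoral shocks: x_i = sum_s shares[i, s] * shocks[s]."""
    return np.asarray(shares, float) @ np.asarray(shocks, float)

def placebo_rejection_rate(y, shares, n_draws=500, crit=1.96, seed=0):
    """Share of placebo samples in which a robust t-test rejects a zero slope.

    Each draw builds the regressor from randomly generated sectoral shocks,
    regresses y on it (with an intercept), and applies an HC0 standard error.
    """
    rng = np.random.default_rng(seed)
    shares = np.asarray(shares, float)
    y = np.asarray(y, float)
    y_c = y - y.mean()                      # demeaning absorbs the intercept
    rejections = 0
    for _ in range(n_draws):
        shocks = rng.standard_normal(shares.shape[1])
        x = shift_share_regressor(shares, shocks)
        x_c = x - x.mean()
        beta = (x_c @ y_c) / (x_c @ x_c)    # OLS slope
        resid = y_c - beta * x_c
        se = np.sqrt(np.sum((x_c * resid) ** 2)) / (x_c @ x_c)  # HC0
        rejections += abs(beta / se) > crit
    return rejections / n_draws

# Simulated illustration: 200 regions, 10 sectors, outcome unrelated to shares.
rng = np.random.default_rng(1)
shares = rng.dirichlet(np.ones(10), size=200)
y = rng.standard_normal(200)
rate = placebo_rejection_rate(y, shares)
```

In this simulated setup the outcome is independent noise, so the sketch only shows the procedure. The overrejection described in the abstract arises when residuals are correlated across regions with similar sectoral shares.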

Robust Semiparametric Estimation in Panel Multinomial Choice Models

Wayne Yuan Gao, Yale University
Ming Li, Yale University

Abstract

This paper proposes a simple yet robust method for semiparametric identification and estimation in panel multinomial choice models, where we allow for infinite-dimensional fixed effects in the presence of additive nonseparability, thus incorporating rich forms of unobserved heterogeneity. Our identification strategy exploits the standard notion of multivariate monotonicity in its contrapositive form, which provides powerful leverage for converting observable events into identifying restrictions on unknown parameters. Specifically, we show how certain configurations of conditional choice probabilities preserve weak monotonicity in an index vector, despite the presence of infinite-dimensional nuisance parameters. Then, by taking the logical contraposition of an intertemporal inequality on conditional choice probabilities from two time periods, we obtain an identifying restriction on the index values. Based on our identification result, we construct consistent set (or point) estimators, together with a computational algorithm adapted to the challenges of this framework. The first stage of our two-stage procedure nonparametrically estimates a collection of inequalities concerning intertemporal differences in conditional choice probabilities, where we adopt a machine learning algorithm using artificial neural networks. In the second stage, we compute the final estimator as the set of minimizers of our sample criterion function. Here, we adopt a spherical-coordinate reparameterization to exploit a combination of topological, geometric and computational advantages. We then show that the estimated model can be used for counterfactual analysis, such as predicting the effect of a promotional campaign on product sales. We conduct a simulation study to analyze the finite-sample performance of our method and the adequacy of our computational procedure for practical implementation.
We then apply our procedure to the Nielsen data on popcorn sales to explore the effects of marketing promotions. In our model, we permit rich unobserved heterogeneity in factors such as brand loyalty or responsiveness to subtle flavor and packaging designs, which may affect choices in complex ways. The results show that our procedure produces estimates that conform well with economic intuition. For example, we find that special in-store displays boost sales not only through a direct promotion effect but also through the attenuation of consumers’ price sensitivity.

The Wild Bootstrap with a “Small” Number of “Large” Clusters

Andres Santos, University of California-Los Angeles
Ivan Canay, Northwestern University
Azeem Shaikh, University of Chicago

Abstract

This paper studies the properties of the wild bootstrap-based test proposed in Cameron et al. (2008) in settings with clustered data. Cameron et al. (2008) provide simulations that suggest this test works well even in settings with as few as five clusters, but existing theoretical analyses of its properties all rely on an asymptotic framework in which the number of clusters is “large.” In contrast to these analyses, we employ an asymptotic framework in which the number of clusters is “small,” but the number of observations per cluster is “large.” In this framework, we provide conditions under which the limiting rejection probability of an un-Studentized version of the test does not exceed the nominal level. Importantly, these conditions require, among other things, certain homogeneity restrictions on the distribution of covariates. We further establish that the limiting rejection probability of a Studentized version of the test does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We study the relevance of our theoretical results for finite samples via a simulation study.
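An un-Studentized wild cluster bootstrap test of a regression slope can be sketched as below. The sketch makes several assumptions that the abstract does not state: it uses Rademacher (plus or minus one) weights drawn at the cluster level, imposes the null hypothesis when forming residuals, and enumerates all sign vectors, which is only feasible because the number of clusters is small. The function name `wild_cluster_bootstrap_pvalue` and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def wild_cluster_bootstrap_pvalue(y, x, cluster, beta0=0.0):
    """P-value for H0: slope = beta0 in y = a + b*x + e with clustered errors."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    X = np.column_stack([np.ones_like(x), x])

    def slope(v):
        # OLS slope from regressing v on a constant and x
        return np.linalg.lstsq(X, v, rcond=None)[0][1]

    # Restricted fit: impose slope = beta0, estimate only the intercept.
    alpha_r = np.mean(y - beta0 * x)
    resid_r = y - alpha_r - beta0 * x

    stat = abs(slope(y) - beta0)            # un-Studentized test statistic
    boot = []
    for signs in itertools.product([-1.0, 1.0], repeat=len(labels)):
        w = np.asarray(signs)[idx]          # one sign per cluster
        y_star = alpha_r + beta0 * x + w * resid_r
        boot.append(abs(slope(y_star) - beta0))
    # Small tolerance so that exact ties count as "at least as large".
    return float(np.mean(np.asarray(boot) >= stat - 1e-12))

# Simulated illustration: 5 clusters of 30 observations, true slope zero.
rng = np.random.default_rng(0)
n_clusters, n_per = 5, 30
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.standard_normal(n_clusters * n_per)
shock = rng.standard_normal(n_clusters)[cluster]    # common within-cluster shock
y = 1.0 + shock + rng.standard_normal(n_clusters * n_per)
p_value = wild_cluster_bootstrap_pvalue(y, x, cluster, beta0=0.0)
```

With q clusters there are only 2^q distinct sign vectors, and the all-plus and all-minus vectors both reproduce the observed statistic, so the p-value in this sketch can never fall below 2/2^q. That granularity is one reason the small-cluster case needs its own analysis.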

Inference for Dependent Data with Cluster Learning

Jianfei Cao, University of Chicago
Christian Hansen, University of Chicago
Damian Kozbur, University of Zurich
Lucciano Villacorta, Central Bank of Chile

Abstract

This paper proposes a cluster-based inferential procedure. Observations are grouped into clusters that are learned using an unsupervised learning algorithm given a dissimilarity measure. We consider a set of cluster-based inference procedures applied to the learned clusters. We give conditions under which our procedure asymptotically attains correct size. We illustrate the finite-sample validity of the procedure and apply it to an empirical example.
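The two steps, learning clusters from a dissimilarity measure and then running a cluster-based test, can be sketched as follows. The abstract does not name a specific learning algorithm or inference procedure, so both choices here are assumptions made for illustration: a simple k-medoids routine for the clustering step and a t-statistic built from cluster-level means for the inference step. The names `k_medoids` and `cluster_mean_t_stat` and the simulated data are hypothetical.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Group observations into k clusters given a dissimilarity matrix D."""
    D = np.asarray(D, float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with the smallest total within-cluster dissimilarity.
                within = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def cluster_mean_t_stat(values, labels, mu0=0.0):
    """t-statistic for H0: mean = mu0, treating each cluster mean as one observation."""
    values = np.asarray(values, float)
    means = np.array([values[labels == g].mean() for g in np.unique(labels)])
    return (means.mean() - mu0) / (means.std(ddof=1) / np.sqrt(means.size))

# Simulated illustration: 60 observations located around three separate points.
rng = np.random.default_rng(2)
location = np.concatenate([rng.normal(c, 0.1, 20) for c in (0.0, 5.0, 10.0)])
D = np.abs(location[:, None] - location[None, :])   # dissimilarity = distance
labels = k_medoids(D, k=3)
values = 1.0 + rng.standard_normal(60)
t_stat = cluster_mean_t_stat(values, labels, mu0=1.0)
```

In this sketch the statistic would be compared with a t distribution with k - 1 degrees of freedom, which is one common choice for inference based on a small number of cluster-level estimates.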

Testing for Treatment Effects in Randomized Control Trials: The Effect of Differing Cluster Sizes

Douglas Steigerwald, University of California-Santa Barbara
Andrew Carter, University of California-Santa Barbara

Abstract

We consider a common situation where we have binary responses to compare between two samples, but the observations are not independent. The dependence structure is such that there are a number of independent clusters, but the observations within each cluster are dependent. Sandwich-type variance estimators that are based entirely on the variability between the statistics for each cluster are robust to this kind of dependence. We establish a set of sufficient conditions under which the variance estimators are consistent, so that the resulting test statistic has a standard normal distribution in the limit. The conditions require that the number of clusters goes to infinity and that there is enough homogeneity that no one cluster dominates the calculation.
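A test statistic of this kind can be sketched for the difference in two proportions. The sketch assumes that each cluster belongs entirely to one of the two samples and uses a textbook between-cluster variance estimator for a pooled proportion. It is not necessarily the exact estimator studied in the paper, and the function names and the small data set are hypothetical.

```python
import numpy as np

def arm_proportion_and_variance(successes, sizes):
    """Pooled proportion in one sample and its between-cluster variance estimate.

    successes[g] is the number of positive responses in cluster g,
    sizes[g] is the number of observations in cluster g.
    """
    s = np.asarray(successes, float)
    n = np.asarray(sizes, float)
    p = s.sum() / n.sum()
    # Only the cluster totals enter, so within-cluster dependence is left unrestricted.
    var = np.sum((s - n * p) ** 2) / n.sum() ** 2
    return p, var

def cluster_z_statistic(s_treat, n_treat, s_ctrl, n_ctrl):
    """z-statistic for equal proportions across two independent sets of clusters."""
    p1, v1 = arm_proportion_and_variance(s_treat, n_treat)
    p0, v0 = arm_proportion_and_variance(s_ctrl, n_ctrl)
    return (p1 - p0) / np.sqrt(v1 + v0)

# Hypothetical data: four equal-sized clusters per sample.
p1, v1 = arm_proportion_and_variance([8, 6, 9, 7], [10, 10, 10, 10])
p0, v0 = arm_proportion_and_variance([4, 5, 6, 5], [10, 10, 10, 10])
z = cluster_z_statistic([8, 6, 9, 7], [10, 10, 10, 10],
                        [4, 5, 6, 5], [10, 10, 10, 10])
```

With very unequal cluster sizes a single large cluster can dominate the sum in `var`, which is the situation the homogeneity condition in the abstract is meant to rule out.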
JEL Classifications
  • C2 - Single Equation Models; Single Variables
  • C3 - Multiple or Simultaneous Equation Models; Multiple Variables