Apr 3 -- The U.S. Census Bureau (Census Bureau) has been working to implement modernized methods to continue to ensure the privacy protections of its information products and seeks public engagement and comment on these efforts. The Census Bureau is targeting the release of the 2022 County Business Patterns (CBP) data using differential privacy methodology for disclosure avoidance. The Census Bureau has created demonstration tables and invites the public to participate in a live question-and-answer webinar on April 20, 2023, to learn more about how the differential privacy methodology is being applied to the CBP data. This Notice requests written comments on the demonstration tables and other issues related to this topic.
A live question-and-answer webinar will be held on Thursday, April 20, 2023, at 3 p.m. Eastern Daylight Time, for discussion of how the differential privacy methodology is applied to the CBP data. The webinar will be recorded. Written comments must be submitted on or before June 2, 2023.
The CBP is an annual series that provides subnational economic data by industry. This series includes estimates of the number of establishments, employment during the week of March 12, first quarter payroll, and annual payroll for subnational geographic areas. These data are useful for studying the economic activity of small areas, analyzing economic changes over time, and serving as a benchmark for other statistical series, surveys, and databases between economic censuses. Businesses use the data for analyzing market potential, measuring the effectiveness of sales and advertising programs, setting sales quotas, and developing budgets. Government agencies use the data for administration and planning.
A noise infusion technique referred to as multiplicative noise has been the Census Bureau's disclosure avoidance methodology for CBP data since reference year 2007. This method of disclosure avoidance perturbs each establishment's data prior to table creation by applying a random noise multiplier to the magnitude data (i.e., characteristics such as first-quarter payroll, annual payroll, and number of employees) for each establishment. Each published table's cell value has an associated noise flag indicating the relative amount of distortion in the cell value resulting from the perturbation of the data contributing to the cell. The flag for “low noise” (G) indicates the cell value was changed by less than 2 percent with the application of noise, the flag for “moderate noise” (H) indicates the value was changed by at least 2 percent but less than 5 percent, and the flag for “high noise” (J) indicates the value was changed by 5 percent or more. Values for some cells in the table may be suppressed (denoted with an S) because of concerns about the quality of the data. Also, beginning with reference year 2017, a cell is published only if it is based on data from three or more establishments. In all other cases, the cell is not included in the release (i.e., the corresponding table row is dropped from publication).
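As an illustration of the noise flag thresholds and the three-establishment publication rule described above, the following minimal Python sketch classifies a cell by its relative change. The function names (noise_flag, is_publishable), the constant name, and the handling of a zero original value are assumptions made for this example; they are not taken from Census Bureau software.

```python
def noise_flag(original: float, noisy: float) -> str:
    """Return the noise flag (G, H, or J) for a cell, using the thresholds in the text."""
    if original == 0:
        # Assumption: the text does not say how a zero base value is treated,
        # so this sketch simply refuses to compute a relative change.
        raise ValueError("relative change is undefined for a zero original value")
    pct_change = abs(noisy - original) * 100 / abs(original)
    if pct_change < 2:
        return "G"  # low noise: changed by less than 2 percent
    if pct_change < 5:
        return "H"  # moderate noise: at least 2 but less than 5 percent
    return "J"      # high noise: 5 percent or more


# Publication rule stated for reference year 2017 onward.
MIN_ESTABLISHMENTS = 3


def is_publishable(establishment_count: int) -> bool:
    """A cell is published only if three or more establishments contribute to it."""
    return establishment_count >= MIN_ESTABLISHMENTS


if __name__ == "__main__":
    # Hypothetical cell values, for illustration only.
    print(noise_flag(1000, 1010), noise_flag(1000, 1030), noise_flag(1000, 1080))
    print(is_publishable(2), is_publishable(3))
```

The sketch covers only the G, H, and J flags; quality-based suppression (S) depends on criteria the text does not specify and is left out.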
The proposed statistical disclosure limitation approach makes use of controlled, randomized noise added to published statistics to limit the extent to which public data users can make inferences about establishments in the internal, private CBP database. The approach includes two components: (1) Per-Record Differential Privacy, which gives a formal, mathematically provable privacy guarantee against exact inferences about establishments in the private database; and (2) non-differentially private, second-stage noise. Second-stage noise does not confer a formal privacy guarantee, but it ensures that large establishments present in published CBP statistics have a level of relative protection that increases as the number of establishments contributing to a published statistic decreases.
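To give a rough sense of what adding controlled, randomized noise to a published statistic can look like, the sketch below applies the textbook Laplace mechanism to a single count. This is a generic differential privacy technique substituted for illustration; it is not the Per-Record Differential Privacy mechanism or the second-stage noise proposed for CBP, whose details are not given here. The function names, the epsilon value, and the sensitivity of 1 are all assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    Assumption: one record changes the count by at most `sensitivity`.
    Smaller epsilon means more noise and stronger protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2023)  # fixed seed so the illustration is repeatable
    # Hypothetical establishment count and privacy parameter.
    print(noisy_count(true_count=100, epsilon=0.5, rng=rng))
```

In this generic mechanism the noise scale is the same for every count, whereas the text describes protection for large establishments that grows as fewer establishments contribute to a statistic; that behavior is not modeled here.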
The Census Bureau has created demonstration tables to illustrate how the new differential privacy methodology for disclosure avoidance can be applied to produce CBP estimates and will discuss this application during the April 20th webinar. The tables show estimates of the number of establishments, number of employees, first-quarter payroll, and annual payroll across geographic, industry, legal form of organization, and employment size levels. The input data for the demonstration tables are a set of synthetic microdata created solely from previously published CBP results. This approach ensures that existing disclosure avoidance safeguards are not compromised by the publication of the demonstration tables. The demonstration tables also include summary statistics of the uncertainty introduced by the new differential privacy methodology and comparison with the uncertainty introduced by the current disclosure avoidance methodology. We invite comments on these demonstration tables, including use cases (examples of how CBP data are used) and whether the new methodology affects these use cases (including whether the amount of noise shown in the demonstration tables would prevent or change any analyses for those use cases).
CBP Demonstration Tables for New Differential Privacy Methodology for Disclosure Avoidance:
https://www.census.gov/topics/business-economy/disclosure/data/tables/cbp-privacy-demonstration-tables.html
Press release:
https://www.census.gov/newsroom/press-releases/2023/proposed-disclosure-avoidance-measures-for-cbp.html
Federal Register notice (FRN):
https://www.federalregister.gov/d/2023-06774