What is the Census Bureau's DSEP?
The U.S. Census Bureau has had a longstanding requirement to ensure that the data from individuals and individual households remains confidential. For the 2020 census, it plans to use a new approach for doing so: “differential privacy.”
Check out NCSL’s letters to the U.S. House, Senate and the Census Bureau and the Bureau's response to NCSL's letter. On behalf of the states, NCSL has expressed concerns regarding the delays in releasing the census data to the states and the bureau’s use of differential privacy and its possible impact on the accuracy of census data.
The U.S. Census Bureau is required to do an “actual Enumeration” of all the people living in the U.S. every 10 years (U.S. Constitution, Article 1, Section 2). The bureau also is required to keep personally identifiable information confidential for 72 years (92 Stat. 915; Public Law 95-416). Title 13, U.S. Code, Section 9, provides the mandate for the bureau to not “use the information furnished under the provisions of this title for any purpose other than the statistical purposes for which it is supplied; or make any publication whereby the data furnished by any particular establishment or individual under this title can be identified; or permit anyone other than the sworn officers and employees of the Department or bureau or agency thereof to examine the individual reports” (13 U.S.C. § 9 (2007)).
The dual requirement for an accurate count and the protection of respondents and their data creates a natural tension: The more accurate (and therefore usable) the reported data is, the easier it may be to identify individual responses. And yet, as the raw data is altered before being reported (to protect confidentiality), the less usable the publicly released data is.
The bureau has provided a history of how it has handled this dual requirement in “Disclosure Avoidance Techniques Used for the 1960 Through 2010 Census.” The bureau has also created an infographic with this information, “A History of Privacy Protections.”
Due to Privacy Concerns, Reported Data Has Always Been Different from Raw Data
Since 2000, the bureau has used “data swapping” between census blocks as its main disclosure avoidance technique. (The census block is the smallest unit of geography maintained by the bureau.)
As a hypothetical example, consider a census block with just 20 people in it, including one Filipino American. Without any disclosure avoidance effort, it might be possible to figure out the identity of that individual. With data swapping, the Filipino American’s data might be swapped with that of an Anglo American from a nearby census block—a census block where other Filipino Americans reside. The details for the person would be aggregated with others, and therefore not identifiable, and yet the total population in both census blocks would remain accurate.
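The hypothetical above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block labels, the record layout and the `swap_characteristics` helper are invented for the example, and the sketch swaps a single characteristic between two records rather than reproducing the bureau's actual swapping procedure.

```python
from collections import Counter

# Hypothetical records as (block_id, race) pairs. Not real census data.
block_a = [("A", "Anglo American")] * 19 + [("A", "Filipino American")]
block_b = [("B", "Filipino American")] * 6 + [("B", "Anglo American")] * 14


def swap_characteristics(src, i, dst, j):
    """Swap the race field of src[i] and dst[j]; each record stays in its block."""
    src, dst = list(src), list(dst)
    (src_block, src_race), (dst_block, dst_race) = src[i], dst[j]
    src[i] = (src_block, dst_race)
    dst[j] = (dst_block, src_race)
    return src, dst


# Swap the lone Filipino American record in block A (index 19)
# with an Anglo American record in block B (index 6).
swapped_a, swapped_b = swap_characteristics(block_a, 19, block_b, 6)

race_counts_a = Counter(race for _, race in swapped_a)
race_counts_b = Counter(race for _, race in swapped_b)

print(len(swapped_a), len(swapped_b))  # block totals are unchanged
print(race_counts_a)
print(race_counts_b)
```

After the swap, both blocks still report 20 people, but block A no longer shows a single, identifiable Filipino American resident; that record is now counted alongside the others in block B.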
Big Data Creates the Need for Greater Privacy Measures
Since the release of the 2010 census, bureau staff have realized that data analysts could take the many data products the bureau produces and cross-reference them with each other or with outside data sources to the point that individual privacy, or confidentiality, could be compromised. (This is possible now, as opposed to earlier decades, because of greater computing power and the growth of other databases, such as those used by commercial data vendors.)
There is no evidence that confidentiality has been compromised so far, but that doesn’t change the theoretical possibility that it could happen.
Because of that possibility, in the 2010s the bureau reviewed disclosure avoidance methods that could replace the current data swapping method. Differential privacy has been selected, and is described by the bureau at this webpage, which includes links to many presentations and papers on how differential privacy works.
Although the decision to move to differential privacy was made in 2018, the parameters that guide this new disclosure avoidance method were set in June 2021. The Data Stewardship and Executive Policy Committee (DSEP) announced that it had selected the settings and parameters for the Disclosure Avoidance System (DAS) for the 2020 Census redistricting data (PL-94-171). The approved DAS production settings reflect a total privacy-loss budget for the redistricting data product (represented by “ε,” the Greek letter “epsilon”) of ε=19.61, which includes ε=17.14 for the persons file and ε=2.47 for the housing unit data. (A privacy-loss budget sets the balance between data accuracy and privacy loss.) For more information, here is the bureau’s press release and newsletter.
(Note that for decades the bureau has not reported raw data; it has used imputation to assign people and characteristics when the enumeration process was not able to obtain this information, and data swapping has been used for two decades as the bureau's method for disclosure avoidance. And the census has always had undercounts and overcounts in different areas and for different populations.)
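The figures quoted above are consistent with the two component budgets simply adding up to the total. The short check below assumes that additive relationship; the variable names are illustrative and do not come from the bureau's software.

```python
from decimal import Decimal

# Values as quoted for the 2020 redistricting data product.
persons_epsilon = Decimal("17.14")   # persons file
housing_epsilon = Decimal("2.47")    # housing unit data

# Assumption: the component budgets add up to the total budget.
total_epsilon = persons_epsilon + housing_epsilon

print(total_epsilon)  # 19.61
```

`Decimal` is used instead of floating point so the sum is exact to the two decimal places reported.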
With differential privacy, the bureau has stated that the total population in each state will be “as enumerated,” but that all other levels of geography—including congressional districts down to townships and census blocks—could have some variance from the raw data. This is referred to by the Census Bureau as “injecting noise” into the data. The bureau has indicated that no “noise” will be injected into the state total population, but it is likely that noise will be injected for every other level of geography.
Final decisions about the mathematical model used for differential privacy, and therefore the impact on reported data, have yet to be made. On one extreme, to have zero risk of privacy disclosure, all totals reported would have to have some “noise” injected (or some variation from the actual count). On the other extreme, if there were no noise injected, the risk of privacy disclosure would be great. These two variables—risk of disclosure and accuracy—can be measured against each other and, in fact, create a trade-off. The bureau refers to this as a “privacy loss budget.”
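The trade-off can be illustrated with the textbook Laplace mechanism, in which a count is reported with random noise whose typical size is 1/ε. This is a minimal sketch, not the bureau's Disclosure Avoidance System: the function names, the block count of 20 and the ε values tried here are all assumptions chosen for illustration.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Report a count with Laplace noise; a count changes by at most 1
    when one person is added or removed, so the noise scale is 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=10_000, seed=0):
    """Average distance between the reported and the true count."""
    rng = random.Random(seed)
    return statistics.fmean(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )


# Smaller epsilon: more privacy, less accuracy. Larger epsilon: the reverse.
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(mean_abs_error(20, epsilon), 2))
```

In this sketch the average error shrinks roughly in proportion to 1/ε, which is the sense in which a larger privacy-loss budget buys accuracy at the cost of privacy. A count held “invariant,” such as the state total population, would skip the noise step entirely.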
The bureau’s proposal at the time of the creation of the 2010 Demonstration Data Products indicated that three data points will be kept “invariant,” or, in other words, won't be altered with differential privacy: total state population, as mentioned above; the total number of housing units in each census block; and the number and type of group quarters units in each census block. In 2010 and previous decades, all these were kept “invariant” along with most data at the census block level, with the exception of race. All other data, including total population numbers for lower geographic units and demographic characteristics, will vary to some extent this decade.
Differential privacy will mean that, except at the state level, population and voting age population will not be reported as enumerated. And race and ethnicity data are likely to be farther from the “as enumerated” data than in past decades, when data swapping was used to protect small populations. (In 2010, at the block level, total population, total housing units, occupancy status, group quarters count and group quarters type were all held invariant.) This may raise issues for racial bloc voting analyses.
While differential privacy is intended to protect confidentiality for respondents, it has implications for smaller subpopulations. For instance, the National Congress of American Indians notes, “The implementation of differential privacy could introduce substantial amounts of noise into statistics for small populations living in remote areas, potentially diminishing the quality of statistics about tribal nations.”
Because of usability concerns, the bureau in October 2019 released 2010 Demonstration Data Products, which provide 2010 raw data treated with the new differential privacy method. Thus, data treated with the differential privacy method of disclosure avoidance can be compared with the 2010 released data (which had been treated with data swapping, the 2010 disclosure avoidance method). The Census Bureau released additional demonstration data in May 2020, September 2020, and November 2020. The Disclosure Avoidance System team plans to release the next Privacy Protected Microdata Files (PPMF) and Detailed Summary Metrics no later than April 30, 2021.
One question for redistricters is whether the reported data from the 2010 Demonstration Data Products is so different from the 2010 data reported by the Census Bureau that it impacts redistricting. This data is available for anyone to use, and the bureau welcomes feedback on this and other questions.
From analyses done by the bureau in conjunction with the National Academy of Sciences Committee on National Statistics, and by outside data users, a few issues have surfaced. The Census Bureau is aware of these issues and is working to address them.
Pursuant to the Freedom of Information Act, I hereby request the following records:
1) Copies of all Data Stewardship Executive Policy Committee (DSEP) agendas and meeting minutes since May 1, 2020.
2) Copies of all Privacy Policy and Research Committee (PPRC) agendas and meeting minutes since May 1, 2020.
I ask that all fees be waived as I am a working journalist and intend to use the requested records to publish articles in the public interest. In the event that you choose to impose fees, I request a detailed breakdown of the costs, including the hourly wages of any personnel assigned to process the request and estimates of how many hours the request will take them to process.
Should you choose to deny access to or redact records responsive to this request, I ask that you provide a detailed explanation of your reasoning--as required by law--citing the specific statutes and case law on which each decision is based.
Thank you very much for your help, and if you have any questions regarding my request please do not hesitate to contact me for clarification.