The US Census Bureau’s County Business Patterns (CBPs) datasets record US employment by industry, county and year. They represent the most detailed view of US employment available, and have been used to answer a variety of questions in different fields of economics. Unfortunately, they have two limitations: employment for many county-industry-year cells is missing due to the Census Bureau’s efforts to preserve confidentiality; and industry and county classification codes change over time.
This project has four goals. First, to provide as many years of the raw CBP data in electronic form as possible. Second, to develop, implement and refine an algorithm for inferring missing county-industry-year employment. Third, to develop concordances for changes in industry and county classification over the life of the CBP. Fourth, to use the refined data to understand the “domestic offshoring” of US manufacturing employment from the North and East in the 1960s and 1970s to the South and West in the 1980s and 1990s.
Working toward these goals will help prepare students for research using other “big data” in either academic or technology-sector settings.
Requisite Skills and Qualifications:
We need help hunting down either electronic or hard-copy versions of the earliest CBPs from 1946-1962, and, in the latter case, converting them into electronic form. We need help refining and speeding up our pilot Python code for imputing missing values in the post-1974 data, and perhaps porting it to C++. We need help refining our county and industry concordances. Finally, we will need help analyzing the data to understand the movement of US manufacturing employment in the second half of the 20th century. The ideal candidates would have an interest and proficiency in one or more of these tasks.
We welcome applications from computer science, statistics and economics majors, as skills from all three fields are useful for different aspects of the project.