We are creating a new data set of employer-sponsored retirement plans by codifying the information contained in the auditors' reports that all US firms with more than 100 employees are obliged to submit. Many features of this important benefit (e.g. match rates, non-elective contributions, vesting schedules and loan availability) appear only in narrative descriptions, buried among a great deal of other text. Our long-term goal is to match 20 years of firm-level plan data to employer-level data to study a number of questions regarding how employees make their savings decisions and how firms structure their retirement plans.
We would like an RA to develop a code base (in Python) that finds and extracts from these documents the pages containing the information we seek.
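To give a sense of the kind of work involved, here is a minimal sketch of one possible first step: flagging pages by keyword counts. It assumes the page text has already been extracted (e.g. via OCR) into a list of strings. The keyword list, the function names and the threshold are all hypothetical placeholders, not part of an existing code base.

```python
import re

# Hypothetical keyword list; the real terms would be chosen with the research team.
KEYWORDS = ["matching contribution", "non-elective", "vesting", "participant loan"]


def score_page(text, keywords=KEYWORDS):
    """Count case-insensitive keyword occurrences in one page of text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(k), lowered)) for k in keywords)


def relevant_pages(pages, min_score=2):
    """Return 1-based page numbers whose keyword score is at least min_score."""
    return [
        number
        for number, text in enumerate(pages, start=1)
        if score_page(text) >= min_score
    ]


# Toy example with made-up page text standing in for OCR output.
pages = [
    "Independent Auditor's Report. We have audited the financial statements.",
    "The Company provides a matching contribution of 50%. "
    "Vesting occurs over five years. Participant loans are permitted.",
    "Statement of net assets available for benefits.",
]

flagged = relevant_pages(pages)
```

In this toy input only the second page mentions the plan features, so it is the only one flagged. A real pipeline would likely need to handle OCR noise and varied wording, which simple keyword matching does not address.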
Requisite Skills and Qualifications:
Python skills are essential. Experience with OCR is desirable.