Topological Data Analysis (TDA) of firms’ patent portfolios and financial performances

Closed to further applications
Faculty Member: 
This project is eligible for remote work.

Proposal Description:

The rate and direction of innovation has been one of the central topics in economics for 60 years, but the “direction” part of the question has rarely been studied despite its potential importance. This project seeks to describe and characterize the dynamic evolution of firms’ patent portfolios and financial performance by adapting a new tool from computational topology: the Mapper algorithm, which translates complex data into an estimate of the Reeb graph. Mapper is a frontier method in applied mathematics and the analysis of high-dimensional data.

Requisite Skills and Qualifications:

The required skills are (1) proficiency in handling Excel spreadsheets, (2) passion and patience for handling real-world data [even a commercial-grade financial database like COMPUSTAT is pretty crazy and disorganized, and it’s our job to make it useful for rigorous academic research], and (3) willingness to communicate and cooperate with me and my collaborator (an economics Ph.D. student at Yale). Advanced economics, econometrics, and/or data-analysis skills are a plus but not required.

Note to students: 

Don’t worry: you don’t have to be a mathematician or topologist to assist with this project. The first phase of this project (as in any major empirical analysis) is data collection and cleaning. Specifically, I would ask you to obtain access to the COMPUSTAT database (via Yale Library), one of the most prominent data sets on publicly listed firms’ financial performance. You are expected to identify, collect/download, and clean/streamline relevant information on the 1,000 major firms of the world for which I have already gathered some patent statistics.

Award: 
  • Clara Penteado
  • Alexis Teh
  • Janie Wu