Using LLMs: Text Data and Children's Aspirations
I am working with a unique dataset of 10,000 essays written by children in 1969 describing how they imagined their future. These essays are linked to more than 50 years of follow-up data on the same individuals, now in their 60s. The goal of the project is to explore the relationship between early-life aspirations and later-life outcomes.
The RA will work with local large language models (LLMs) running on the Yale Cluster to analyze and categorize the content of these essays. Responsibilities will include:
- Iteratively developing and refining prompts for LLMs to improve classification and extraction accuracy.
- Evaluating and cleaning model outputs for consistency and reliability.
- Assisting in the organization, documentation, and preliminary analysis of the categorized data.
- Collaborating with the research team to troubleshoot model performance and suggest enhancements.
Requisite Skills and Qualifications
Essential: Strong proficiency in Python and a solid computer science background (preferably CPSC 223 or higher).
Preferred but not essential: Experience running code on the Yale Cluster (or other HPC environments).
Familiarity with machine learning or natural language processing concepts is a plus.