Using LLMs: Text Data and Children's Aspirations

I am working with a unique dataset of 10,000 essays written by children in 1969 describing how they imagined their future. These essays are linked to more than 50 years of follow-up data on the same individuals, now in their 60s. The goal of the project is to explore the relationship between early-life aspirations and later-life outcomes.

The RA will work with local large language models (LLMs) running on the Yale Cluster to analyze and categorize the content of these essays. Responsibilities will include:

Iteratively developing and refining prompts for LLMs to improve classification and extraction accuracy.
Evaluating and cleaning model outputs for consistency and reliability.
Assisting in the organization, documentation, and preliminary analysis of the categorized data.
Collaborating with the research team to troubleshoot model performance and suggest enhancements.

Requisite Skills and Qualifications

Essential: Strong proficiency in Python and a solid computer science background (preferably CPSC 223 or a more advanced course).

Preferred but not essential: Experience running code on the Yale Cluster (or other HPC environments).

Familiarity with machine learning or natural language processing concepts is a plus.