Research Assistants

Using LLMs: Text Data and Children's Aspirations

I am working with a unique dataset of 10,000 essays written by children in 1969 describing how they imagined their future. These essays are linked to more than 50 years of follow-up data on the same individuals, now in their 60s. The goal of the project is to explore the relationship between early-life aspirations and later-life outcomes.

The RA will work with local large language models (LLMs) running on the Yale Cluster to analyze and categorize the content of these essays. Responsibilities will include:

  • Iteratively developing and refining prompts for LLMs to improve classification and extraction accuracy.
  • Evaluating and cleaning model outputs for consistency and reliability.
  • Assisting in the organization, documentation, and preliminary analysis of the categorized data.
  • Collaborating with the research team to troubleshoot model performance and suggest enhancements.

Requisite Skills and Qualifications

Essential: Strong proficiency in Python and a solid computer science background (preferably CPSC 223 or higher).

Preferred but not essential: Experience running code on the Yale Cluster (or in other HPC environments).

Familiarity with machine learning or natural language processing concepts is a plus.