Social Sciences

Prompting for Information Extraction (Problem Set)

Class: AI for Social Science Methods
Semester: Fall 2025
Instructor: Daniel Karell
Department: Sociology
License: CC BY-NC-SA 4.0

Learning Objectives

  1. Students gain hands-on familiarity with prompting techniques to use with LLMs. 
  2. Students gain experience utilizing LLMs to label observations in a dataset and extract information from text.
  3. Students explore how to evaluate the output of LLMs during a labeling and information-extraction task using human-coded “ground truth” observations.
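
To make the third objective concrete, here is a minimal sketch of one way to compare LLM-assigned labels with human-coded labels for the same observations. It is an illustration under stated assumptions, not the assignment's own code: the function name `evaluate_labels`, the example labels, and the choice of accuracy plus per-category precision, recall, and F1 are all assumptions made for this example.

```python
def evaluate_labels(human, model):
    """Compare model labels with human-coded labels for the same observations.

    Returns overall accuracy and a dict of per-category precision, recall, F1.
    """
    if len(human) != len(model):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(human, model))
    accuracy = sum(h == m for h, m in pairs) / len(pairs)

    per_class = {}
    for label in sorted(set(human) | set(model)):
        tp = sum(h == label and m == label for h, m in pairs)  # both agree on label
        fp = sum(h != label and m == label for h, m in pairs)  # model only
        fn = sum(h == label and m != label for h, m in pairs)  # human only
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


# Made-up labels for five observations (illustrative only, not real data).
human_labels = ["Protests", "Battles", "Protests", "Riots", "Battles"]
model_labels = ["Protests", "Battles", "Riots", "Riots", "Protests"]

accuracy, per_class = evaluate_labels(human_labels, model_labels)
print(f"accuracy = {accuracy:.2f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

Per-category scores are reported alongside accuracy because a model can agree with the human coders overall while still performing poorly on a rare category.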

Overview (see attached for complete instructions)

The goals of this activity are to gain (1) hands-on familiarity with prompting techniques (some of which we have read about) and (2) experience using large language models (LLMs) to label observations in a dataset and extract information from text, both common tasks in social science research. We will use a dataset provided by Armed Conflict Location and Event Data (ACLED), a non-profit organization that collects data on violent conflict and protests around the world. ACLED organizes and publishes these data along with a codebook, a guide explaining the data. For the exercises below, we will use both a sample of ACLED’s data and its codebook.

When working through the example code in the following section(s), you do not need to submit answers to the questions in the text. These questions are meant to help you reflect on what is happening in the example analysis; try answering them for yourself to check your understanding. You only need to submit answers to the questions and prompts in the “Exercises” section below.
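
As a rough illustration of the labeling task described in this overview, the sketch below assembles a zero-shot prompt from codebook-style category definitions and maps a model's free-text reply back to a category. Everything in it is hypothetical: the helper names `build_prompt` and `parse_label`, the two categories, their one-line definitions (placeholders, not ACLED's codebook wording), and the event description are invented for this example. The call to the LLM itself is omitted because it depends on the provider you use.

```python
def build_prompt(codebook, note):
    """Assemble a zero-shot labeling prompt from category definitions."""
    definitions = "\n".join(
        f"- {label}: {description}" for label, description in codebook.items()
    )
    return (
        "You are labeling event descriptions for a social science dataset.\n"
        "Assign exactly one of the following categories, "
        "using these definitions:\n"
        f"{definitions}\n\n"
        f"Event description: {note}\n"
        "Answer with the category name only."
    )


def parse_label(response, codebook):
    """Map a free-text reply to a category.

    Returns None unless exactly one category name appears in the reply,
    so ambiguous or off-list answers can be flagged for manual review.
    """
    reply = response.strip().lower()
    matches = [label for label in codebook if label.lower() in reply]
    return matches[0] if len(matches) == 1 else None


# Placeholder definitions for illustration only (NOT ACLED's codebook text).
example_codebook = {
    "Protest": "a public demonstration in which participants do not use violence",
    "Battle": "a violent clash between two organized armed groups",
}
# Invented event description, not drawn from any dataset.
example_note = "Teachers gathered outside the ministry to demand higher pay."

prompt = build_prompt(example_codebook, example_note)
print(prompt)

# Replies as they might come back from a model (made up for this example).
print(parse_label("Protest.", example_codebook))
print(parse_label("Could be a protest or a battle", example_codebook))
```

Returning None for ambiguous replies keeps unparseable answers out of the labeled data so they can be counted and inspected separately.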

Reflections

Students frequently encountered usage limits imposed by their (free) accounts with the LLM provider (e.g., Mistral, Anthropic), which delayed their progress on the assignment and caused frustration. Ideally, the students could use an API to access Yale’s “Clarity” work environment, but such an API does not exist. An alternative would be for students to set up LLMs on their local machines, but this solution would require assuming that all students have access to non-trivial computing resources.

Readings and Resources

1. “Finetuned Language Models Are Zero-Shot Learners” by Wei, et al. (2022)

2. “Super-Natural Instructions: Generalization via Declarative Instructions on 1600+ NLP Tasks” by Wang, et al. (2022)

3. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” by Wei, et al. on arXiv (2023) 

4. “From Codebooks to Promptbooks: Extracting Information from Text with Generative Large Language Models” by Stuhler, et al. in Sociological Methods & Research (2025)

5. “Stay Tuned: Improving Sentiment Analysis and Stance Detection Using Large Language Models” by Griswold, et al. in Political Analysis (2025)

6. “Large Language Models for Text Classification: From Zero-Shot Learning to Instruction-Tuning” by Chae and Davidson in Sociological Methods & Research (2025)

7. “Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts” by Halterman and Keith in Political Analysis (2025)

8. “Using Large Language Models for Qualitative Analysis can Introduce Serious Bias” by Ashwin, et al. in Sociological Methods & Research (2025)
