We will learn how to transform unstructured text into usable signals, evaluate models in applied settings, and connect these tools to questions in the social sciences and policy.
The course covers a range of methods, from simple dictionary-based approaches to supervised models and modern LLM-based techniques, and highlights their strengths, limitations, and trade-offs. A central focus is building text-based indicators to nowcast real-world events and forecast risks in applied contexts.
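To make the simplest end of that range concrete, the following is a minimal sketch of a dictionary-based indicator, not the course's own implementation. The term list `RISK_TERMS`, the function names, and the `(date, text)` input format are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical dictionary of risk-related terms (an assumption for illustration).
RISK_TERMS = {"protest", "strike", "riot", "unrest"}

def term_share(text, terms=RISK_TERMS):
    """Share of tokens in `text` that appear in the dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in terms for tok in tokens) / len(tokens)

def monthly_indicator(docs):
    """Average the per-document term share within each calendar month.

    `docs` is an iterable of (date, text) pairs with dates as 'YYYY-MM-DD'.
    Returns a dict mapping 'YYYY-MM' to the mean share for that month.
    """
    buckets = defaultdict(list)
    for date, text in docs:
        buckets[date[:7]].append(term_share(text))
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}

docs = [
    ("2024-01-03", "Workers strike in the capital"),
    ("2024-01-20", "Markets calm today"),
    ("2024-02-01", "Protest and riot reported"),
]
indicator = monthly_indicator(docs)
```

A word-count indicator like this is transparent and cheap, but it ignores context (negation, sarcasm, word sense), which is one motivation for the supervised and LLM-based methods.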
Particular emphasis is placed on fine-tuning, model evaluation, and threshold selection, with decisions guided by policy-relevant trade-offs (e.g., false positives vs. missed events). We also discuss why frequency mismatches in data matter, for example when text arrives daily but the outcome of interest is observed monthly or quarterly, and introduce mixed-frequency methods to better integrate information from different sources.
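As an illustration of threshold selection under such a trade-off, here is a minimal sketch that picks the cutoff minimising a weighted count of false positives and missed events. The function name `pick_threshold` and the cost weights `cost_fp` and `cost_fn` are assumptions for illustration; in practice the weights would come from the policy context.

```python
import numpy as np

def pick_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0):
    """Return (threshold, cost) minimising cost_fp * FP + cost_fn * FN.

    `scores` are model outputs (higher means more likely an event),
    `labels` are 0/1 outcomes. Candidate thresholds are the observed scores;
    an observation is flagged when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_cost = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t
        fp = np.sum(pred & (labels == 0))    # false alarms
        fn = np.sum(~pred & (labels == 1))   # missed events
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return float(best_t), float(best_cost)

scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]

# Missed events are costly: the chosen threshold is low.
low_t, low_cost = pick_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0)
# False alarms are costly: the chosen threshold is high.
high_t, high_cost = pick_threshold(scores, labels, cost_fp=10.0, cost_fn=1.0)
```

With the same scores, changing only the cost weights moves the chosen threshold, which is why the cutoff is a policy decision rather than a purely statistical one.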
By the end of the course, students will be able to construct text-based indicators using state-of-the-art methods that capture semantic and contextual information, and deploy them in decision-oriented applications.