Frontier LLMs at expert parity on core analytical writing tasks
#1 Multiple independent evaluations published between 2023 and 2025 have demonstrated that GPT-4o, Claude 3.5/3.7, and Gemini 1.5 Pro perform at or above the level of the median PhD-level analyst on core political science writing tasks, including literature synthesis, policy brief drafting, and comparative case summaries. Törnberg (2023) showed that GPT-4 outperformed crowd workers and matched expert coders on political text classification. Argyle et al. (2023) demonstrated that LLMs could simulate ideologically diverse survey respondents. The capability curve is still ascending: each model generation narrows the quality gap that previously protected human analysts.