LLM Clinical Reasoning at or Above Practitioner Level
Since 2023, GPT-4, Med-PaLM 2, and subsequent models have achieved passing scores on the USMLE (the U.S. physician licensing exam), with GPT-4 scoring approximately 86–90% on Step 3 in multiple independent evaluations. More directly relevant to this SOC, LLMs perform at or above the level of naturopathic physicians on nutrition, botanical medicine, and integrative health knowledge benchmarks, the domains that constitute the core scope-of-practice differentiation for NDs. A 2023 JAMA Internal Medicine study found that blinded evaluators rated LLM-generated responses to patient health questions as higher quality and more empathetic than physician responses. Clinical AI systems such as Diagnostic Robotics and Isabel DDx are already deployed in health systems to generate differential diagnoses, moving from decision support to decision generation.