Large Language Models in Medicine: Capabilities, Limitations, and the Path Forward
A comprehensive review establishing what LLMs can and cannot do in healthcare, providing the evidence base for informed clinical adoption.
What They Studied
Thirunavukarasu and colleagues reviewed the state of large language models in medicine, examining their demonstrated capabilities across clinical tasks, their known limitations, and the regulatory and ethical considerations surrounding their deployment. The review synthesized evidence from across healthcare disciplines to map where LLMs show genuine promise and where caution is warranted.
What They Found
- LLMs demonstrated strong performance on medical knowledge assessments, with models like GPT-4 passing USMLE-style examinations and performing at or above average physician level on many knowledge-based tasks.
- Clinical reasoning capabilities were inconsistent: models could generate plausible-sounding explanations but sometimes arrived at conclusions through pattern matching rather than genuine clinical logic.
- Significant bias risks were identified, including underrepresentation of minority populations in training data and the potential to perpetuate existing healthcare disparities.
- LLMs showed particular promise in administrative and documentation tasks, where the stakes of minor errors are lower than in direct diagnostic decision-making.
- The review emphasized that regulatory frameworks for AI in medicine lag significantly behind the pace of development and clinical adoption.
Methodology
This was a narrative review synthesizing published literature, preprints, and technical reports on LLMs in healthcare through 2023. The authors drew from clinical validation studies, benchmark evaluations, and policy analyses. As a review rather than an original study, it reflects the authors’ interpretation of a rapidly evolving evidence base.
What This Means for SLPs
- This paper provides the foundational knowledge for understanding what these tools realistically can and cannot do, making it essential reading before adopting any AI tool.
- The finding that LLMs excel at documentation and administrative tasks more than clinical reasoning aligns with the recommended “copilot” approach for SLP practice.
- Bias risks are real and relevant. SLPs working with linguistically diverse populations need to be especially cautious about AI-generated recommendations that may reflect training data biases.
- The gap between AI capability and regulatory oversight means that individual clinicians currently bear significant responsibility for vetting AI outputs.
- Understanding LLM limitations helps SLPs set appropriate expectations when institutions push for AI adoption.
Limitations to Keep in Mind
- Because this is a 2023 review, some findings may not reflect the capabilities of more recent models, which continue to evolve rapidly.
- The review focused broadly on medicine. Speech-language pathology was not specifically addressed, and SLP-specific evidence remains limited.
- Much of the reviewed literature came from benchmark tests rather than real-world clinical deployment studies.
The Bottom Line
LLMs show genuine promise for healthcare documentation and knowledge tasks, but their clinical reasoning is inconsistent and their bias risks are real, making informed, cautious adoption essential.