AI & Imaging

LLMs may help draft radiology claim appeal letters

An Academic Radiology pilot study found that LLMs could generate useful appeal-letter templates for denied interventional radiology services, but hallucinations and fabricated references remained concerns.

Large language models may help radiology teams draft appeal letters for denied insurance coverage, according to a pilot study published in Academic Radiology.

The study evaluated whether LLMs could generate accurate, clinically valid, and usable letters for appealing insurance denials related to interventional radiology services. The work focused on a common administrative burden in imaging practices: preparing payer-facing appeals when requested procedures are not approved. 

Researchers tested 4 LLMs: Claude 3.5, Nova Pro, Llama-3.1-70B, and ChatGPT-4o. The models were prompted to generate appeal letters for simulated clinical scenarios using 3 techniques: zero-shot prompting, few-shot prompting, and retrieval-augmented generation.
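To make the three techniques concrete, here is a minimal sketch of how such prompts might be constructed. The function names, the denial scenario, and the toy word-overlap retriever are illustrative assumptions, not details from the study:

```python
# Illustrative sketch of the three prompting strategies tested in the
# study. The task wording, helper names, and toy retrieval step are
# hypothetical; the paper does not publish its prompts.

BASE_TASK = (
    "Draft a formal appeal letter for the denied interventional "
    "radiology service described below.\n\nScenario: {scenario}"
)

def zero_shot_prompt(scenario: str) -> str:
    """Zero-shot: the task instruction alone, with no examples."""
    return BASE_TASK.format(scenario=scenario)

def few_shot_prompt(scenario: str, examples: list[str]) -> str:
    """Few-shot: prepend one or more sample appeal letters."""
    shots = "\n\n".join(f"Example appeal letter:\n{e}" for e in examples)
    return f"{shots}\n\n{BASE_TASK.format(scenario=scenario)}"

def rag_prompt(scenario: str, corpus: list[str]) -> str:
    """Retrieval-augmented generation: attach the corpus passage that
    best matches the scenario. Real RAG systems use embedding search;
    simple word overlap stands in for it here."""
    words = set(scenario.lower().split())
    best = max(corpus, key=lambda doc: len(words & set(doc.lower().split())))
    return f"Reference material:\n{best}\n\n{BASE_TASK.format(scenario=scenario)}"
```

Each helper returns a prompt string that would then be sent to the model; the three differ only in how much supporting context travels with the task.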

A total of 12 appeal letters were generated and reviewed by 4 board-certified interventional radiologists. Reviewers were blinded to the model and prompting method used for each letter. They assessed content, grammar, structure, and usability, while references cited by the models were checked for accuracy.

Across the models, mean content scores were 3.9 out of 5, while mean grammar and structure scores were 4.3 out of 5. The letters were generally viewed as readable and usable, though reviewer agreement varied across scoring categories.

Usability was one of the stronger findings. Reviewers indicated that the LLM-generated letters would serve as helpful templates in 73% of cases. That suggests the tools may have near-term value as drafting aids rather than autonomous appeal systems.

Safety and accuracy concerns remained. Hallucinations were identified in 16 of 48 letters. ChatGPT-4o was more vulnerable to hallucinations than the offline models in the study, according to the reported results.

Reference accuracy was also a limitation. Of 44 references cited across the generated letters, 80% of those produced by the offline models were fabricated, while 29% of ChatGPT-4o-generated letters contained fabricated references.

The findings point to a practical but limited role for LLMs in radiology administration. These tools may reduce the time needed to prepare first drafts, but the outputs still require human review before submission to insurers.

The authors concluded that generative AI may help reduce administrative burden related to prior authorizations or denials, but careful oversight remains necessary. That caution is especially relevant when letters include clinical reasoning, literature references, or payer-facing statements that could affect patient access to care.

The study adds to a growing body of work examining LLMs in radiology beyond image interpretation, including documentation, report summarization, patient communication, and workflow support.

About the author

Editorial Team, RadiologySignal.com

Radiology Signal Staff covers developments across medical imaging, radiology AI, imaging informatics, clinical research, and radiology business. The team monitors primary sources, peer-reviewed studies, company announcements, society updates, and healthcare industry news to deliver concise reporting for imaging professionals.