Cite this article: NIU Lulu, LIU Taotao, HUANG Tianmin, LIU Yongjun, LUO Yilin, CHEN Xin, LIAO Yu, HE Jiansong, ZHU Donglan. Evaluating the ability of DeepSeek and ChatGPT as consultation tools to answer real-world questions on drug-drug interactions and off-label drug use[J]. Chin J Mod Appl Pharm (中国现代应用药学), 2025, 42(17): 107-113.
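Evaluating the ability of DeepSeek and ChatGPT as consultation tools to answer real-world questions on drug-drug interactions and off-label drug use
NIU Lulu, LIU Taotao, HUANG Tianmin, LIU Yongjun, LUO Yilin, CHEN Xin, LIAO Yu, HE Jiansong, ZHU Donglan
The First Affiliated Hospital of Guangxi Medical University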
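Abstract:
Objective: The emergence of large language models (LLMs) brings new opportunities to pharmacy, but applying them clinically without adequately assessing their suitability as medication consultation tools carries risk. This study evaluated how DeepSeek and ChatGPT respond to real-world medication consultation questions, focusing on the accuracy, completeness, and appropriateness of their answers on drug-drug interactions (DDIs), off-label drug use (OLDU), and treatment-regimen recommendations. Methods: Eighty-six questions about DDIs and OLDU raised by patients were collected during routine pharmacy outpatient work. Each question was posed to DeepSeek and ChatGPT in a standard format that included the patient's age, sex, drug names, and diagnosis, and ended with the role-play prompt "As a pharmacist, can you tell me what problems exist in this treatment regimen?" To further evaluate the consistency of each model's answers, the same questions were also posed with different phrasing. All answers were independently assessed by two clinical pharmacists for accuracy, consistency, and appropriateness. Results: DeepSeek and ChatGPT answered all questions, and the clinical pharmacists evaluated every answer. DeepSeek achieved its highest accuracy (80.49%) on treatment-regimen questions, whereas ChatGPT performed best on OLDU, with an accuracy of 84.44%. The phrasing of the question affected both models' accuracy on DDIs: with the first phrasing, DeepSeek and ChatGPT were correct 12.26% and 41.46% of the time, respectively; with the second phrasing, DeepSeek reached 24.39% and ChatGPT 21.95%. Agreement between the answers to the two phrasings was poor. An analysis of error causes showed that DeepSeek's errors mainly consisted of answers containing partially incorrect information (64.52%), while ChatGPT's errors were concentrated on inappropriate dosing recommendations (62.96%). Conclusion: Although DeepSeek and ChatGPT each achieve relatively high accuracy in certain areas of medication consultation, the consistency and sensitivity of their answers are poor. Using either model directly as a clinical medication consultation tool in current real-world settings therefore carries substantial risk.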
Key words: DeepSeek; ChatGPT; off-label drug use; drug-drug interactions; large language models
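The Methods describe a fixed question format (patient age, sex, drug names, and diagnosis) closed by a role-play question, with the same questions also asked a second way. The paper does not publish its exact template, so the Python sketch below is only an illustration under stated assumptions: the field names, the layout, and the rephrased question are hypothetical; only the first closing question paraphrases the abstract.

```python
from dataclasses import dataclass

# Closing question paraphrased from the abstract; the authors' exact prompt
# wording is not published.
ROLE_PLAY_QUESTION = (
    "As a pharmacist, can you tell me what problems exist in this treatment regimen?"
)
# Hypothetical alternative phrasing, used only to illustrate asking the same
# question a second way; this is NOT the study's second prompt.
REPHRASED_QUESTION = (
    "Are there drug-drug interactions or off-label uses in this treatment regimen?"
)


@dataclass
class Consultation:
    """One patient question; field names are illustrative assumptions."""
    age: int
    sex: str
    drugs: list[str]
    diagnosis: str


def build_prompt(case: Consultation, rephrased: bool = False) -> str:
    """Render the case in a fixed layout and append the closing question."""
    question = REPHRASED_QUESTION if rephrased else ROLE_PLAY_QUESTION
    return "\n".join([
        f"Patient: {case.age}-year-old {case.sex}",
        f"Diagnosis: {case.diagnosis}",
        f"Medications: {', '.join(case.drugs)}",
        question,
    ])


if __name__ == "__main__":
    # Illustrative input only; not a case from the study.
    case = Consultation(age=68, sex="male", drugs=["warfarin", "aspirin"],
                        diagnosis="atrial fibrillation")
    print(build_prompt(case))
    print(build_prompt(case, rephrased=True))
```

In this sketch only the closing question changes between the two variants, so any difference between the two answers can be attributed to phrasing rather than to the patient details.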
Evaluating the ability of DeepSeek and ChatGPT as consulting tools to answer questions on drug-drug interactions and off-label drug use in the real world
NIU Lulu, LIU Taotao, HUANG Tianmin, LIU Yongjun, LUO Yilin, CHEN Xin, LIAO Yu, HE Jiansong, ZHU Donglan
The First Affiliated Hospital of Guangxi Medical University
Abstract:
Objective: The emergence of large language models (LLMs) has brought new opportunities to the pharmaceutical field. However, deploying them as medication consultation tools in clinical settings without thoroughly evaluating their applicability poses significant risks. This study assessed how DeepSeek and ChatGPT respond to real-world medication consultation queries, focusing on the accuracy, completeness, and appropriateness of their answers on drug-drug interactions (DDIs), off-label drug use (OLDU), and treatment recommendations. Methods: We collected 86 clinically validated drug consultation questions about DDIs and OLDU from routine pharmacy outpatient services. Each query was formatted with standardized patient parameters (age, sex, medications, diagnosis) and ended with the prompt: "As a pharmacist, can you identify potential issues in this treatment plan?" To assess response consistency, each clinical question was also posed with different phrasing. Two clinical pharmacists independently evaluated answer accuracy, information completeness, therapeutic appropriateness, and response consistency. Results: Both models answered all questions. DeepSeek achieved its highest accuracy (80.49%) on treatment recommendations, while ChatGPT performed best on OLDU (84.44% accuracy). Accuracy on DDI questions depended on phrasing: the first phrasing yielded correct rates of 12.26% (DeepSeek) vs. 41.46% (ChatGPT), and the second 24.39% (DeepSeek) vs. 21.95% (ChatGPT), with poor agreement between the two. Error analysis revealed distinct failure patterns: 64.52% of DeepSeek's errors were answers containing partially incorrect information, whereas 62.96% of ChatGPT's errors were inappropriate dosage recommendations. Conclusion: Although each model showed relatively high accuracy in specific pharmaceutical domains, both exhibited concerning inconsistency and context sensitivity in clinical problem-solving. The substantial error rates and response variability indicate significant risks in deploying these models as standalone clinical decision support tools without rigorous validation and human oversight.
Key words: DeepSeek; ChatGPT; off-label drug use; drug-drug interaction; large language model
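The Results report accuracy as the percentage of answers judged correct and compare the two phrasings on DDI questions. The Python sketch below shows that arithmetic under two assumptions not stated in the abstract: each answer ends up with a single correct/incorrect verdict after the two pharmacists' review (how disagreements were reconciled is not reported), and consistency is measured as simple percent agreement between phrasings (the authors may have used a different statistic). The function names are hypothetical and the data are toy values, not study data.

```python
def accuracy(verdicts: list[bool]) -> float:
    """Percentage of answers judged correct, rounded to two decimals."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return round(100 * sum(verdicts) / len(verdicts), 2)


def phrasing_agreement(first: list[bool], second: list[bool]) -> float:
    """Percentage of questions whose verdict is the same under both phrasings
    (simple percent agreement; an assumed consistency measure)."""
    if not first or len(first) != len(second):
        raise ValueError("both phrasings must cover the same, non-empty question set")
    same = sum(a == b for a, b in zip(first, second))
    return round(100 * same / len(first), 2)


if __name__ == "__main__":
    # Toy verdicts for four questions (True = judged correct); not study data.
    first_phrasing = [True, False, True, True]
    second_phrasing = [True, True, False, True]
    print(accuracy(first_phrasing))                               # 75.0
    print(accuracy(second_phrasing))                              # 75.0
    print(phrasing_agreement(first_phrasing, second_phrasing))    # 50.0
```

The toy run illustrates the pattern the abstract describes: both phrasings can reach the same overall accuracy while disagreeing on which individual questions were answered correctly, which is why accuracy alone does not capture consistency.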