Chinese Medical Sciences Journal, 2019, Vol. 34, Issue 2: 133-139. doi: 10.24920/003589

Topic: Medical Artificial Intelligence

Research Article

Medical Knowledge Extraction and Analysis from Electronic Medical Records Using Deep Learning

Li Peilin, Yuan Zhenming, Tu Wenbo, Yu Kai, Lu Dongxin

  1. Engineering Research Center of Mobile Health Management, Ministry of Education, Hangzhou Normal University, Hangzhou 311121, China
  • Received: 2019-03-29  Accepted: 2019-04-24  Online: 2019-05-14  Published: 2019-06-30

Abstract:

Objectives: Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMRs), which are important digital carriers for recording patients' medical activities. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of both tasks by exploring deep learning methods.

Methods: This study discussed and built two application scenarios of the bidirectional long short-term memory combined with conditional random field (BiLSTM-CRF) model, one for the NER task and one for the MRE task. In data preprocessing for both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence-labeling strategy was used: the CRF layer classified the tag of each word according to the joint probability distribution over the tag sequence. In the MRE task, the classification problem of a single entity was transformed into a sequence classification problem, and the relation category between medical entities was predicted by linking feature combinations between entities, again through the CRF layer.

Results: Validated on the public I2B2 2010 dataset, the BiLSTM-CRF models built in this study outperformed the baseline methods in both tasks, with F1-measures of approximately 0.88 for NER and 0.78 for MRE. Moreover, the model converged faster and avoided problems such as overfitting.

Conclusion: This study demonstrated the good performance of deep learning for medical knowledge extraction and verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying a foundation for subsequent work in the EMR field.
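The abstract states that both tasks vectorize words with a GloVe word embedding model before the BiLSTM layer. The sketch below shows one common way to perform that lookup, under stated assumptions: it parses GloVe's plain-text format (one word followed by its vector components per line), lowercases tokens to match an uncased GloVe vocabulary, and maps out-of-vocabulary words to a zero vector. Which GloVe release the authors used is not stated; the function names `load_glove` and `vectorize` and the zero-vector fallback are hypothetical choices for illustration, not the authors' preprocessing code.

```python
from typing import Dict, Iterable, List, Sequence

Embeddings = Dict[str, List[float]]


def load_glove(lines: Iterable[str]) -> Embeddings:
    """Parse GloVe text-format lines ("word v1 v2 ... vD") into a word -> vector dict."""
    vectors: Embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def vectorize(tokens: Sequence[str], vectors: Embeddings, dim: int) -> List[List[float]]:
    """Look up one embedding per token.

    Hypothetical assumptions: tokens are lowercased to match an uncased GloVe
    vocabulary, and unknown tokens map to a zero vector of size `dim`.
    """
    zero = [0.0] * dim
    # list(...) copies each vector so callers cannot mutate the shared embedding table
    return [list(vectors.get(tok.lower(), zero)) for tok in tokens]


# Toy 3-dimensional "embedding file" (made-up numbers, not real GloVe vectors).
emb = load_glove(["chest 0.1 0.2 0.3", "pain 0.4 0.5 0.6"])
matrix = vectorize(["Chest", "pain", "radiating"], emb, dim=3)
# matrix == [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.0, 0.0, 0.0]]
```

In practice the lines would come from iterating over an opened GloVe file, and the resulting matrix (one row per token) would be the input sequence for the BiLSTM.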

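The NER description says the CRF layer classifies each word's tag through the joint probability distribution over the whole tag sequence. A minimal pure-Python sketch of that linear-chain CRF step follows, assuming the per-word emission scores would come from the BiLSTM (here they are made-up toy numbers) and omitting the start/end transition scores that full implementations usually add. `sequence_score` gives the unnormalized score of one tag sequence, `log_partition` normalizes over all sequences with the forward algorithm, and `viterbi` finds the highest-scoring sequence. This illustrates the standard technique, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # emissions[t][tag] or transitions[prev_tag][tag]


def sequence_score(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Unnormalized score: each word's emission score plus transitions between consecutive tags."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Forward algorithm: log of the sum of exp(score) over every possible tag sequence."""
    n_tags = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            _logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emissions[t][j]
            for j in range(n_tags)
        ]
    return _logsumexp(alpha)


def viterbi(emissions: Matrix, transitions: Matrix) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its unnormalized score."""
    n_tags = len(emissions[0])
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        ptrs: List[int] = []
        for j in range(n_tags):
            # best previous tag i for current tag j
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            ptrs.append(i_star)
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
        best = new_best
        backpointers.append(ptrs)
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for ptrs in reversed(backpointers):  # follow back-pointers from the last word
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example with hypothetical scores; tags: 0 = O, 1 = B-problem, 2 = I-problem.
EMISSIONS = [[0.2, 1.5, 0.1], [0.3, 0.2, 1.4], [1.2, 0.1, 0.6]]
TRANSITIONS = [[0.5, 0.3, -5.0], [0.1, -0.5, 1.0], [0.4, 0.0, 0.8]]  # O -> I discouraged
path, score = viterbi(EMISSIONS, TRANSITIONS)  # path == [1, 2, 0]
prob = math.exp(score - log_partition(EMISSIONS, TRANSITIONS))  # P(path | sentence)
```

Because the transition matrix scores tag pairs, the decoder can reject an invalid sequence such as `O` followed by `I-problem` even when a single word's emission score favors it; this is the usual motivation for placing a CRF on top of the BiLSTM rather than choosing each tag independently.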
Key words: medical knowledge extraction, electronic medical record, named entity recognition, medical relation extraction, deep learning, bidirectional long short-term memory, conditional random field
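The abstract reports F1-measures of 0.88 (NER) and 0.78 (MRE) but does not state the matching criterion. The sketch below assumes micro-averaged, exact-match, entity-level scoring over BIO tags, which is one common convention for NER evaluation; the tag names (`problem`, `test`, after the I2B2 2010 concept types), the helper names `bio_to_spans` and `entity_f1`, and the lenient handling of a stray `I-` tag are illustrative assumptions rather than the paper's evaluation script.

```python
from typing import Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from BIO tags such as 'B-problem', 'I-problem', 'O'."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the current entity continues
        if etype is not None:
            spans.add((start, i, etype))  # close the open entity
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # assumption: a stray I- without a matching B- starts a new entity
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 over a corpus of tag sequences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)  # same boundaries and same type
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the model finds the problem entity but misses the test entity.
gold = [["B-problem", "I-problem", "O", "B-test"]]
pred = [["B-problem", "I-problem", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)  # precision 1.0, recall 0.5, f1 about 0.667
```

Exact-match scoring gives no credit for a partially overlapping span, so it is stricter than token-level accuracy; reported F1 values are only comparable when the same criterion is used.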

Funding: Natural Science Foundation of Zhejiang Province (No. LQ16H180004)

Copyright © 2018 Chinese Academy of Medical Sciences. All rights reserved.
 
www.cmsj.cams.cn

Supervised by the National Health and Family Planning Commission of the PRC

9 Dongdan Santiao, Dongcheng district, Beijing, 100730 China

Tel: 86-10-65105897  Fax: 86-10-65133074

E-mail: cmsj@cams.cn
