Chinese Medical Sciences Journal, 2019, Vol. 34, Issue (2): 133-139. doi: 10.24920/003589

Special Issue: Medical Artificial Intelligence

• Original Article •

Medical Knowledge Extraction and Analysis from Electronic Medical Records Using Deep Learning

Li Peilin, Yuan Zhenming, Tu Wenbo, Yu Kai, Lu Dongxin   

  1. Engineering Research Center of Mobile Health Management, Ministry of Education, Hangzhou Normal University, Hangzhou 311121, China
  • Received: 2019-03-29 Accepted: 2019-04-24 Published: 2019-06-30 Online: 2019-05-14
Building on the current state of deep learning research, this paper discusses and builds two application scenarios of the bidirectional long short-term memory model combined with a conditional random field (BiLSTM-CRF), one for named entity recognition (NER) and one for medical relation extraction (MRE). Validation on the I2B2 2010 public dataset showed better performance than the baseline methods on both tasks.

Objectives Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMR), which are important digital carriers for recording the medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods.

Methods This study discussed and built two application scenarios of the bidirectional long short-term memory model combined with a conditional random field (BiLSTM-CRF) for the NER and MRE tasks. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used to classify each word tag according to the joint probability distribution given by the CRF layer. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer.

Results In validation on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study achieved much better results than the baseline methods on both tasks, with the F1-measure reaching 0.88 on the NER task and 0.78 on the MRE task. Moreover, the models converged faster and avoided problems such as overfitting.

Conclusion This study demonstrated the good performance of deep learning on medical knowledge extraction. It also verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
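As a rough illustration of the sequence labeling strategy in the NER task, the sketch below shows how a CRF layer can pick tags jointly rather than one word at a time, using Viterbi decoding over per-word scores and tag-to-tag transition scores. It is not the authors' implementation: the tag set, the example words, and all numeric scores are hypothetical values chosen for illustration. In a BiLSTM-CRF the per-word scores would come from the BiLSTM and the transition scores would be learned.

```python
# Minimal Viterbi decoding sketch for a CRF tagging layer.
# All tags, tokens, and scores below are hypothetical, not taken from the paper.

TAGS = ["O", "B-problem", "I-problem"]
TOKENS = ["has", "chest", "pain"]

# EMISSIONS[t][j]: score of tag j for word t (stand-in for BiLSTM output).
EMISSIONS = [
    [2.0, 0.1, 0.1],
    [0.2, 1.0, 1.2],
    [0.3, 0.2, 1.5],
]

# TRANSITIONS[i][j]: score of moving from tag i to tag j.
# The large negative O -> I-problem score discourages an invalid BIO sequence.
TRANSITIONS = [
    [0.5, 0.2, -5.0],
    [0.0, -1.0, 0.8],
    [0.0, -0.5, 0.5],
]


def viterbi_decode(emissions, transitions):
    """Return (best tag index path, best total score) over the whole sentence."""
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag to recover the path.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS)
labels = [TAGS[i] for i in path]

# Per-word argmax, ignoring transitions, for comparison.
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in EMISSIONS]

print(list(zip(TOKENS, labels)))
print(list(zip(TOKENS, greedy)))
```

With these made-up scores, independent per-word tagging yields O, I-problem, I-problem, which is not a valid BIO sequence, while joint decoding yields O, B-problem, I-problem. This is the kind of consistency that scoring the whole tag sequence is meant to provide.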

Key words: medical knowledge extraction, electronic medical record, named entity recognition, medical relation extraction, deep learning, bidirectional long short-term memory, conditional random field

Funding: Supported by the Zhejiang Provincial Natural Science Foundation (No. LQ16H180004)

Copyright © 2021 Chinese Academy of Medical Sciences.
