Medical Knowledge Extraction and Analysis from Electronic Medical Records Using Deep Learning

doi:10.24920/003589

Highlight

Building on current deep learning research, this paper discusses and builds two application scenarios of the bidirectional long short-term memory combined with conditional random field (BiLSTM-CRF) model: one for named entity recognition (NER) and one for medical relation extraction (MRE). Validation on the I2B2 2010 public dataset showed better performance than the baseline methods in both tasks.

Abstract

Objectives: Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMR), which are important digital carriers for recording the medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods.

Methods: This study discussed and built two application scenarios of the bidirectional long short-term memory combined with conditional random field (BiLSTM-CRF) model, one for the NER task and one for the MRE task. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used: the CRF layer assigned a tag to each word according to the joint probability distribution over the whole tag sequence. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer.

Results: In validation on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study outperformed the baseline methods in both tasks, with an F1-measure of up to 0.88 in the NER task and 0.78 in the MRE task. Moreover, the model converged faster and avoided problems such as overfitting.

Conclusion: This study demonstrated the good performance of deep learning on medical knowledge extraction. It also verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
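As an illustration of the sequence labeling strategy described in the Methods, the following is a minimal sketch, not the authors' implementation, of how a CRF layer can choose a tag sequence jointly rather than word by word. It assumes per-word emission scores (as a BiLSTM would produce) and a tag-to-tag transition matrix; the tag set, the toy sentence, and all scores are hypothetical values chosen for illustration.

```python
from typing import List

# Hypothetical BIO tag set for a single entity type (illustration only).
TAGS = ["O", "B-problem", "I-problem"]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the tag index sequence with the highest total score.

    emissions[t][j]   : score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j (CRF layer)
    """
    n_tags = len(transitions)
    # best[j] = best score of any path that ends in tag j at the current word
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # previous tag that gives the best path into tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # trace the best path back from the best final tag
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy sentence: "patient has chest pain" (all scores are made up).
EMISSIONS = [
    [2.0, 0.0, 0.0],  # patient
    [2.0, 0.0, 0.0],  # has
    [0.0, 1.0, 1.5],  # chest: emission alone slightly prefers I-problem
    [0.0, 0.5, 2.0],  # pain
]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: moving to I-problem is heavily penalized
    [0.0, -1.0, 1.0],   # from B-problem
    [0.0, 0.0, 0.5],    # from I-problem
]

decoded = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)  # ['O', 'O', 'B-problem', 'I-problem']
```

With these toy scores, choosing the best tag independently at each word would give O O I-problem I-problem, which is not a valid BIO sequence; the transition scores steer the joint decode to O O B-problem I-problem.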

Key words: medical knowledge extraction, electronic medical record, named entity recognition, medical relation extraction, deep learning, bidirectional long short-term memory, conditional random field

Funding: Supported by the Zhejiang Provincial Natural Science Foundation (No. LQ16H180004)

Li Peilin, Yuan Zhenming, Tu Wenbo, Yu Kai, Lu Dongxin. Medical Knowledge Extraction and Analysis from Electronic Medical Records Using Deep Learning [J]. Chinese Medical Sciences Journal, 2019, 34(2): 133-139.

Figures/Tables: Figures 1-5; Tables 1-3.

References

[1]	Wu JW, Guan Y, Lv XB. Entity relation extraction from electronic medical records based on deep learning. Intell Comput Appl 2014; 4(3):35-8. doi: 10.3969/j.issn.2095-2163.2014.03.009
[2]	Chen L, Li Y, Chen W, et al. Utilizing soft constraints to enhance medical relation extraction from the history of present illness in electronic medical records. J Biomed Inform 2018; 87:108-17. doi: 10.1016/j.jbi.2018.09.013
[3]	Xue NW, Shen LB. Chinese word segmentation as LMR tagging. Proceedings of the second SIGHAN workshop on Chinese language processing. 2003 Jul 11-12; Sapporo, Japan. Stroudsburg, PA, USA: Association for Computational Linguistics; 2003. 17:176-9. doi: 10.3115/1119250.1119278
[4]	Finkel JR, Grenager T, Manning C. Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd annual meeting on association for computational linguistics 2005; 363-70. doi: 10.3115/1219840.1219885
[5]	Ye F, Chen W, Zhou GG, et al. Intelligent recognition of named entities in electronic medical records. Chin J Biomed Eng 2011; 30(2):256-62. Chinese. doi: 10.3969/j.issn.0258-8021.2011.02.014
[6]	Li W, Zhao DZ, Li B, et al. Entity recognition of medical records combined with CRF and rules. App Res Comput 2015; 32(4):1082-6. doi: 10.3969/j.issn.1001-3695.2015.04.029
[7]	Bollegala D, Matsuo Y, Ishizuka M. Relation adaptation: learning to extract novel relations with minimum supervision. Proceedings of the 22nd International Joint Conference of Artificial Intelligence (IJCAI). 2011 Jul 16-22; Barcelona, Spain. Menlo Park, California, USA: AAAI Press; 2011. p. 2205-10. doi: 10.5591/978-1-57735-516-8/IJCAI11-368
[8]	Suchanek FM, Ifrim G, Weikum G. Combining linguistic and statistical analysis to extract relations from web documents. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006 Aug 20-23; Philadelphia, USA. New York: ACM; 2006. p. 712-7. doi: 10.1145/1150402.1150492
[9]	Qin B, Liu AA, Liu T. Unsupervised Chinese open entity relation extraction. J Comput Res Dev 2015; 52(5):1029-35. Chinese. doi: 10.7544/issn1000-1239.2015.20131550
[10]	Uzuner O, Mailoa J, Ryan R, et al. Semantic relations for problem-oriented medical records. Artif Intell Med 2010; 50(2):63-73. doi: 10.1016/j.artmed.2010.05.006
[11]	Zhou G, Su J, Zhang J, et al. Exploring various knowledge in relation extraction. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 2005 Jun 25-30; Michigan, USA. Stroudsburg, PA, USA: Association for Computational Linguistics; 2005. p. 427-34. doi: 10.3115/1219840.1219893
[12]	Uzuner O, South BR, Shen S, et al. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assn 2011; 18(5):552-6. doi: 10.1136/amiajnl-2011-000203
[13]	Rink B, Harabagiu S, Roberts K. Automatic extraction of relations between medical concepts in clinical texts. J Am Med Inform Assn 2011; 18(5):594-600. doi: 10.1136/amiajnl-2011-000153
[14]	Demner-Fushman D, Mork JG, Shooshan SE, et al. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. J Biomed Inform 2010; 43(4):587-94. doi: 10.1016/j.jbi.2010.02.005
[15]	Lv X, Yi G, Yang J, et al. Clinical relation extraction with deep learning. Int J Inf Tech Decis 2016; 9(7):237-48. doi: 10.14257/ijhit.2016.9.7.22
[16]	Pennington J, Socher R, Manning C. GloVe: Global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014 Oct 25-29; Doha, Qatar. p. 1532-43. doi: 10.3115/v1/D14-1162
[17]	Yin W, Kann K, Yu M, et al. Comparative study of CNN and RNN for natural language processing. 2017; arXiv:1702.01923.
[18]	Yang JF, Yu QB, Guan Y, et al. A review of research on name recognition and entity relation extraction of electronic medical records. Acta Automatica Sinica 2014; 40(8):1537-62. doi: 10.3724/SP.J.1004.2014.01537
[19]	Jiang M, Chen Y, Liu M, et al. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assn 2011; 18(5):601-6. doi: 10.1136/amiajnl-2011-000163
[20]	De Bruijn B, Cherry C, Kiritchenko S, et al. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assn 2011; 18(5):557-62. doi: 10.1136/amiajnl-2011-000150

Category	Description
TrIP	Treatment improves medical problems.
TrWP	Treatment worsens medical problems.
TrCP	Treatment causes medical problems.
TrAP	Treatment is applied to medical problems.
TrNAP	Treatment is not applied to medical problems.
TeRP	Tests reveal medical problems.
TeCP	Tests are conducted to investigate medical problems.
PIP	A medical problem indicates another medical problem.

Models	Medical Problems	Treatment	Tests	Total
SVM	0.861	0.829	0.785	0.832
CRF	0.878	0.845	0.792	0.847
HMM [20]	0.875	0.851	0.804	0.852
LSTM	0.892	0.863	0.816	0.861
BiLSTM-CRF	0.902	0.896	0.832	0.879
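
Assuming the scores in the table above are entity-level F1-measures, as the 0.88 reported in the abstract suggests, the sketch below shows one common way such a score can be computed from BIO tag sequences: extract entity spans and count exact matches. The helper names, the exact-match criterion, and the example tags are assumptions for illustration; the evaluation procedure used in the paper is not described here.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """F1 = 2PR / (P + R), counting a span as correct only on exact match."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds the problem but misses the test.
gold = ["O", "O", "B-problem", "I-problem", "O", "B-test"]
pred = ["O", "O", "B-problem", "I-problem", "O", "O"]
print(round(span_f1(gold, pred), 3))  # precision 1.0, recall 0.5
```

In this example one of two gold entities is recovered with no false positives, so precision is 1.0, recall is 0.5, and F1 is 2/3.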

Models	TrIP	TrWP	TrCP	TrAP	TrNAP	TeRP	TeCP	PIP	Total
SVM	0.23	0.05	0.496	0.806	0.17	0.872	0.45	0.87	0.737
ME	0.216	0.02	0.502	0.814	0.193	0.859	0.393	0.91	0.731
DNN+CRF [1]	0.225	0.03	0.534	0.86	0.225	0.916	0.451	0.96	0.752
BiLSTM-CRF	0.251	0.11	0.572	0.903	0.35	0.931	0.503	0.98	0.775
