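Constructing Large Scale Cohort for Clinical Study on Heart Failure with Electronic Health Record in Regional Healthcare Platform: Challenges and Strategies in Data Reuse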

doi:10.24920/003579

Highlight

Reusing electronic medical records from a regional healthcare platform is challenging because terminology is inconsistent and data quality and formats are complex. This paper introduces a methodology and process for constructing a heart failure cohort for large-scale clinical research.

Abstract

Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management. Reusing these data for clinical research is a common requirement. However, such reuse faces challenges, including inconsistent terminology in electronic health records (EHR) and the complexity of data quality and data formats on regional healthcare platforms. In this paper, we propose a methodology and process for constructing large-scale cohorts, which form the basis for studying causality and comparative effectiveness in epidemiology. First, we constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built special disease case repositories (i.e., a heart failure repository) that use the graph to search for related patients and to normalize the data. Based on the requirements of a clinical study that aimed to explore the effect of statin use on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort of 29647 heart failure cases from the heart failure repository. After propensity score matching, a study group (n=6346) and a control group (n=6346) with comparable clinical characteristics were obtained. Logistic regression analysis showed that statin use was negatively correlated with 180-day readmission in heart failure patients. This paper presents the workflow and an application example of big data mining based on regional EHR data.

Key words: electronic health records, clinical terminology knowledge graph, clinical special disease case repository, evaluation of data quality, large scale cohort study

Funding: Supported by the National Major Scientific and Technological Special Project for "Significant New Drugs Development" (No. 2018ZX09201008) and the Special Fund Project for Information Development from Shanghai Municipal Commission of Economy and Information (No. 201701013).

Liu Daowen, Lei Liqi, Ruan Tong, He Ping. Constructing Large Scale Cohort for Clinical Study on Heart Failure with Electronic Health Record in Regional Healthcare Platform: Challenges and Strategies in Data Reuse[J]. Chinese Medical Sciences Journal, 2019, 34(2): 90-102.

Clustering algorithm	Precision (%)	Recall (%)	F1-score (%)
K-means	37.88	21.31	27.27
Meanshift	34.93	18.85	24.49
GMM	42.17	23.98	30.58
AHC	35.16	20.30	25.74
DBSCAN	27.85	91.36	42.68

Method	Precision (%)	Recall (%)	F1-score (%)
KG Fusion	79.23	73.60	76.32
Diag. Alignment	81.67	74.62	77.98
KB Alignment	87.20	72.59	79.22
Ours	86.84	83.76	85.27
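
The F1-scores in the two tables above can be cross-checked against the reported precision and recall. The following is a minimal sketch, assuming the three columns are percentages and that the F1-score is the usual harmonic mean of precision and recall; the function name `f1_score` and the `reported` dictionary are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; output is in the same unit as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) copied from the two tables above.
reported = {
    "K-means": (37.88, 21.31, 27.27),
    "Meanshift": (34.93, 18.85, 24.49),
    "GMM": (42.17, 23.98, 30.58),
    "AHC": (35.16, 20.30, 25.74),
    "DBSCAN": (27.85, 91.36, 42.68),
    "KG Fusion": (79.23, 73.60, 76.32),
    "Diag. Alignment": (81.67, 74.62, 77.98),
    "KB Alignment": (87.20, 72.59, 79.22),
    "Ours": (86.84, 83.76, 85.27),
}

for name, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    # Small gaps are expected because precision and recall are already rounded.
    print(f"{name}: computed {computed:.2f}, reported {f1:.2f}")
```

Under these assumptions the recomputed values agree with the reported ones to within rounding of the inputs.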

Category	Feature name in CRFs	Feature value
Population information	Age	The age of patient
	Gender	Male or female
	Readmission time	The value of readmission time
	…	…
Outpatient prescription	ACEI/ARB	Take the medicine or not
	β-Blocker	Take the medicine or not
	Diuretic	Take the medicine or not
	Huangqi (黄芪)	Take the medicine or not
	Dangshen (党参)	Take the medicine or not
	…	…
Laboratory test	Serum potassium	The results of serum potassium; normal range: 3.5-5.5 mmol/L
	Serum sodium	The results of serum sodium; normal range: 135-145 mmol/L
	Serum creatinine	The results of serum creatinine; normal range: 20-110 μmol/L
	…	…
First page of medical record	Heart function level	Heart function level I; heart function level II; heart function level III; or heart function level IV
	Diabetes	Suffer or not
	Hypertension	Suffer or not
	…	…

Data source table	Feature name in source table	Preprocessing rules	Name of target feature	Value of target feature
Patient information table
	Birth date; Hospitalization date	Hospitalization date minus birth date	Age	Age value
	Gender	Numerical mapping	Gender	1: Male; 2: Female
	Discharge date; Next admission date	Next admission date minus discharge date	Readmission time	The value of readmission time
	…	…	…	…
Outpatient prescription table; inpatient medical order table
	Item detail name	The "outpatient prescription table" records outpatient medication; the "inpatient medical order table" records inpatient medication.	ACEI/ARB	1: takes the medicine; 0: does not take it
			β-Blocker	1: takes the medicine; 0: does not take it
			Diuretic	1: takes the medicine; 0: does not take it
			Huangqi (黄芪)	1: takes the medicine; 0: does not take it
			Dangshen (党参)	1: takes the medicine; 0: does not take it
			…	…
Laboratory test results table
	Laboratory test name and results	Extract the corresponding results of the patient according to the target feature	Serum potassium	The value of lab test (float)
			Serum sodium	The value of lab test (float)
			Serum creatinine	The value of lab test (float)
			…	…
Diagnostic details and outpatient visit record
	Diagnostic instructions	Extract the corresponding diagnostic instructions for the patient based on the target feature	Heart function level	1: heart function level I; 2: heart function level II; 3: heart function level III; 4: heart function level IV
			Diabetes	1: has the disease; 0: does not have it
			Hypertension	1: has the disease; 0: does not have it
			…	…
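
The preprocessing rules in the table above can be sketched in code. This is only an illustration under stated assumptions: the field names (`birth_date`, `item_detail_names`, and so on), the raw gender values, the unit of age (completed years), and the two-entry ACEI/ARB drug list are hypothetical, since the source gives the rules but not the platform's actual schema or vocabulary.

```python
from datetime import date

# Hypothetical drug-name list; the actual ACEI/ARB vocabulary is not given in the source.
ACEI_ARB_NAMES = {"captopril", "valsartan"}

# Assumed raw gender values; the target codes (1: male, 2: female) come from the table.
GENDER_CODES = {"male": 1, "female": 2}

# Target codes for heart function levels I to IV, as listed in the table.
HEART_FUNCTION_CODES = {"I": 1, "II": 2, "III": 3, "IV": 4}


def age_at_admission(birth_date: date, hospitalization_date: date) -> int:
    """Hospitalization date minus birth date, in completed years (assumed unit)."""
    years = hospitalization_date.year - birth_date.year
    if (hospitalization_date.month, hospitalization_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def readmission_days(discharge_date: date, next_admission_date: date) -> int:
    """Next admission date minus discharge date, in days."""
    return (next_admission_date - discharge_date).days


def medication_flag(item_detail_names, drug_names) -> int:
    """Return 1 if any prescribed item matches the drug list, otherwise 0."""
    return int(any(name.strip().lower() in drug_names for name in item_detail_names))


def build_features(patient: dict) -> dict:
    """Apply the mapping rules to one patient record with hypothetical field names."""
    return {
        "age": age_at_admission(patient["birth_date"], patient["hospitalization_date"]),
        "gender": GENDER_CODES[patient["gender"]],
        "readmission_time": readmission_days(
            patient["discharge_date"], patient["next_admission_date"]
        ),
        "acei_arb": medication_flag(patient["item_detail_names"], ACEI_ARB_NAMES),
        "heart_function_level": HEART_FUNCTION_CODES[patient["heart_function_level"]],
    }
```

In practice the medication and diagnosis matching would presumably go through the terminology knowledge graph rather than exact string comparison; the set lookup here only stands in for that step.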

Evaluation metrics	Features	Evaluation rules
Data Completeness
	Birth date	Birth date is not empty
	Gender	Gender must equal “1” or “2”
	Heart rate	“心律%” (heart rate%) or “HR%” appear in the symptom and sign information
	Disease code	Disease code is not empty and does not equal “自定义” (custom) or “-”
	Disease name	Disease name is not empty and does not equal “null”
	Therapeutic effect	Therapeutic effect is not empty
	Death information	The cause of death is not empty and does not equal “0”, or the time of death is not empty and does not equal “1900”
Data Consistency
	Birth date	Birth date of patient in patient information table is consistent with that in the first page of medical record
	Disease code	Disease code satisfies the Chinese standard, namely GB/T 14396
	Disease name	Disease name satisfies the Chinese standard, namely GB/T 14396
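
The completeness rules in the table above lend themselves to a rule-per-feature check. The sketch below is one possible encoding, not the authors' implementation: the record field names are hypothetical, and the "%" in the heart rate patterns is assumed to be a wildcard, so the rule is treated as a substring match.

```python
def is_filled(value, placeholders=()):
    """True if the value is non-empty and is not one of the placeholder strings."""
    if value is None:
        return False
    text = str(value).strip()
    return text != "" and text not in placeholders


# One rule per feature, following the Data Completeness rows of the table.
# Field names such as "symptoms_and_signs" are hypothetical.
COMPLETENESS_RULES = {
    "birth_date": lambda r: is_filled(r.get("birth_date")),
    "gender": lambda r: r.get("gender") in ("1", "2"),
    # Assumption: "心律%" and "HR%" are wildcard patterns, checked here as substrings.
    "heart_rate": lambda r: any(
        key in (r.get("symptoms_and_signs") or "") for key in ("心律", "HR")
    ),
    "disease_code": lambda r: is_filled(r.get("disease_code"), ("自定义", "-")),
    "disease_name": lambda r: is_filled(r.get("disease_name"), ("null",)),
    "therapeutic_effect": lambda r: is_filled(r.get("therapeutic_effect")),
    # Either a real cause of death or a real time of death counts as present.
    "death_information": lambda r: (
        is_filled(r.get("cause_of_death"), ("0",))
        or is_filled(r.get("time_of_death"), ("1900",))
    ),
}


def completeness_rate(records, feature):
    """Fraction of records that satisfy the completeness rule for one feature."""
    if not records:
        return 0.0
    rule = COMPLETENESS_RULES[feature]
    return sum(1 for record in records if rule(record)) / len(records)
```

A call such as `completeness_rate(records, "disease_code")` would then give the share of records whose disease code is neither empty nor a placeholder. The consistency rules, such as conformance to GB/T 14396, need the standard's code list and are not covered here.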
