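  1. School of Information Science and Engineering, East China University of Science and Technology
  2. Shanghai Hospital Development Center
  • Received: 2019-03-05 Accepted: 2019-04-23 Published: 2019-05-24 Online: 2019-05-24
  • Corresponding author: Ruan Tong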

Constructing Large Scale Cohort for Clinical Study on Heart Failure with Electronic Health Record in Regional Healthcare Platform: Challenges and Strategies in Data Reuse

Liu Daowen1, Lei Liqi1, Ruan Tong1,*, He Ping2

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  2. Shanghai Hospital Development Center, Shanghai 200041, China
  • Received: 2019-03-05 Accepted: 2019-04-23 Published: 2019-05-24 Online: 2019-05-24
  • Contact: Ruan Tong


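Regional health platforms aggregate electronic health record data from multiple hospitals and are already used in healthcare management. Reusing these data in clinical research is a common need in current clinical science, but it must contend with inconsistent medical terminology in electronic health records and with the varied data quality and data formats on regional health platforms. We propose a process and method for semi-automatically constructing large-scale cohorts from electronic health records on a regional health platform, as a basis for clinical epidemiological research on treatment effectiveness. We first constructed a Chinese medical terminology graph to address the diversity of terminology on the regional healthcare platform. Second, we established a method for building disease-specific case repositories by normalizing medical terms through the synonym and hypernym-hyponym relations in the Chinese terminology knowledge graph, and we describe the method and steps used to build a heart failure case repository. To meet the needs of a clinical study observing the effect of statins in patients with heart failure, we used this heart failure repository to automatically construct a large retrospective cohort of 29647 heart failure patients, and obtained by propensity score matching a case group (n=6346) and a control group (n=6346) with comparable clinical characteristics. With readmission within 180 days as the outcome measure, logistic regression analysis found that statin use in heart failure patients was significantly correlated with readmission within 180 days (P<0.05). This paper provides a workflow and an application example for big data mining of electronic health records.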
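The use of synonym and hypernym-hyponym relations to normalize terms and select cases can be illustrated with a minimal sketch. The toy graph below, its entries, and the function names (`canonical`, `is_a`, `select_cases`) are hypothetical and are not taken from the paper; the actual knowledge graph and matching rules may differ.

```python
# Hypothetical toy terminology graph; entries are illustrative only.
SYNONYM_OF = {
    "HF": "heart failure",
    "cardiac failure": "heart failure",
    "CHF": "congestive heart failure",
}
# Hypernym relation stored as child -> parent ("is-a").
HYPERNYM_OF = {
    "congestive heart failure": "heart failure",
    "acute left heart failure": "left heart failure",
    "left heart failure": "heart failure",
    "heart failure": "heart disease",
}


def canonical(term):
    """Map a raw term to its preferred name via the synonym relation."""
    return SYNONYM_OF.get(term, term)


def is_a(term, target):
    """Return True if term is target or a descendant of target."""
    current = canonical(term)
    target = canonical(target)
    seen = set()  # guards against accidental cycles in the graph
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        parent = HYPERNYM_OF.get(current)
        current = canonical(parent) if parent is not None else None
    return False


def select_cases(records, target):
    """Keep records whose diagnosis normalizes to target or a subtype of it."""
    return [r for r in records if is_a(r["diagnosis"], target)]


records = [
    {"patient": "p1", "diagnosis": "CHF"},
    {"patient": "p2", "diagnosis": "acute left heart failure"},
    {"patient": "p3", "diagnosis": "heart disease"},
    {"patient": "p4", "diagnosis": "pneumonia"},
]
hf_cases = select_cases(records, "heart failure")
```

In this sketch a record coded with a broader term such as "heart disease" is not selected, because only the target concept and its subtypes count as matches.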

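Keywords: electronic health records, medical terminology graph, clinical disease-specific repository, data quality evaluation, large-scale cohort study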


Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management. Reusing these data for clinical research is a common requirement. However, reuse faces challenges such as inconsistent terminology in electronic health records (EHR) and the varied data quality and data formats on regional healthcare platforms. In this paper, we propose a methodology and process for constructing large-scale cohorts, which form the basis for studying causality and comparative effectiveness in epidemiology. First, we constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built disease-specific case repositories (e.g., a heart failure repository) that use the graph to search for related patients and to normalize the data. Based on the requirements of a clinical study that aimed to explore the effect of statin use on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort of 29647 heart failure patients from the heart failure repository. After propensity score matching, a study group (n=6346) and a control group (n=6346) with comparable clinical characteristics were obtained. Logistic regression analysis showed that taking statins was negatively correlated with 180-day readmission in heart failure patients. This paper presents a workflow and an application example of big data mining based on regional EHR data.
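As a rough illustration of how propensity score matching can yield equally sized study and control groups, the sketch below performs greedy 1:1 nearest-neighbor matching with a caliper on precomputed scores. The abstract does not state which matching algorithm or caliper was used, so the function `match_pairs`, the caliper value, and the sample scores are all assumptions made for illustration.

```python
def match_pairs(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    caliper: maximum allowed score difference (an assumed value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet matched
    pairs = []
    # Process treated patients in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by id.
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores, e.g. from a logistic model of statin use.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control = {"c1": 0.31, "c2": 0.60, "c3": 0.10, "c4": 0.33}
pairs = match_pairs(treated, control)
```

Treated patients with no control inside the caliper are dropped, which is why a matched sample is usually smaller than the full cohort.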

Key words: electronic health records, clinical terminology knowledge graph, clinical special disease case repository, evaluation of data quality, large scale cohort study

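Funding: National Science and Technology Major Project for New Drug Development (No. 2018ZX09201008); Special Fund for Information Development of the Shanghai Municipal Commission of Economy and Informatization (No. 201701013).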

