Chinese Medical Sciences Journal ›› 2022, Vol. 37 ›› Issue (3): 210-217. doi: 10.24920/004086

• Scientific Data Sharing and Reuse: Original Article •




Prostate Cancer Risk Prediction and Online Calculation Based on Machine Learning Algorithm

Chun Wang1, Qinxue Chang1, Xiaomeng Wang1, Keyun Wang1, He Wang2, Zhuang Cui1,*, Changping Li1,*

  1. Department of Health Statistics, School of Public Health, Tianjin Medical University, Tianjin 300070, China
  2. Department of Medical Imaging, Peking University First Hospital, Beijing 100034, China
  • Received: 2022-03-21 Accepted: 2022-08-24 Published: 2022-09-30 Online: 2022-09-22
  • Contact: Zhuang Cui, Changping Li




Objective To build a prostate cancer (PCa) risk prediction model based on common clinical indicators, to provide a theoretical basis for the diagnosis and treatment of PCa, and to evaluate the value of artificial intelligence (AI) technology on healthcare data platforms.
Methods After the data from the Population Health Data Archive were preprocessed, smoothly clipped absolute deviation (SCAD) was used to select features. Random forest (RF), support vector machine (SVM), back-propagation neural network (BP), and convolutional neural network (CNN) models were used to predict the risk of PCa; BP and CNN were fitted on data augmented with the synthetic minority oversampling technique (SMOTE). Model performance was compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. After the optimal model was selected, we used Shiny to develop an online calculator for PCa risk prediction based on the predictive indicators.
Results In addition to the volume of fragmented biopsy tissue and free prostate-specific antigen (fPSA), inorganic phosphorus, triglycerides, and calcium were closely related to PCa. Among the four models, RF performed best in predicting PCa (accuracy: 96.80%; AUC: 0.975, 95% CI: 0.964-0.986), followed by BP (accuracy: 85.36%; AUC: 0.892, 95% CI: 0.849-0.934) and SVM (accuracy: 82.67%; AUC: 0.824, 95% CI: 0.805-0.844). CNN performed worst (accuracy: 72.37%; AUC: 0.724, 95% CI: 0.670-0.779). An online platform for PCa risk prediction was developed based on the RF model and the predictive indicators.
Conclusions This study demonstrated the application value of traditional machine learning and deep learning models in disease risk prediction on a healthcare data platform and proposed new ideas for PCa risk prediction in patients suspected of having PCa who had undergone core needle biopsy. In addition, the online calculator may enhance the practicability of AI prediction technology and facilitate medical diagnosis.

Key words: prostate cancer, random forest, support vector machine, back-propagation neural network, convolutional neural network
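
The abstract compares models by AUC with a 95% CI but does not say how the intervals were obtained. The sketch below is not the authors' code; it illustrates one common approach under stated assumptions: AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and a percentile bootstrap for the interval. The function names `auc` and `bootstrap_ci` and the toy labels and scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = PCa on biopsy, 0 = no PCa; scores are model outputs.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70, 0.60, 0.30]

point = auc(toy_labels, toy_scores)
low, high = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {point:.3f}, 95% CI: {low:.3f}-{high:.3f}")
```

Running the same two functions on each model's held-out predictions would give figures comparable in form to those reported in the Results, although the values depend on the data and on the interval method actually used.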

Copyright © 2018 Chinese Academy of Medical Sciences. All rights reserved.