Chinese Medical Sciences Journal, 2022, Vol. 37, Issue 3: 210-217. doi: 10.24920/004086

Scientific Data Sharing and Reuse: Original Article

Prostate Cancer Risk Prediction and Online Calculation Based on Machine Learning Algorithm

Chun Wang1, Qinxue Chang1, Xiaomeng Wang1, Keyun Wang1, He Wang2, Zhuang Cui1,*, Changping Li1,*

  1. Department of Health Statistics, School of Public Health, Tianjin Medical University, Tianjin 300070, China
  2. Department of Medical Imaging, Peking University First Hospital, Beijing 100034, China
  • Received: 2022-03-21 Accepted: 2022-08-24 Published: 2022-09-30 Online: 2022-09-22
  • Contact: Zhuang Cui, Changping Li

Objective To build a prostate cancer (PCa) risk prediction model based on common clinical indicators to provide a theoretical basis for the diagnosis and treatment of PCa and to evaluate the value of artificial intelligence (AI) technology under healthcare data platforms.
Methods After preprocessing the data from the Population Health Data Archive, smoothly clipped absolute deviation (SCAD) was used to select features. Random forest (RF), support vector machine (SVM), back-propagation neural network (BP), and convolutional neural network (CNN) models were used to predict the risk of PCa; BP and CNN were applied to data augmented by SMOTE. Model performance was compared using the area under the curve (AUC) of the receiver operating characteristic curve. After the optimal model was selected, we used Shiny to develop an online calculator for PCa risk prediction based on the predictive indicators.
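The SMOTE step mentioned in the Methods can be illustrated with a minimal sketch. SMOTE (synthetic minority oversampling technique) creates new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The abstract does not describe the authors' implementation or settings, so the function name `smote`, the parameters `k` and `seed`, and the toy two-feature data below are all assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation.

    Hypothetical helper for illustration only; not the study's code.
    `minority` is a list of equal-length numeric tuples (at least two).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority-class samples with two made-up features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real minority samples, it always stays inside the range spanned by the observed minority data.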
Results Inorganic phosphorus, triglycerides, and calcium were closely related to PCa, in addition to the volume of fragmented tissue and free prostate-specific antigen (PSA). Among the four models, RF had the best performance in predicting PCa (accuracy: 96.80%; AUC: 0.975, 95% CI: 0.964-0.986), followed by BP (accuracy: 85.36%; AUC: 0.892, 95% CI: 0.849-0.934) and SVM (accuracy: 82.67%; AUC: 0.824, 95% CI: 0.805-0.844). CNN performed worst (accuracy: 72.37%; AUC: 0.724, 95% CI: 0.670-0.779). An online platform for PCa risk prediction was developed based on the RF model and the predictive indicators.
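To make the reported metric concrete, the sketch below computes an AUC as the proportion of positive-negative pairs in which the positive case receives the higher score, and attaches an approximate 95% CI using the Hanley-McNeil standard error. The abstract does not say how its confidence intervals were obtained, so the choice of the Hanley-McNeil formula is an assumption, as are the function names and the made-up scores.

```python
import math


def auc_rank(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil standard error), clipped to [0, 1]."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Made-up predicted risks for four cancer and four non-cancer cases.
pos = [0.9, 0.8, 0.7, 0.55]
neg = [0.6, 0.4, 0.3, 0.2]
auc = auc_rank(pos, neg)  # 15 of 16 pairs are ranked correctly
low, high = hanley_mcneil_ci(auc, len(pos), len(neg))
```

With only eight toy cases the interval is very wide; narrow intervals such as those reported above require a much larger sample.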
Conclusions This study demonstrated the application value of traditional machine learning and deep learning models in disease risk prediction on a healthcare data platform and proposed new ideas for PCa risk prediction in patients who were suspected of having PCa and had undergone core needle biopsy. In addition, the online calculator may enhance the practicability of AI prediction technology and facilitate medical diagnosis.

Key words: prostate cancer, random forest, support vector machine, back-propagation neural network, convolutional neural network

Copyright © 2021 Chinese Academy of Medical Sciences.

Supervised by National Health Commission of the People's Republic of China


