Chinese Medical Sciences Journal, 2017, Vol. 32, Issue (4): 218-225. doi: 10.24920/J1001-9294.2017.054

• Original Article •

Goodness of Fit of Zero-Inflated Models in a Large-Scale Population Survey of Factors Influencing Sub-Health

Xu Tao1, Zhu Guangjin2, Han Shaomei1,*

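  1. Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences / School of Basic Medicine, Peking Union Medical College, Beijing 100005
  2. Department of Pathophysiology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences / School of Basic Medicine, Peking Union Medical College, Beijing 100005
  • Received: 2017-03-30 Published: 2017-12-30 Online: 2017-12-30
  • Corresponding author: Han Shaomei E-mail: hansm1@vip.sina.com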

Study of Zero-Inflated Regression Models in a Large-Scale Population Survey of Sub-Health Status and Its Influencing Factors

Xu Tao1, Zhu Guangjin2, Han Shaomei1,*

  1. Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
  2. Department of Pathophysiology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
  • Received: 2017-03-30 Published: 2017-12-30 Online: 2017-12-30
  • Contact: Han Shaomei E-mail: hansm1@vip.sina.com

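Abstract: Objective

Sub-health has attracted growing attention from medical professionals and the general public in recent years. Treating the number of sub-health symptoms as count data yields more complete and accurate analytical results. This study examines the goodness of fit of count data models to identify the best model for sub-health research.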

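Methods

The study sample came from a large-scale population survey on physiology and psychology conducted from 2007 to 2014 in four provinces and two autonomous regions of China. Four count data models were fitted with SAS: Poisson regression, negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression, and zero-inflated negative binomial (ZINB) regression. The number of sub-health symptoms was the main outcome variable. The α coefficient and the O test were used to check for over-dispersion, and the Vuong test was used to check for excess zeros. Predicted probability curves and the likelihood ratio test were used to evaluate goodness of fit.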

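Results

Of the 78,307 participants, 38.53% had no sub-health symptoms. The mean number of sub-health symptoms was 2.98±3.72. The O test statistic was 720.995 (P<0.001). The α coefficient was 0.618 (95% confidence interval: 0.600-0.636). The Vuong test statistic was Z=45.487. The ZINB model had the largest log likelihood (-167519), the smallest Akaike information criterion value (335112), and the smallest Bayesian information criterion value (335455). For most symptom counts, the predicted probabilities of the ZINB model were better than those of the other models. The logit part of the ZINB model showed that age, sex, occupation, smoking, alcohol drinking, ethnicity, and obesity were risk factors for the occurrence of sub-health symptoms. The negative binomial part showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status, and obesity affected the degree of sub-health.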

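Conclusions

The combined results of the goodness-of-fit tests and the predicted probability curves indicate that the ZINB model is the best model for exploring the factors that influence the number of sub-health symptoms.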

Key words: zero-inflated, negative binomial regression, sub-health, population survey

Abstract: Objective

Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data allows a more complete and accurate analysis of findings in the sub-healthy population. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies.

Methods

The study sample was drawn from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersion, and the Vuong test was used to evaluate excess zero counts. The goodness of fit of the regression models was assessed by predictive probability curves and likelihood ratio test statistics.
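To make the difference between the four count models concrete, the following is a minimal pure-Python sketch of their probability mass functions. It assumes the common NB2 parameterization (variance = mu + alpha * mu^2); the function names and the parameter values `mu`, `alpha` and `pi` are hypothetical illustrations, not estimates from this study, which fitted its models in SAS.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of observing count k with mean mu."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def nb_pmf(k, mu, alpha):
    """NB2 probability of count k: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha              # size parameter
    p = r / (r + mu)             # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def zero_inflate(base_prob, k, pi):
    """Mix a point mass at zero (weight pi) with a base count distribution."""
    mixed = (1.0 - pi) * base_prob
    return pi + mixed if k == 0 else mixed

def zip_pmf(k, mu, pi):
    return zero_inflate(poisson_pmf(k, mu), k, pi)

def zinb_pmf(k, mu, alpha, pi):
    return zero_inflate(nb_pmf(k, mu, alpha), k, pi)

# Hypothetical parameter values chosen only for illustration.
mu, alpha, pi = 4.0, 0.6, 0.25

zinb_probs = [zinb_pmf(k, mu, alpha, pi) for k in range(400)]
zinb_total = sum(zinb_probs)                                # should be close to 1
zinb_mean = sum(k * p for k, p in enumerate(zinb_probs))    # (1 - pi) * mu

for name, p0 in [("Poisson", poisson_pmf(0, mu)),
                 ("NB", nb_pmf(0, mu, alpha)),
                 ("ZIP", zip_pmf(0, mu, pi)),
                 ("ZINB", zinb_pmf(0, mu, alpha, pi))]:
    print(f"P(Y=0) under {name}: {p0:.4f}")
```

The zero-inflated variants add a separate probability `pi` of a structural zero, which is why such models are typically reported in two parts: a logit part for whether any count occurs and a count part for how large it is.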

Results

Of the 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, with a standard deviation of 3.72. The O statistic of the over-dispersion test was 720.995 (P<0.001); the estimated alpha comparing the ZINB model and the ZIP model was 0.618 (95% CI: 0.600-0.636); and the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excess zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion value (335 112) and the smallest Bayesian information criterion value (335 455), indicating the best goodness of fit. For most counts, the predictive probabilities of the ZINB model fitted the observed counts best. The logit part of the ZINB model showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial part of the ZINB model showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health.
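The reported summary figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard definitions AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. The parameter count `k_implied` is inferred from the rounded values in the abstract and is not stated by the authors; the variable names are hypothetical.

```python
import math

# Figures as reported in the abstract.
n = 78307                 # respondents
mean, sd = 2.98, 3.72     # number of sub-health symptoms
observed_zero = 0.3853    # share of respondents with no symptoms
loglik = -167519          # ZINB log likelihood (rounded)
aic_reported = 335112
bic_reported = 335455

# A Poisson variable has variance equal to its mean, so a ratio far above 1
# points to over-dispersion.
variance_to_mean = sd ** 2 / mean

# Share of zeros a plain Poisson model with this mean would predict.
poisson_zero = math.exp(-mean)

# Assumed definition: AIC = 2k - 2 lnL, solved for the parameter count k.
k_implied = (aic_reported + 2 * loglik) / 2

# Assumed definition: BIC = k ln(n) - 2 lnL, recomputed from the implied k.
bic_check = k_implied * math.log(n) - 2 * loglik

print(f"variance / mean      : {variance_to_mean:.2f}")
print(f"Poisson P(Y=0)       : {poisson_zero:.3f} vs observed {observed_zero:.3f}")
print(f"implied parameters k : {k_implied:.0f}")
print(f"BIC recomputed       : {bic_check:.0f} vs reported {bic_reported}")
```

Under these assumptions the variance is about 4.6 times the mean, a plain Poisson model would predict roughly 5% zeros against the observed 38.53%, and the reported AIC and BIC are mutually consistent with 37 estimated parameters.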

Conclusions

All goodness-of-fit tests and the predictive probability curves consistently indicated that the ZINB model was the optimum model for exploring the factors influencing sub-health symptoms.

Key words: zero-inflated, negative binomial regression, sub-health, population survey

Copyright © 2018 Chinese Academy of Medical Sciences. All rights reserved.

Supervised by National Health Commission of the People's Republic of China

9 Dongdan Santiao, Dongcheng district, Beijing, 100730 China

Tel: 86-10-65105897  Fax:86-10-65133074 

E-mail: cmsj@cams.cn  www.cmsj.cams.cn