Optimal subsample selection for massive logistic regression with distributed data
Zuo, Lulu (1); Zhang, Haixiang (1); Wang, HaiYing (2); Sun, Liuquan (3)
2021-02-27
Source Publication: COMPUTATIONAL STATISTICS
ISSN: 0943-4062
Pages: 28
Abstract: With the emergence of big data, it is increasingly common for data to be distributed, i.e., stored at many distributed sites (machines or nodes) for reasons such as data collection or business operations. We propose a distributed subsampling procedure in such a setting to efficiently approximate the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator given the full data. The optimal subsampling probabilities and optimal allocation sizes are obtained explicitly. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of our subsampling method.
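The two-step idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the paper's formulas, so everything specific here is an assumption made for illustration: the score |y - p| * ||x|| (a form used in the optimal subsampling literature for logistic regression), the allocation of draws to sites in proportion to each site's total score, and the hypothetical names `weighted_logistic_mle` and `two_step_distributed_subsample`. It should not be read as the authors' algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_logistic_mle(X, y, w, n_iter=50, tol=1e-8):
    """Newton's method for a weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta

def two_step_distributed_subsample(sites, r0, r, rng):
    """sites: list of (X, y) pairs, one per site. Returns the subsample estimate.

    ASSUMPTIONS (not taken from the abstract): the score and the
    allocation rule below are illustrative choices.
    """
    N = sum(len(y) for _, y in sites)

    # Step 1: pilot estimate from a uniform subsample, with the r0 draws
    # allocated to sites in proportion to site size.
    Xp, yp = [], []
    for X, y in sites:
        m = max(1, int(round(r0 * len(y) / N)))
        idx = rng.choice(len(y), size=m, replace=True)
        Xp.append(X[idx])
        yp.append(y[idx])
    Xp, yp = np.vstack(Xp), np.concatenate(yp)
    beta0 = weighted_logistic_mle(Xp, yp, np.ones(len(yp)))

    # Step 2: each site computes scores from the pilot estimate and only
    # needs to report their sum to decide the allocation sizes.
    scores = [np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1)
              for X, y in sites]
    total = sum(s.sum() for s in scores)

    Xs, ys, ws = [], [], []
    for (X, y), s in zip(sites, scores):
        r_k = max(1, int(round(r * s.sum() / total)))  # assumed allocation
        pi = s / s.sum()                               # within-site probabilities
        idx = rng.choice(len(y), size=r_k, replace=True, p=pi)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * pi[idx]))               # inverse-probability weights
    return weighted_logistic_mle(np.vstack(Xs), np.concatenate(ys),
                                 np.concatenate(ws))
```

In this sketch each site sends back only its pilot subsample, one scalar (its total score), and its second-step subsample, so the full data never leave the sites. The inverse-probability weights are what keep the weighted log-likelihood an unbiased estimate of the full-data one.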
Keywords: Allocation size; Big data; Distributed and massive data; Subsample estimator; Subsampling probabilities
DOI: 10.1007/s00180-021-01089-0
Indexed By: SCI
Language: English
Funding Project: National Science Foundation (NSF), USA grant [DMS-1812013]; National Natural Science Foundation of China [11771431]; National Natural Science Foundation of China [11690015]; National Natural Science Foundation of China [11926341]; Key Laboratory of RCSDS, CAS [2008DP173182]
WOS Research Area: Mathematics
WOS Subject: Statistics & Probability
WOS ID: WOS:000622671900002
Publisher: SPRINGER HEIDELBERG
Document Type: Journal article
Identifier: http://ir.amss.ac.cn/handle/2S8OKBNM/58237
Collection: Institute of Applied Mathematics
Corresponding Author: Zhang, Haixiang
Affiliation:
1. Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
2. Univ Connecticut, Dept Stat, Mansfield, CT 06269 USA
3. Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, et al. Optimal subsample selection for massive logistic regression with distributed data[J]. COMPUTATIONAL STATISTICS, 2021: 28.
APA: Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, & Sun, Liuquan. (2021). Optimal subsample selection for massive logistic regression with distributed data. COMPUTATIONAL STATISTICS, 28.
MLA: Zuo, Lulu, et al. "Optimal subsample selection for massive logistic regression with distributed data". COMPUTATIONAL STATISTICS (2021): 28.