KMS of Academy of Mathematics and Systems Science, CAS
Optimal subsample selection for massive logistic regression with distributed data | |
Zuo, Lulu (1); Zhang, Haixiang (1); Wang, HaiYing (2); Sun, Liuquan (3) | |
2021-02-27 | |
Source Publication | COMPUTATIONAL STATISTICS
ISSN | 0943-4062 |
Pages | 28 |
Abstract | With the emergence of big data, it is increasingly common for data to be distributed, i.e., stored at many distributed sites (machines or nodes) as a result of how the data are collected, how business operations are organized, and so on. We propose a distributed subsampling procedure for this setting that efficiently approximates the maximum likelihood estimator for logistic regression. We establish the consistency and asymptotic normality of the subsample estimator conditional on the full data. The optimal subsampling probabilities and optimal allocation sizes are obtained explicitly. We develop a two-step algorithm to approximate the optimal subsampling procedure. Numerical simulations and an application to airline data are presented to evaluate the performance of the proposed subsampling method. |
Keyword | Allocation size Big data Distributed and massive data Subsample estimator Subsampling probabilities |
DOI | 10.1007/s00180-021-01089-0 |
Indexed By | SCI |
Language | English |
Funding Project | National Science Foundation (NSF), USA grant[DMS-1812013] ; National Natural Science Foundation of China[11771431] ; National Natural Science Foundation of China[11690015] ; National Natural Science Foundation of China[11926341] ; Key Laboratory of RCSDS, CAS[2008DP173182] |
WOS Research Area | Mathematics |
WOS Subject | Statistics & Probability |
WOS ID | WOS:000622671900002 |
Publisher | SPRINGER HEIDELBERG |
Document Type | Journal article |
Identifier | http://ir.amss.ac.cn/handle/2S8OKBNM/58237 |
Collection | Institute of Applied Mathematics |
Corresponding Author | Zhang, Haixiang |
Affiliation | 1.Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China 2.Univ Connecticut, Dept Stat, Mansfield, CT 06269 USA 3.Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China |
Recommended Citation GB/T 7714 | Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, et al. Optimal subsample selection for massive logistic regression with distributed data[J]. COMPUTATIONAL STATISTICS, 2021: 28. |
APA | Zuo, Lulu, Zhang, Haixiang, Wang, HaiYing, & Sun, Liuquan. (2021). Optimal subsample selection for massive logistic regression with distributed data. COMPUTATIONAL STATISTICS, 28. |
MLA | Zuo, Lulu, et al. "Optimal subsample selection for massive logistic regression with distributed data." COMPUTATIONAL STATISTICS (2021): 28. |
Files in This Item: | There are no files associated with this item. |
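The abstract describes a two-step procedure (a pilot fit, then informative subsampling from each site with site-specific allocation sizes, then a weighted MLE) but does not give the formulas. The following is only a generic sketch of that kind of procedure, not the authors' algorithm. The score h_i = |y_i - p_i(beta0)| * ||x_i||, the rule that allocates each site a share of the total subsample size in proportion to its total score, and all function names (`two_step_subsample`, `weighted_logit_mle`, etc.) are assumptions made for illustration and may differ from the paper's optimal probabilities and allocation sizes.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def linpred(beta, x):
    # Linear predictor x^T beta.
    return sum(b * xi for b, xi in zip(beta, x))


def weighted_logit_mle(rows, weights, iters=50, tol=1e-8):
    """Newton-Raphson maximizer of the weighted logistic log-likelihood."""
    d = len(rows[0][0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for (x, y), w in zip(rows, weights):
            p = sigmoid(linpred(beta, x))
            for j in range(d):
                grad[j] += w * (y - p) * x[j]
                for k in range(d):
                    hess[j][k] += w * p * (1 - p) * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def two_step_subsample(sites, r0, r, rng=random):
    """Hypothetical sketch. sites: non-empty lists of (x, y), with x[0] == 1."""
    # Step 1: uniform pilot at each site, weighted by n_k / m_k so that it
    # mimics a uniform sample from the pooled data.
    pilot, pw = [], []
    for site in sites:
        m = min(r0, len(site))
        pilot += rng.sample(site, m)
        pw += [len(site) / m] * m
    beta0 = weighted_logit_mle(pilot, pw)

    # Step 2 (assumed criterion): score h_i = |y_i - p_i(beta0)| * ||x_i||.
    scores = [[abs(y - sigmoid(linpred(beta0, x))) * math.sqrt(sum(v * v for v in x))
               for x, y in site] for site in sites]
    total = sum(map(sum, scores))
    sub, sw = [], []
    for site, h in zip(sites, scores):
        h_k = sum(h)
        r_k = round(r * h_k / total)  # assumed allocation: site's share of total score
        for i in rng.choices(range(len(site)), weights=h, k=r_k):
            sub.append(site[i])
            sw.append(h_k / (r_k * h[i]))  # 1 / expected number of draws of i
    return beta0, weighted_logit_mle(sub, sw)


if __name__ == "__main__":
    rng = random.Random(0)
    true_beta = [-0.5, 1.0]
    sites = []
    for n_k in (3000, 5000, 2000):  # three simulated sites of unequal size
        xs = [[1.0, rng.gauss(0, 1)] for _ in range(n_k)]
        sites.append([(x, int(rng.random() < sigmoid(linpred(true_beta, x)))) for x in xs])
    print(two_step_subsample(sites, r0=200, r=1000, rng=rng))
```

In this sketch each site only needs the pilot estimate `beta0` and its own data to compute scores, and only the site totals `h_k` are needed to set the allocation sizes, which is what makes the scheme suitable for distributed storage. The second-step estimate here uses the subsample alone; pooling it with the pilot sample is a possible refinement that is not shown.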
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.