ReBoot: Distributed statistical learning via refitting bootstrap samples
Dr. Ziwei Zhu
Assistant Professor of Statistics
Department of Statistics
University of Michigan, Ann Arbor
ABSTRACT
In this paper, we study a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. Given the local models fitted on multiple independent subsamples, ReBoot refits a new model on the union of the bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate whenever the subsample size is not too small. In particular, we show that the systematic bias of ReBoot, i.e., the error that is independent of the number of subsamples (the number of sites), is O(n^{-2}) in GLM, where n is the subsample size. This rate is sharper than that of model parameter averaging and its variants, implying that ReBoot tolerates more data splits while still maintaining the full-sample rate. Our simulation study demonstrates the statistical advantage of ReBoot over competing methods, including averaging and CSL (Communication-efficient Surrogate Likelihood) with one round of gradient communication. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification; it exhibits substantial superiority over FedAvg within early rounds of communication.
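To make the one-shot procedure concrete, here is a minimal Python sketch of the ReBoot idea for logistic regression (a GLM). It is an illustration under stated assumptions, not the paper's implementation: the covariates for the bootstrap draws are resampled i.i.d. from N(0, I), and the names fit_local, reboot, and n_boot are hypothetical.

```python
# Minimal sketch of one-shot ReBoot for logistic regression (a GLM).
# Hypothetical choices (not from the paper): the names fit_local / reboot /
# n_boot, and the N(0, I) covariate law used for the bootstrap draws.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fit_local(X, y):
    # Each site fits an unpenalized logistic regression on its own subsample.
    # (penalty=None requires scikit-learn >= 1.2; use penalty="none" earlier.)
    return LogisticRegression(penalty=None).fit(X, y)

def reboot(local_models, d, n_boot, rng):
    # Draw a bootstrap sample from each fitted local model, pool the samples,
    # and refit one global model -- a single round of parameter communication.
    X_parts, y_parts = [], []
    for model in local_models:
        Xb = rng.standard_normal((n_boot, d))  # assumed covariate distribution
        pb = model.predict_proba(Xb)[:, 1]     # fitted conditional P(y = 1 | x)
        yb = rng.binomial(1, pb)               # simulate responses from the fit
        X_parts.append(Xb)
        y_parts.append(yb)
    X_all = np.vstack(X_parts)
    y_all = np.concatenate(y_parts)
    return LogisticRegression(penalty=None).fit(X_all, y_all)

# Toy usage: k sites with n observations each, d-dimensional parameter.
k, n, d = 10, 500, 5
beta = np.ones(d) / np.sqrt(d)
X = rng.standard_normal((k * n, d))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
local_models = [fit_local(X[i*n:(i+1)*n], y[i*n:(i+1)*n]) for i in range(k)]
global_model = reboot(local_models, d, n_boot=n, rng=rng)
print(global_model.coef_)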