A New Test for High-Dimensional Two-Sample Mean Problems with Consideration of Correlation Structure
Professor Runze Li
Eberly Family Chair in Statistics | Associate Head, Department of Statistics
Professor of Public Health Sciences
Pennsylvania State University
This paper is concerned with high-dimensional two-sample mean problems, which have received considerable attention in the recent literature. To utilize the correlation information among variables to enhance the power of two-sample mean tests, we consider the setting in which the precision matrix of the high-dimensional data possesses a linear structure. We first propose a new precision matrix estimation procedure that accounts for this linear structure, and further develop regularization methods to select the true basis matrices and remove irrelevant ones. With the aid of the estimated precision matrix, we propose a new test statistic for two-sample mean problems by replacing the inverse of the sample covariance matrix in the Hotelling test with the estimated precision matrix. The proposed test is applicable in both low-dimensional and high-dimensional settings, even when the dimension of the data exceeds the sample size. The limiting distributions of the proposed test statistic under both the null and alternative hypotheses are derived. We further derive the asymptotic power function of the proposed test and compare its asymptotic power with that of some existing tests. We find that the estimation error of the precision matrix has no impact on the asymptotic power function. Moreover, the asymptotic relative efficiency of the proposed test relative to the classical Hotelling test tends to infinity as the ratio of the dimension of the data to the sample size tends to 1. We conduct a Monte Carlo simulation study to assess the finite-sample performance of the proposed precision matrix estimation procedure and the proposed high-dimensional two-sample mean test. Our numerical results indicate that the proposed regularization method effectively removes irrelevant basis matrices. The proposed test performs well compared with existing methods, especially when the elements of the vector have unequal variances.
We also illustrate the proposed methodology by an empirical analysis of a real-world data set.
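
As a rough illustration of the plug-in idea described in the abstract, the sketch below builds a precision matrix as a linear combination of basis matrices and substitutes it for the inverse sample covariance in a Hotelling-type quadratic form. It is only a minimal sketch under stated assumptions: the function names, the two basis matrices (identity and first off-diagonal band), the coefficients, and the simulated data are all hypothetical, and the abstract does not specify the paper's estimation procedure, regularization method, or the centering and scaling of its test statistic, so none of those are reproduced here.

```python
import numpy as np


def linear_precision(thetas, bases):
    """Return Omega = sum_k theta_k * B_k (hypothetical helper).

    The coefficients are taken as given here; estimating them is the
    part of the method that the abstract does not detail.
    """
    return sum(t * b for t, b in zip(thetas, bases))


def plugin_hotelling(x, y, precision):
    """Hotelling-type quadratic form with a supplied precision matrix.

    Assumption: this is the unnormalized form
    (n1 * n2 / (n1 + n2)) * d' Omega d, where d is the difference of
    sample means. The paper's actual statistic may differ in scaling.
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    return (n1 * n2) / (n1 + n2) * diff @ precision @ diff


rng = np.random.default_rng(0)
p, n1, n2 = 50, 20, 25  # dimension exceeds both sample sizes

# Hypothetical basis matrices: identity and first off-diagonal band.
bases = [np.eye(p), np.eye(p, k=1) + np.eye(p, k=-1)]
omega = linear_precision([1.0, 0.3], bases)

# Simulated samples with a mean shift of 0.5 in every coordinate.
x = rng.standard_normal((n1, p))
y = rng.standard_normal((n2, p)) + 0.5

stat = plugin_hotelling(x, y, omega)
print(stat)
```

Because the precision matrix is supplied directly rather than obtained by inverting a sample covariance matrix, the quadratic form remains computable when the dimension exceeds the sample size, which is the situation in which the classical Hotelling statistic is undefined.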