Random matrices and analysis of correlations in data
Tuesday 23 February 2016, 14:00
Dario Villamaina (LPT-ENS, Paris)
Determining correlations among variables from a finite set of observations is a common problem in statistics. In practice, one works with estimators of the population correlation matrix, which are affected by finite-sampling effects (see figure 1). One of the most common techniques for this kind of problem is principal component analysis, where one usually retains only the components corresponding to the largest eigenvalues of the sample correlation matrix, considered the most informative. However, what is usually discarded in this procedure (namely, the eigenvectors associated with the smaller eigenvalues) is not just sampling noise. Indeed, using a combination of random matrix and information-theoretic tools, I will show that all the eigenvectors of the sample correlation matrix are informative about the principal components (namely, the eigenvectors associated with the large eigenvalues) of the population correlation matrix. This extra information can be used to efficiently improve standard data cleaning procedures.
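As a rough illustration of the finite-sampling effects and of a standard cleaning procedure mentioned in the abstract, here is a minimal Python sketch. It is not the speaker's method: it assumes uncorrelated Gaussian data (population correlation matrix equal to the identity), uses the Marchenko-Pastur edges as a noise threshold, and applies ordinary eigenvalue clipping. The function names and parameter values are hypothetical, chosen only for this example.

```python
import numpy as np


def sample_correlation_spectrum(n_vars=50, n_obs=200, seed=0):
    """Eigen-decomposition of a sample correlation matrix.

    Assumption: the data are i.i.d. standard Gaussian, so the population
    correlation matrix is the identity and every deviation of the sample
    eigenvalues from 1 is a finite-sampling effect.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_obs, n_vars))
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return corr, eigvals, eigvecs


def marchenko_pastur_edges(n_vars, n_obs):
    """Edges of the Marchenko-Pastur bulk for unit-variance pure noise."""
    q = n_vars / n_obs
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


def clip_eigenvalues(corr, n_obs):
    """Standard eigenvalue clipping (illustrative choice of cleaning scheme).

    Eigenvalues at or below the upper Marchenko-Pastur edge are replaced by
    their average, which preserves the trace; eigenvectors are kept as they are.
    """
    n_vars = corr.shape[0]
    _, upper = marchenko_pastur_edges(n_vars, n_obs)
    eigvals, eigvecs = np.linalg.eigh(corr)
    noise = eigvals <= upper
    cleaned = eigvals.copy()
    if noise.any():
        cleaned[noise] = eigvals[noise].mean()
    # Rebuild V diag(cleaned) V^T
    return (eigvecs * cleaned) @ eigvecs.T


corr, eigvals, _ = sample_correlation_spectrum()
print("sample eigenvalue range:", eigvals.min(), eigvals.max())
print("Marchenko-Pastur edges:", marchenko_pastur_edges(50, 200))
```

Even though all population eigenvalues equal 1 in this setup, the sample eigenvalues spread over a wide interval. Clipping treats everything inside the noise band as uninformative and leaves the eigenvectors untouched, which is exactly the kind of assumption the talk calls into question.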
Postscript:
Contact: N. Destainville