Principal component analysis

From stats++ wiki

Principal component analysis (PCA) uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables, called principal components.

Theory

{JMM: For the time being, see the Wikipedia page on this topic, which explains it better.} In essence, the goal of PCA is to find the eigenvalues and eigenvectors of the covariance matrix of the data $\mathbf{X}$, which is assumed here to have zero mean: \begin{equation} \tag{1} \mathbf{\Sigma} \left( \mathbf{X} \right) = \operatorname{E} \left[ \mathbf{X} \mathbf{X}^\mathrm{T} \right] \end{equation}
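
As a minimal sketch of this direct route, the covariance matrix can be estimated from samples and then diagonalized. The function name `pca_via_covariance`, the layout (rows are variables, columns are observations), and the $1/n$ normalization are assumptions made for illustration; they are not fixed by the text above.

```python
import numpy as np

def pca_via_covariance(X):
    """Hypothetical helper: PCA by eigendecomposition of the covariance matrix.

    X has shape (n_variables, n_observations); each column is one observation.
    """
    # Center each variable so that E[X X^T] is the covariance matrix.
    Xc = X - X.mean(axis=1, keepdims=True)
    n = Xc.shape[1]
    # Sample estimate of E[X X^T] (assumed 1/n normalization).
    cov = Xc @ Xc.T / n
    # eigh is meant for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the first principal component has the largest variance.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))
X[1] += 2.0 * X[0]  # make two of the variables correlated

eigvals, eigvecs = pca_via_covariance(X)
print("variances along principal components:", eigvals)
print("principal components (columns):")
print(eigvecs)
```

The columns of `eigvecs` are the principal directions, and `eigvals` holds the variance of the data along each of them.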

Consider the singular value decomposition (SVD) of $\mathbf{X}$: \begin{equation} \tag{2} \mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\mathrm{T} \end{equation} where the diagonal elements of $\mathbf{\Sigma}$ (not to be confused with $\mathbf{\Sigma} \left( \mathbf{X} \right)$) contain the singular values of $\mathbf{X}$. Up to a normalization factor, the covariance matrix can be constructed as follows, using $\mathbf{V}^\mathrm{T} \mathbf{V} = \mathbf{I}$ in the last step: \begin{equation} \tag{3} \mathbf{X} \mathbf{X}^\mathrm{T} = \left( \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\mathrm{T} \right) \left( \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\mathrm{T} \right)^\mathrm{T} = \left( \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\mathrm{T} \right) \left( \mathbf{V} \mathbf{\Sigma} \mathbf{U}^\mathrm{T} \right) = \mathbf{U} \mathbf{\Sigma}^2 \mathbf{U}^\mathrm{T} \end{equation} (Note also the alternative construction $\mathbf{X}^\mathrm{T} \mathbf{X} = \mathbf{V} \mathbf{\Sigma}^2 \mathbf{V}^\mathrm{T}$.)

Since the matrix $\mathbf{X} \mathbf{X}^\mathrm{T}$ is symmetric, and hence diagonalizable, Eq. (3) corresponds to its eigendecomposition. Hence, the singular values of $\mathbf{X}$ are the square roots of the eigenvalues of $\mathbf{X} \mathbf{X}^\mathrm{T}$, with the corresponding eigenvectors in the columns of $\mathbf{U}$ (or of $\mathbf{V}$, for $\mathbf{X}^\mathrm{T} \mathbf{X}$).
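
The correspondence between the SVD of $\mathbf{X}$ and the eigendecomposition of $\mathbf{X} \mathbf{X}^\mathrm{T}$ can be checked numerically. The following is only a sketch on random test data; the array shapes and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 50))          # 4 variables, 50 observations
X -= X.mean(axis=1, keepdims=True)    # center each variable

# Thin SVD: X = U @ diag(s) @ Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of the symmetric matrix X X^T (ascending order),
# reversed so that it matches the ordering of the singular values.
eigvals, eigvecs = np.linalg.eigh(X @ X.T)
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

# Squared singular values should match the eigenvalues.
values_match = np.allclose(s**2, eigvals)

# Eigenvectors are only defined up to sign, so compare |u_i . w_i| with 1.
overlap = np.abs(np.sum(U * eigvecs, axis=0))
vectors_match = np.allclose(overlap, 1.0)

print("squared singular values match eigenvalues:", values_match)
print("columns of U match eigenvectors up to sign:", vectors_match)
```

The comparison up to sign is needed because both decompositions are free to flip the sign of any individual vector.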

Note: Numerically, it is better to perform the SVD on $\mathbf{X}$ than to form $\mathbf{X} \mathbf{X}^\mathrm{T}$ explicitly. Forming $\mathbf{X} \mathbf{X}^\mathrm{T}$ squares the condition number of the problem, so small singular values can be lost to rounding, while the SVD of $\mathbf{X}$ remains stable.
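
A small illustration of this loss of precision, assuming double-precision arithmetic and a hand-picked, nearly rank-deficient matrix (a Läuchli-type example that is not part of the original text): the exact singular values are $\sqrt{2 + \epsilon^2}$ and $\epsilon$, but $\epsilon^2$ falls below machine precision relative to 1 and vanishes when $\mathbf{X} \mathbf{X}^\mathrm{T}$ is formed.

```python
import numpy as np

eps = 1e-8
# Exact singular values of X are sqrt(2 + eps**2) and eps.
X = np.array([[1.0, eps, 0.0],
              [1.0, 0.0, eps]])

# Route 1: singular values directly from the SVD of X.
s_svd = np.linalg.svd(X, compute_uv=False)

# Route 2: square roots of the eigenvalues of X X^T.
# The diagonal entries 1 + eps**2 round to exactly 1 in double precision.
gram = X @ X.T
eigvals = np.linalg.eigvalsh(gram)[::-1]
# Clip tiny negative eigenvalues caused by rounding before taking the root.
s_gram = np.sqrt(np.clip(eigvals, 0.0, None))

print("exact smallest singular value:", eps)
print("from SVD of X:                ", s_svd[1])
print("from eigenvalues of X X^T:    ", s_gram[1])
```

The SVD route is expected to recover the small singular value closely, whereas the route through $\mathbf{X} \mathbf{X}^\mathrm{T}$ has already discarded the information needed to do so.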