Historical context: PCA dates back to Pearson (1901) and Hotelling (1933), with modern numerical methods refined through SVD (Golub & Reinsch, 1970). Power iteration is one of the earliest iterative eigenmethods, foundational to Krylov and Lanczos approaches. In large-scale ML (text, vision, recommender systems), iterative methods dominate because they scale with matrix-vector multiplies rather than full decompositions.
Mathematical characterization: For a symmetric PSD covariance matrix $\Sigma$, eigenpairs $(\lambda_i, u_i)$ satisfy $\Sigma u_i = \lambda_i u_i$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$. Power iteration repeatedly applies $v \leftarrow \Sigma v$ and normalizes; if the initial $v$ has nonzero overlap with $u_1$ and $\lambda_1 > \lambda_2$, then $v \to \pm u_1$, with the error shrinking by roughly a factor of $\lambda_2/\lambda_1$ per iteration. For centered data $X_c$, $\Sigma = \frac{1}{n-1} X_c^\top X_c$, so $u_1$ equals, up to sign, the first right singular vector of $X_c$, and $\lambda_1 = \sigma_1^2/(n-1)$ for the largest singular value $\sigma_1$.
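As a rough illustration of the characterization above, the following sketch runs power iteration on an explicitly formed covariance matrix and compares the result with the first right singular vector from `np.linalg.svd`. The function name `power_iteration`, the synthetic data, the iteration cap, and the tolerance are all assumptions made for this example, not part of the original text.

```python
import numpy as np

def power_iteration(sigma, num_iters=1000, tol=1e-12, seed=0):
    """Estimate the top eigenpair of a symmetric PSD matrix (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # A random start has nonzero overlap with u_1 with probability 1.
    v = rng.standard_normal(sigma.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = sigma @ v                      # v <- Sigma v
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:                  # v lies in the null space; give up
            break
        w /= norm_w                        # normalize
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    lam = v @ sigma @ v                    # Rayleigh quotient for the eigenvalue
    return lam, v

# Synthetic data with a clear dominant direction (assumed for the demo).
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
X_c = X - X.mean(axis=0)                   # center the columns
n = X_c.shape[0]
sigma = X_c.T @ X_c / (n - 1)              # sample covariance

lam1, u1 = power_iteration(sigma)
_, s, vt = np.linalg.svd(X_c, full_matrices=False)

# |cosine| close to 1 means u1 matches the first right singular vector up to sign.
alignment = abs(u1 @ vt[0])
# The eigenvalue should match sigma_1^2 / (n - 1).
lam_from_svd = s[0] ** 2 / (n - 1)
```

The comparison uses the absolute value of the inner product because both methods determine $u_1$ only up to sign.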
Prevalence in ML: Computing only the top direction (or a few top directions) is common in dimensionality reduction, whitening, and streaming PCA. Iterative schemes (power/Lanczos) are used when $n,d$ are large, when data arrive online, or when forming $\Sigma$ is memory-expensive. Understanding these connections lets you switch between SVD-based PCA and iterative eigen methods confidently.
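When forming $\Sigma$ is too memory-expensive, one possible approach is to apply it implicitly as $\Sigma v = \frac{1}{n-1} X_c^\top (X_c v)$, so each iteration costs two matrix-vector products with $X_c$ and never allocates a $d \times d$ matrix. The sketch below is a minimal version under that assumption; the function name `top_direction_matrix_free`, the fixed iteration count, and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def top_direction_matrix_free(X_c, num_iters=500, seed=0):
    """Power iteration for the top principal direction without forming Sigma.

    X_c is assumed to be already centered, with shape (n, d).
    """
    n, d = X_c.shape
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        # Sigma v computed as X_c^T (X_c v) / (n - 1): O(n d) time, O(n + d) extra memory.
        w = X_c.T @ (X_c @ v) / (n - 1)
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:
            break
        v = w / norm_w
    # Rayleigh quotient v^T Sigma v, again without forming Sigma.
    lam = np.sum((X_c @ v) ** 2) / (n - 1)
    return lam, v

# Synthetic data: one column is scaled up so that a dominant direction exists.
rng = np.random.default_rng(7)
X = rng.standard_normal((300, 50))
X[:, 0] *= 4.0
X_c = X - X.mean(axis=0)

lam1, u1 = top_direction_matrix_free(X_c)
```

A fixed iteration count keeps the sketch short; in practice a convergence check on successive iterates or on the Rayleigh quotient would likely replace it.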