PCA originated with Karl Pearson (1901) as the problem of finding lines and planes of closest fit to a set of points. Harold Hotelling (1933) extended it by formulating principal components as orthogonal directions of maximal variance. The Karhunen–Loève transform formalized the same decomposition for stochastic processes. The Eckart–Young theorem (1936) proved that truncating the SVD to its $k$ largest singular values gives the best rank-$k$ approximation in the Frobenius norm. Modern treatments (Jolliffe, Bishop, Hastie–Tibshirani–Friedman) present PCA both as an optimization problem (variance maximization) and as a geometric projection onto orthogonal directions of maximal spread.
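The Eckart–Young statement can be checked numerically. The NumPy snippet below is a minimal illustration, not a proof. The helper name `truncated_svd_approx`, the matrix size, the random seed, and the rank $k$ are all arbitrary choices. It truncates the SVD of a random matrix, confirms that the Frobenius error equals the root-sum-square of the discarded singular values, and shows that a random rank-$k$ matrix does no better.

```python
import numpy as np

def truncated_svd_approx(A, k):
    """Rank-k approximation of A built from its k largest singular triplets (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)      # arbitrary seed, illustrative only
A = rng.standard_normal((50, 8))
k = 3

A_k = truncated_svd_approx(A, k)
err = np.linalg.norm(A - A_k, "fro")

# Eckart-Young: the optimal Frobenius error is the root-sum-square
# of the singular values that were discarded.
s = np.linalg.svd(A, compute_uv=False)
predicted = np.sqrt(np.sum(s[k:] ** 2))
print(np.isclose(err, predicted))   # True

# Any other rank-k matrix (here a random one) cannot do better.
B = rng.standard_normal((50, k)) @ rng.standard_normal((k, 8))
print(np.linalg.norm(A - B, "fro") >= err)   # True
```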
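Computationally, PCA is implemented stably via the SVD of the centered data matrix $X_c$ (one row per sample): $X_c = U \Sigma V^\top$, where $\Sigma$ is the diagonal matrix of singular values $\sigma_i$ (not to be confused with the covariance $\Sigma_x$). The sample covariance $\Sigma_x = \frac{1}{n-1} X_c^\top X_c$ is positive semidefinite. Its eigendecomposition follows directly from the SVD, because $X_c^\top X_c = V \Sigma^2 V^\top$. The eigenvectors are therefore the right singular vectors (the columns of $V$), and the eigenvalues are $\lambda_i = \sigma_i^2/(n-1)$. Working with the SVD avoids forming $X_c^\top X_c$ explicitly, which would square the condition number and lose precision in the smaller components. For large datasets, randomized SVD approximates the top components from a low-dimensional random sketch of the data. Incremental PCA instead updates the components batch by batch, so the full dataset never has to be held in memory.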
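Both the SVD route and the relation $\lambda_i = \sigma_i^2/(n-1)$ can be sketched in NumPy. The names `pca_svd` and `randomized_svd` are hypothetical helpers written for this illustration and are not a library API. `randomized_svd` is one basic variant of the randomized approach: a random range finder with oversampling and a few power iterations, followed by a small exact SVD. The synthetic low-rank-plus-noise data and all parameter values are arbitrary assumptions.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of X (n samples x d features) via the SVD of the centered data (hypothetical helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    n = X.shape[0]
    components = Vt[:k]                             # principal directions, one per row
    explained_variance = s[:k] ** 2 / (n - 1)       # lambda_i = sigma_i^2 / (n - 1)
    scores = Xc @ components.T                      # projections, equal to U[:, :k] * s[:k]
    return components, explained_variance, scores, mean

def randomized_svd(A, k, oversample=10, n_iter=2, seed=0):
    """Basic randomized SVD sketch (hypothetical helper): approximate range, then small exact SVD."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                  # orthonormal basis for the sampled range
    for _ in range(n_iter):                         # power iterations, re-orthonormalized each step
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                                     # small matrix with few rows
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(42)                     # arbitrary seed
n, d, k = 200, 30, 3
# Synthetic data: rank-5 signal plus small Gaussian noise (illustrative assumption).
X = rng.standard_normal((n, 5)) @ rng.standard_normal((5, d)) + 0.01 * rng.standard_normal((n, d))

components, ev, scores, mean = pca_svd(X, k)

# Cross-check against the eigendecomposition of the sample covariance.
cov = np.cov(X, rowvar=False)                       # default normalization is 1/(n - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]             # eigvalsh returns ascending order
print(np.allclose(ev, eigvals[:k]))                 # True

# The randomized sketch approximates the same top singular values.
_, s_rand, _ = randomized_svd(X - mean, k)
print(np.allclose(s_rand ** 2 / (n - 1), ev))
```

The oversampling and power iterations spend a little extra computation to gain accuracy when the singular values decay slowly. The values used here are illustrative and have not been tuned.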