Part 1: Core setup - Condition number predicts sensitivity
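We study the linear system $Ax = b$ with $A \in \mathbb{R}^{2 \times 2}$ and $x, b \in \mathbb{R}^2$, using two diagonal matrices: the well-conditioned $A_{\text{good}} = \mathrm{diag}(1.0, 0.9)$ and the ill-conditioned $A_{\text{bad}} = \mathrm{diag}(1.0, 10^{-6})$. The target question: when $b$ is perturbed to $b + \delta b$, how large is the relative change in the solution $x$ compared with the relative change $\|\delta b\| / \|b\|$ in the data?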
Part 2: Geometry and algebraic insight - Condition number predicts sensitivity
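Geometrically, $A$ maps the unit circle to an ellipse whose semi-axes have lengths equal to the singular values $\sigma_{\max}$ and $\sigma_{\min}$; the condition number $\kappa(A) = \sigma_{\max} / \sigma_{\min}$ is the ratio of the longest to the shortest semi-axis. Solving $Ax = b$ applies $A^{-1}$, which stretches the direction of the shortest semi-axis by $1 / \sigma_{\min}$, so a perturbation of $b$ along that direction is amplified the most. Algebraically, the SVD $A = U \Sigma V^\top$ exposes these directions, and in the 2-norm the relative change in $x$ is at most $\kappa(A)$ times the relative change in $b$.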
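The sensitivity bound for $Ax = b$ can be derived in two short steps. This is a sketch in the 2-norm, assuming $A$ is invertible, $b \neq 0$, and only $b$ is perturbed; $\delta x$ denotes the induced change in $x$, so that $A(x + \delta x) = b + \delta b$.

$$
A\,\delta x = \delta b \;\Rightarrow\; \|\delta x\| \le \|A^{-1}\|\,\|\delta b\| = \frac{\|\delta b\|}{\sigma_{\min}},
\qquad
\|b\| = \|Ax\| \le \sigma_{\max}\,\|x\| \;\Rightarrow\; \frac{1}{\|x\|} \le \frac{\sigma_{\max}}{\|b\|}
$$

$$
\frac{\|\delta x\|}{\|x\|} \;\le\; \frac{\sigma_{\max}}{\sigma_{\min}} \cdot \frac{\|\delta b\|}{\|b\|} \;=\; \kappa(A)\,\frac{\|\delta b\|}{\|b\|}
$$

Equality is attained when $b$ lies along the singular direction of $\sigma_{\max}$ and $\delta b$ along that of $\sigma_{\min}$, so $\kappa(A)$ is the worst-case amplification factor rather than a typical one.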
Part 3: Numerics and ML practice - Condition number predicts sensitivity
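Recipe: compute the singular values of $A$ (or call a library condition-number routine), form $\kappa = \sigma_{\max} / \sigma_{\min}$, and compare it with the precision of the data. As a rule of thumb, a relative error $\varepsilon$ in $b$ can grow to about $\kappa \varepsilon$ in $x$, so in double precision (about 16 significant digits) roughly $\log_{10} \kappa$ digits are at risk. Solve with a factorization-based routine rather than an explicit inverse, then run quick sanity checks: confirm the shapes and inspect the residual $\|Ax - b\|$ (a small residual confirms the solve, although for large $\kappa$ it does not guarantee a small error in $x$). In ML, the same check applies to a design matrix before fitting least squares. Keep the following in mind: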
- Shape discipline: check dimensions before manipulating formulas.
- Numerical note: prefer stable primitives (lstsq, QR/SVD, Cholesky for SPD) over explicit inverses.
- Interpretation: relate algebraic steps to geometry (subspaces, projections) and to ML behavior (generalization, stability).
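The checklist above can be sketched in NumPy as follows. The helper name `solve_with_checks` is hypothetical (it is not a NumPy function), and using `np.linalg.lstsq` as the stable primitive is one reasonable choice among several.

```python
import numpy as np


def solve_with_checks(A, b):
    """Solve A x = b with basic shape and conditioning diagnostics.

    Hypothetical helper for illustration; not part of NumPy.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shape discipline: A must be 2-D and b must have one entry per row of A.
    if A.ndim != 2 or b.ndim != 1 or b.shape[0] != A.shape[0]:
        raise ValueError(f"shape mismatch: A {A.shape}, b {b.shape}")

    # 2-norm condition number (ratio of extreme singular values).
    kappa = np.linalg.cond(A)

    # Stable primitive: SVD-based least squares instead of inv(A) @ b.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Sanity check: relative residual of the computed solution.
    rel_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
    return x, kappa, rel_residual


A_good = np.diag([1.0, 0.9])
b = np.array([1.0, 1.0])
x, kappa, rel_residual = solve_with_checks(A_good, b)
print(f"x = {x}, kappa = {kappa:.3f}, relative residual = {rel_residual:.1e}")
```

Returning the condition number alongside the solution lets the caller decide how much to trust `x`; the residual alone is not enough when `kappa` is large.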
For the well-conditioned matrix $A_{\text{good}} = \mathrm{diag}(1.0, 0.9)$, the condition number is $\kappa = 1.0 / 0.9 \approx 1.11$ (ratio of largest to smallest singular value). A small perturbation $\delta b$ of relative size $10^{-6}$ in $b$ produces a relative change in $x$ of at most about $1.11 \times 10^{-6}$, the same order of magnitude: output noise stays proportional to input noise. This is the ideal regime, in which numerical errors remain controlled and the solution is trustworthy.
For the ill-conditioned matrix $A_{\text{bad}} = \mathrm{diag}(1.0, 10^{-6})$, the condition number is $\kappa = 1.0 / 10^{-6} = 10^6$. The same $10^{-6}$ relative perturbation in $b$ can now cause a relative change in $x$ of order $10^0$ (about 100% error) in the worst case, which occurs when $\delta b$ points along the direction of the small singular value. This amplification happens because the small singular value $10^{-6}$ (equal to the small eigenvalue for this diagonal matrix) makes $A$ nearly singular: inverting it multiplies errors in that direction by $1 / \sigma_{\min} = 10^6$. In ML contexts, this can show up as training instability: gradient updates become erratic, parameter estimates grow very large, and small changes in the data produce very different models.
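A minimal NumPy sketch of this comparison follows. It assumes a worst-case configuration chosen for illustration: $b$ lies along the large singular direction and $\delta b$ along the small one. The helper `amplification` is a hypothetical name, not a library function.

```python
import numpy as np


def amplification(A, b, delta_b):
    """Return (relative change in b, relative change in x) for A x = b."""
    x = np.linalg.solve(A, b)
    x_perturbed = np.linalg.solve(A, b + delta_b)
    rel_b = np.linalg.norm(delta_b) / np.linalg.norm(b)
    rel_x = np.linalg.norm(x_perturbed - x) / np.linalg.norm(x)
    return rel_b, rel_x


A_good = np.diag([1.0, 0.9])
A_bad = np.diag([1.0, 1e-6])

# Worst-case setup (an assumption of this sketch): b lies along the large
# singular direction, the perturbation along the small one.
b = np.array([1.0, 0.0])
delta_b = np.array([0.0, 1e-6])

results = {}
for name, A in [("A_good", A_good), ("A_bad", A_bad)]:
    kappa = np.linalg.cond(A)
    rel_b, rel_x = amplification(A, b, delta_b)
    results[name] = (kappa, rel_b, rel_x)
    print(f"{name}: kappa = {kappa:.3g}, rel change in b = {rel_b:.3g}, "
          f"rel change in x = {rel_x:.3g}")
```

In this configuration the ratio of the two relative changes equals $\kappa$ for both matrices. A perturbation in a generic direction would typically be amplified less, since $\kappa$ is an upper bound.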
ML Connections: The condition numbers $\kappa(X^\top X)$ in least squares, $\kappa(\Sigma)$ in PCA, and $\kappa(H)$ (Hessian conditioning) in optimization all influence whether algorithms converge reliably. Note that $\kappa(X^\top X) = \kappa(X)^2$ in the 2-norm, so forming the normal equations squares the condition number of the design matrix. Regularization (ridge regression, weight decay) explicitly improves conditioning by adding $\lambda I$ to the matrix, which bounds the smallest eigenvalue away from zero. Preconditioning rescales the problem to balance the eigenvalue spectrum; batch normalization and adaptive optimizers like Adam can be viewed as acting in this spirit, the latter by rescaling gradients per coordinate. Shape discipline helps diagnose conditioning: if $X \in \mathbb{R}^{n \times d}$ with $n \ll d$ (underdetermined), then $X^\top X$ has rank at most $n$ and is rank-deficient ($\kappa = \infty$), signaling the need for regularization or dimensionality reduction. Always inspect singular values before inverting matrices: small singular values flag the directions in which floating-point rounding errors are amplified most.
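The effect of adding $\lambda I$ can be checked numerically. The sketch below uses a small made-up design matrix with $n = 2 < d = 3$ and an illustrative value $\lambda = 10^{-2}$; both are assumptions chosen for the example, not recommendations.

```python
import numpy as np

# Toy design matrix with n = 2 samples and d = 3 features (n < d).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
n, d = X.shape

G = X.T @ X                       # d x d Gram matrix, rank at most n
rank_X = np.linalg.matrix_rank(X)
kappa_G = np.linalg.cond(G)       # huge or inf: G is singular up to rounding

lam = 1e-2                        # illustrative ridge strength (assumption)
G_ridge = G + lam * np.eye(d)
kappa_ridge = np.linalg.cond(G_ridge)

# Eigenvalues shift from mu_i to mu_i + lam, so with mu_min = 0 the
# condition number becomes (mu_max + lam) / lam.
mu_max = np.linalg.eigvalsh(G)[-1]
kappa_predicted = (mu_max + lam) / lam

print(f"rank(X) = {rank_X} < d = {d}")
print(f"cond(X^T X)         = {kappa_G:.3g}")
print(f"cond(X^T X + lam I) = {kappa_ridge:.3g}")
print(f"predicted (mu_max + lam) / lam = {kappa_predicted:.3g}")
```

Larger $\lambda$ lowers the condition number further but biases the solution more, so in practice $\lambda$ is usually tuned (for example by cross-validation) rather than fixed in advance.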