This code demonstrates condition number and solution sensitivityâa fundamental principle in numerical linear algebra. It shows that small perturbations in data ($b$) can be amplified dramatically in the solution ($x$) when the matrix $A$ is ill-conditioned, illustrating why condition number is the key metric for assessing numerical stability in ML solvers.
Part 1: Ill-Conditioned System. The matrix $A$ is diagonal: $A = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-6} \end{bmatrix}$, with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 10^{-6}$. The condition number is: $\kappa(A) = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{1}{10^{-6}} = 10^6$, an extremely ill-conditioned system. One eigenvalue is huge compared to the otherâthe matrix âstretchesâ in one direction and âsquashesâ in another, making the solution highly sensitive to perturbations. The right-hand side is $b = [1, 1]^\top$, and a small perturbation is introduced: $\Delta b = [0, 10^{-6}]^\top$, where the perturbation affects only the second component (the direction of the smallest eigenvalue). This is intentional: perturbations in âweakâ directions are amplified most severely.
Part 2: Solving Two Slightly Different Systems. The code solves two linear systems: $A x_1 = b$ and $A x_2 = (b + \Delta b)$, via np.linalg.solve(), which uses LU decomposition with pivoting. The solutions are: $x_1 = A^{-1} b = \begin{bmatrix} 1 \\ 10^6 \end{bmatrix}$ and $x_2 = A^{-1} (b + \Delta b) = \begin{bmatrix} 1 \\ 10^6 + 1 \end{bmatrix}$. (Verify: $A x_1 = [1 \cdot 1, 10^{-6} \cdot 10^6]^\top = [1, 1]^\top = b$ â) The difference is $x_2 - x_1 = [0, 1]^\top$, which is small in absolute terms but large relative to $x_1$.
Part 3: Measuring Relative Error Amplification. The code computes relative errors (perturbations normalized by original magnitudes): Relative perturbation in $b$: $\frac{\|\Delta b\|_2}{\|b\|_2} = \frac{\sqrt{0^2 + (10^{-6})^2}}{\sqrt{1^2 + 1^2}} = \frac{10^{-6}}{\sqrt{2}} \approx 0.707 \times 10^{-6}$. Relative error in solution: $\frac{\|x_2 - x_1\|_2}{\|x_1\|_2} = \frac{\sqrt{0^2 + 1^2}}{\sqrt{1^2 + (10^6)^2}} \approx \frac{1}{10^6} \approx 10^{-6}$. Typical output:
cond(A): 1000000.0
rel_db: 7.071068e-07
rel_dx: 1.000000e-06
The key observation: the relative error in the solution is roughly equal to the relative perturbation in the data, scaled by the condition number. More precisely, the amplification factor is approximately $\sqrt{\kappa(A)}$ in this specific example due to the diagonal structure and perturbation direction, but the general principle is: $\frac{\text{rel}_{\Delta x}}{\text{rel}_{\Delta b}} \lesssim \kappa(A)$.
Why This Matters for ML. Perturbation error bounds: For a linear system $Ax = b$, if $b$ is perturbed by $\Delta b$, the relative error in the solution satisfies: $\frac{\|\Delta x\|}{\|x\|} \le \kappa(A) \frac{\|\Delta b\|}{\|b\|}$, where $\Delta x = A^{-1}(b + \Delta b) - A^{-1} b$. The condition number is the amplification factorâhigher condition number means worse sensitivity. Data noise amplification: In ML, data is noisy. If features have near-collinear relationships (e.g., polynomial features $[x, x^2, x^3]$ for large $x$), the design matrix $X$ becomes ill-conditioned. Solving least-squares $X\hat{w} = y$ then amplifies measurement noise by a factor of $\kappa(X)$, yielding unstable weight estimates. Ridge regression adds $\lambda I$ to the normal equations, improving conditioning artificially. Numerical precision loss: In floating-point arithmetic, every operation introduces rounding errors. For ill-conditioned systems, these rounding errors get amplified. If $\kappa(A) = 10^{12}$, the effective precision is $10^{12} \times 10^{-16} = 10^{-4}$âonly 4 digits of accuracy despite 16-digit precision in double-precision floats. The solver can do nothing to recover lost precision. Solver choice consequences: For ill-conditioned systems: (1) Normal equations $(X^\top X) \hat{w} = X^\top y$ square the condition number: $\kappa(X^\top X) = [\kappa(X)]^2$ (catastrophic), (2) SVD/QR-based least-squares avoid squaring, maintaining $\kappa(X)$, (3) Regularization (ridge, early stopping) improves conditioning artificially. Practitioners often donât have a choice of $A$ (itâs determined by the problem), but choosing the right solver can prevent amplification.
Numerical Gotchas and Verification. Shape discipline: $A \in \mathbb{R}^{2 \times 2}$, $b \in \mathbb{R}^2$, $\Delta b \in \mathbb{R}^2$, $x_1, x_2 \in \mathbb{R}^2$. Diagonal structure is special: For diagonal $A$, $\kappa(A)$ is simply the ratio of diagonal elements. General matrices require SVD or eigendecomposition. Perturbation direction matters: In this example, $\Delta b$ targets the âweakâ direction (smallest eigenvalue). Perturbations in âstrongâ directions are amplified less. Relative vs. absolute error: Always use relative errors for assessing numerical stability; absolute errors are scale-dependent. Condition number doesnât predict wall-clock time: High $\kappa(A)$ makes convergence slow, but effect is polynomial, not exponential.
Connection to Linear Algebra Theory. Condition number definition: $\kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$ (ratio of largest to smallest singular value). Perturbation bound: For $Ax = b$ and perturbation $\Delta b$, the solution error satisfies: $\frac{\|\Delta x\|}{\|x\|} \le \kappa(A) \frac{\|\Delta b\|}{\|b\|}$. This bound is tight. No algorithm can overcome it: For any solver, achievable relative error is bounded by: $\text{relative error} \ge c \cdot \kappa(A) \times \text{machine precision}$ (where $c$ is small). This is a fundamental limit of computation in finite precision.
Pedagogical Significance. This example is the canonical demonstration of condition number and perturbation sensitivity: (1) Condition number is concrete, not abstractâ$\kappa(A)$ is an exact multiplier of error amplification, observable and reproducible, (2) Ill-conditioning is inevitable for some problemsâyou canât escape it; itâs a property of the matrix, (3) No algorithm can overcome the fundamental limitâall solvers produce relative error $\sim \kappa(A) \times \text{machine precision}$, (4) Regularization is a numerical necessityâridge and other regularization improve conditioning artificially. The code is minimal (11 lines), but the lesson is profound: understand condition number, and you understand why ML systems are hard to train and optimize.
Comments