For $X \in \mathbb{R}^{n \times d}$, the null space $\mathcal{N}(X) = \{z \in \mathbb{R}^d : Xz = 0\}$ consists of all vectors that map to zero under $X$. By the rank–nullity theorem, $\text{rank}(X) + \dim(\mathcal{N}(X)) = d$, so when $n < d$ and $X$ has full row rank $n$, the null space has dimension $d - n$. Any solution $w_0$ to $Xw = y$ can be shifted by any $z \in \mathcal{N}(X)$ to produce another solution: $X(w_0 + \alpha z) = Xw_0 + \alpha Xz = y$ for all scalars $\alpha$.
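The following minimal NumPy sketch checks these facts numerically. It assumes a random $3 \times 5$ Gaussian matrix, which has full row rank with probability 1, takes the minimum-norm pseudoinverse solution as the particular solution $w_0$, and reads a null-space basis off the full SVD. The sizes, seed, and variable names are illustrative choices, not part of the original text.

```python
import numpy as np

# Hypothetical wide design matrix: n = 3 examples, d = 5 parameters.
rng = np.random.default_rng(0)
n, d = 3, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Rank-nullity: rank(X) + dim N(X) = d.
r = np.linalg.matrix_rank(X)
null_dim = d - r

# One particular solution: the minimum-norm solution via the pseudoinverse.
w0 = np.linalg.pinv(X) @ y

# Null-space basis: the last d - r rows of V^T from the full SVD.
_, _, Vt = np.linalg.svd(X)  # full_matrices=True by default, so Vt is d x d
N = Vt[r:]                   # shape (d - r, d); each row z satisfies X z ~ 0

# Shifting w0 by any null-space combination leaves the residual at ~0.
z = N.T @ rng.standard_normal(null_dim)
for alpha in (0.0, 1.0, -3.5, 10.0):
    w = w0 + alpha * z
    print(alpha, np.linalg.norm(X @ w - y))
```

Because the pseudoinverse solution lies in the row space of $X$, it is orthogonal to every null-space direction; every other exact solution is $w_0$ plus some such $z$.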
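In ML, this non-identifiability appears whenever parameters outnumber training examples. Wide neural networks, large embeddings, and overparameterized linear models all exhibit null-space structure. The SVD exposes it: $X = U\Sigma V^\top$ with left and right singular vectors $u_i$ and $v_i$ (columns of $U$ and $V$), singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$, and $r = \text{rank}(X)$. In the full SVD, where $V$ is $d \times d$, the last $d - r$ right singular vectors (equivalently, the last $d - r$ rows of $V^\top$) span $\mathcal{N}(X)$. Small but nonzero singular values create near-null directions: a step of size $t$ along $v_i$ changes the predictions by only $\|X(t v_i)\| = t\sigma_i$, so parameters can move a lot while predictions barely change. The minimum-norm least-squares solution $w = \sum_{i=1}^{r} (u_i^\top y / \sigma_i)\, v_i$ divides by each $\sigma_i$, so noise in $y$ along $u_i$ is amplified by $1/\sigma_i$; without regularization, this leads to unstable estimates and poor generalization.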
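A minimal NumPy sketch of near-null directions and noise amplification follows. It assumes a synthetic $4 \times 6$ matrix built from random orthogonal factors with hand-picked singular values (the tiny $\sigma_4 = 10^{-4}$ is an illustrative choice), a true weight vector in the row space, and a worst-case noise vector of size $10^{-3}$ aligned with $u_4$. Ridge regression with an arbitrarily chosen $\lambda = 10^{-2}$ stands in as one common regularizer; none of these choices come from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 6

# Construct X = U diag(s) V_n^T with chosen singular values; the tiny
# last value (1e-4) is an illustrative assumption creating a near-null direction.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n orthogonal
V, _ = np.linalg.qr(rng.standard_normal((d, d)))   # d x d orthogonal
s = np.array([3.0, 1.0, 0.5, 1e-4])
X = U @ np.diag(s) @ V[:, :n].T                    # rank 4, shape (4, 6)

# Exact null directions: the remaining d - r columns of V.
null_residual = np.linalg.norm(X @ V[:, n:])       # ~0 up to rounding

# Near-null direction: a large parameter step barely moves predictions,
# since ||X (t v)|| = t * sigma for a unit right singular vector v.
v_small = V[:, n - 1]
t = 100.0
delta_pred = np.linalg.norm(X @ (t * v_small))     # 100 * 1e-4 = 0.01

# w_true lies in the row space, so noiseless pinv would recover it exactly.
w_true = V[:, :n] @ np.array([1.0, -2.0, 0.5, 0.3])
# Assumed worst-case perturbation of size 1e-3 along the weakest output direction u_4.
noise = 1e-3 * U[:, n - 1]
y = X @ w_true + noise

# Minimum-norm least squares divides this noise component by sigma_4 = 1e-4.
w_pinv = np.linalg.pinv(X) @ y
pinv_err = np.linalg.norm(w_pinv - w_true)         # 1e-3 / 1e-4 = 10

# Ridge regression (lambda chosen arbitrarily) damps the weak direction.
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
ridge_err = np.linalg.norm(w_ridge - w_true)

print(f"null residual: {null_residual:.1e}")
print(f"prediction change for a step of size {t}: {delta_pred:.3g}")
print(f"pinv error: {pinv_err:.3g}, ridge error: {ridge_err:.3g}")
```

Ridge shrinks the coefficient along $v_4$ by the factor $\sigma_4^2 / (\sigma_4^2 + \lambda)$, so it largely discards the true component in that direction (a bias) in exchange for not amplifying the noise by $1/\sigma_4 = 10^4$.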