Dot products, norms, and angles are the backbone of geometric reasoning in linear algebra. For vectors $x, q \in \mathbb{R}^d$, the dot product $\langle x, q\rangle = x^\top q$ quantifies alignment and scales with the lengths of both vectors. The Euclidean norm $\|x\|_2 = \sqrt{\sum_i x_i^2}$ measures length, and for nonzero vectors the angle $\theta \in [0, \pi]$ is defined by $\cos\theta = \frac{x^\top q}{\|x\|_2\,\|q\|_2}$; the Cauchy–Schwarz inequality $|x^\top q| \le \|x\|_2\,\|q\|_2$ guarantees that this ratio lies in $[-1, 1]$. Cosine similarity therefore isolates direction: two vectors that point the same way have cosine 1 regardless of scale, even if one is much shorter than the other.
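As a minimal sketch of these definitions, the following uses only the Python standard library. The helper names `dot`, `norm`, and `cosine` and the example vectors are illustrative choices, not part of any particular library.

```python
import math

def dot(x, q):
    """Dot product x^T q of two equal-length sequences."""
    if len(x) != len(q):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x, q))

def norm(x):
    """Euclidean norm ||x||_2."""
    return math.sqrt(dot(x, x))

def cosine(x, q):
    """Cosine of the angle between x and q; undefined for zero vectors."""
    nx, nq = norm(x), norm(q)
    if nx == 0.0 or nq == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    # Clamp to [-1, 1] to absorb floating-point round-off.
    return max(-1.0, min(1.0, dot(x, q) / (nx * nq)))

# Illustrative vectors: `small` points the same way as `x` but is 1000 times shorter.
x = [3.0, 4.0]
small = [0.003, 0.004]

print(dot(x, small))     # tiny, because the dot product scales with both lengths
print(cosine(x, small))  # approximately 1, because the directions match
```

The dot product of `x` with `small` is tiny while the cosine is essentially 1, which is the sense in which cosine discards magnitude and keeps direction.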
In information retrieval, the vector space model ranks documents by similarity to a query, commonly using cosine because it is invariant to rescaling either vector and so discounts the effect of document length and raw term-frequency scale. In modern ML, embedding spaces for text, images, and users often use cosine to measure semantic closeness; once embeddings are $\ell_2$-normalized, cosine and dot product coincide. The dot product is preferred when magnitude carries signal (e.g., unnormalized counts or learned scaling in recommendation and attention). Understanding these trade-offs lets you design scoring functions that align with your data and training objectives.
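To make the trade-off concrete, here is a small sketch that ranks two documents against a query under both scores. The term-count vectors, document names, and the `rank` helper are hypothetical and chosen only so that the two scores disagree.

```python
import math

def dot(x, q):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, q))

def cosine(x, q):
    """Cosine similarity; assumes both vectors are nonzero."""
    return dot(x, q) / (math.sqrt(dot(x, x)) * math.sqrt(dot(q, q)))

def rank(docs, query, score):
    """Return document names sorted by descending score against the query."""
    return sorted(docs, key=lambda name: score(docs[name], query), reverse=True)

# Hypothetical term-count vectors over a 3-term vocabulary.
docs = {
    "short_on_topic": [2, 1, 0],    # few terms, mostly matching the query
    "long_off_topic": [10, 0, 30],  # many terms, mostly about something else
}
query = [1, 1, 0]

by_dot = rank(docs, query, dot)     # rewards sheer magnitude
by_cos = rank(docs, query, cosine)  # rewards directional agreement

print(by_dot)
print(by_cos)
```

Under the dot product the long document wins on raw counts (10 versus 3), while cosine puts the short, on-topic document first. Which behavior is preferable depends on whether magnitude is signal or noise in the data at hand.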