Notes on Linear Algebra & Learning from Data, Strang

note: in progress!

Pt. 1: Highlights of Linear Algebra

Basic problems studied in part 1:

1.1: Multiplication Ax Using Columns of A

We can interpret matrix multiplication by rows (traditional method) or columns:

$$
\begin{pmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{pmatrix}
$$

can be interpreted as the inner products of the rows of $A$ with $x = (x_1, x_2)$. On the other hand,

$$
\begin{pmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
x_1 \begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix} + x_2 \begin{pmatrix} 3 \\ 4 \\ 7 \end{pmatrix}
$$

is a combination of the columns $a_1$ and $a_2$. Both ways give the same result. The first is computational but unhelpful for intuition. With the vector approach, we understand $Ax$ as a linear combination of the columns of $A$. The combinations of the columns fill out the column space of $A$.
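A quick numerical check of these two pictures (a minimal NumPy sketch; the vector $x = (5, -1)$ is an arbitrary choice, not from the text): computing $Ax$ row by row via inner products and column by column as a combination of the columns gives the same result.

```python
import numpy as np

A = np.array([[2, 3],
              [2, 4],
              [3, 7]])
x = np.array([5, -1])

# Row picture: each entry of Ax is an inner product (row of A) . x
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column picture: Ax is x1*(column 1 of A) + x2*(column 2 of A)
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.array_equal(by_rows, by_columns)   # same result both ways
assert np.array_equal(by_rows, A @ x)        # and it matches the built-in product
print(by_rows)                               # [7 6 8]
```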

Independence: moving from left to right across the columns of $A$, we construct a matrix $C$ by taking each nonzero column of $A$ that is independent of the columns already in $C$ and appending it to $C$. For a matrix $A$ with $n$ columns, we end up with $C$ having $r \leq n$ columns, forming a basis for the column space of $A$. The number of columns $r$ in $C$ is the rank of $C$ and the dimension of the column space of $A$ and of $C$. Hence, the rank of a matrix is the dimension of its column space.
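A minimal sketch of this left-to-right construction (assuming NumPy; `numpy.linalg.matrix_rank` serves as a numerical stand-in for the exact independence test, and the $3 \times 3$ matrix is the example used in the next paragraph):

```python
import numpy as np

def independent_columns(A):
    """Scan the columns of A from left to right, keeping each column that is
    independent of the columns kept so far. Returns C with r <= n columns."""
    kept = []
    for j in range(A.shape[1]):
        candidate = kept + [A[:, j]]
        # Keep the column only if it raises the rank, i.e. it is nonzero and
        # independent of the columns collected so far.
        if np.linalg.matrix_rank(np.column_stack(candidate)) > len(kept):
            kept.append(A[:, j])
    return np.column_stack(kept)

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]])
C = independent_columns(A)
print(C)                          # the first two columns of A; r = 2
```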

We can connect $C$ to $A$ with a third matrix $R$ such that $A = CR$, with shapes

$$
(m \times n) = (m \times r)(r \times n).
$$

For example,

$$
A = \begin{pmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix}
= CR.
$$

Notice the multiplication from the column perspective: $C$ times the first column of $R$ is column 1 of $A$, and similarly for the second column. When $C$ multiplies the third column of $R$, we get 2(column 1) + 2(column 2), which is column 3 of $A$.

In fact, $R = \operatorname{rref}(A)$, the row-reduced echelon form of $A$ (without zero rows).
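A quick check of this example (a NumPy/SymPy sketch; SymPy's `Matrix.rref` is used only to confirm the claim about $R$): multiplying the factors reproduces $A$, and the reduced row echelon form of $A$ is $R$ plus a zero row.

```python
import numpy as np
from sympy import Matrix

C = np.array([[1, 3],
              [1, 2],
              [0, 1]])
R = np.array([[1, 0, 2],
              [0, 1, 2]])
A = C @ R
print(A)
# [[1 3 8]
#  [1 2 6]
#  [0 1 2]]

# rref(A) is R with a zero row appended; the pivot columns are columns 1 and 2
rref, pivots = Matrix(A.tolist()).rref()
print(rref)     # Matrix([[1, 0, 2], [0, 1, 2], [0, 0, 0]])
print(pivots)   # (0, 1)
```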

Note also, from the shapes of $C$ and $R$ above, that the number of independent columns equals the number of independent rows: column rank = row rank.

The SVD of $A$ is the special case of this $A = CR$ factorization in which the first factor has $r$ orthogonal columns and the second factor has $r$ orthogonal rows.

1.2: Matrix-Matrix Multiplication AB

Inner products produce each of the numbers in $AB = C$. For example, row 2 of $A$ and column 3 of $B$ give $c_{23}$ in $C$:

$$
\begin{pmatrix} \cdot & \cdot & \cdot \\ a_{21} & a_{22} & a_{23} \\ \cdot & \cdot & \cdot \end{pmatrix}
\begin{pmatrix} \cdot & \cdot & b_{13} \\ \cdot & \cdot & b_{23} \\ \cdot & \cdot & b_{33} \end{pmatrix}
=
\begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & c_{23} \\ \cdot & \cdot & \cdot \end{pmatrix}.
$$
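A one-entry check of this rule (a NumPy sketch with arbitrary random $3 \times 3$ matrices): the entry $c_{23}$ equals the inner product of row 2 of $A$ with column 3 of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3))
B = rng.integers(-5, 5, size=(3, 3))
C = A @ B

# c_23 (index [1, 2] with 0-based indexing) = (row 2 of A) . (column 3 of B)
assert C[1, 2] == A[1, :] @ B[:, 2]
```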

The other way to multiply $AB$ is columns of $A$ times rows of $B$.

Note that a column $u$ times a row $v^T$ produces a matrix of rank one, with all columns multiples of $u$ and all rows multiples of $v^T$. This is called the outer product. The column space is the line in the direction of $u$, and the row space is the line in the direction of $v$.

The full product $AB$ using columns of $A$ times rows of $B$: let $a_1, \dots, a_n$ be the $n$ columns of $A$. Then $B$ must have $n$ rows $b_1^*, \dots, b_n^*$. The product $AB$ is the sum of columns $a_k$ times rows $b_k^*$, i.e., the sum of rank 1 matrices

$$
AB = a_1 b_1^* + a_2 b_2^* + \cdots + a_n b_n^*.
$$
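Both points can be checked numerically (a minimal NumPy sketch; the $3 \times 2$ and $2 \times 4$ random matrices are arbitrary): each outer product $a_k b_k^*$ has rank at most one, and the sum of the outer products equals $AB$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 2)).astype(float)
B = rng.integers(-3, 4, size=(2, 4)).astype(float)

# One rank-1 piece per column of A / row of B: (column k of A) times (row k of B)
pieces = [np.outer(A[:, k], B[k, :]) for k in range(A.shape[1])]
for piece in pieces:
    assert np.linalg.matrix_rank(piece) <= 1   # rank 1 (rank 0 only if a factor is zero)

assert np.allclose(sum(pieces), A @ B)         # the rank-1 pieces add up to AB
```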

The outer product approach is essential to data science because we want to break $A$ down into pieces (i.e., rank 1 matrices). A dominant theme in applied linear algebra is to factor $A$ into $CR$ and look at the pieces $c_k r_k^*$ of $A = CR$. Factoring takes longer than multiplying, especially if the pieces involve eigenvalues or singular values.

Five important factorizations, demonstrated numerically in the sketch after this list, are:

  1. $A = LU$ comes from elimination. Combinations of rows take $A$ to $U$ and $U$ back to $A$. $L$ is lower triangular and $U$ is upper triangular.
  2. $A = QR$ comes from orthogonalizing the columns of $A$ as in Gram-Schmidt. $Q$ has orthonormal columns ($Q^T Q = I$) and $R$ is upper triangular.
  3. $S = Q \Lambda Q^T$ comes from the eigenvalues of a symmetric matrix $S = S^T$. Eigenvalues on the diagonal of $\Lambda$. Orthonormal eigenvectors in the columns of $Q$.
  4. $A = X \Lambda X^{-1}$ is diagonalization when $A$ is $n \times n$ with $n$ independent eigenvectors. Eigenvalues of $A$ on the diagonal of $\Lambda$. Eigenvectors of $A$ in the columns of $X$.
  5. $A = U \Sigma V^T$ is the Singular Value Decomposition of any matrix $A$, with singular values in $\Sigma$ and orthonormal singular vectors in $U$ and $V$.
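A brief numerical tour of the five factorizations (a sketch using NumPy and SciPy; the $3 \times 3$ matrices are arbitrary random examples, and note that `scipy.linalg.lu` also returns a permutation factor, so it actually computes $A = PLU$):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
S = A + A.T                                  # a symmetric matrix for factorization 3

# 1. A = LU (SciPy returns P, L, U with A = P L U)
P, L, U = lu(A)
assert np.allclose(P @ L @ U, A)

# 2. A = QR: Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(A)
assert np.allclose(Q @ R, A)

# 3. S = Q Lambda Q^T for symmetric S
lam, Qs = np.linalg.eigh(S)
assert np.allclose(Qs @ np.diag(lam) @ Qs.T, S)

# 4. A = X Lambda X^{-1} (diagonalization; eigenvalues may be complex)
evals, X = np.linalg.eig(A)
assert np.allclose(X @ np.diag(evals) @ np.linalg.inv(X), A)

# 5. A = U Sigma V^T (singular value decomposition)
U_, sigma, Vt = np.linalg.svd(A)
assert np.allclose(U_ @ np.diag(sigma) @ Vt, A)
```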

1.3 The Four Fundamental Subspaces

Every $m \times n$ matrix $A$ leads to four subspaces: two subspaces of $\mathbb{R}^m$ and two more of $\mathbb{R}^n$ (a quick numerical check follows the list).

  1. Column space $C(A)$, dim $r$, subspace of $\mathbb{R}^m$
  2. Row space $C(A^T)$, dim $r$, subspace of $\mathbb{R}^n$
  3. Nullspace $N(A)$, dim $n - r$, subspace of $\mathbb{R}^n$
  4. Left nullspace $N(A^T)$, dim $m - r$, subspace of $\mathbb{R}^m$
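A numerical check of these dimensions (a sketch assuming SciPy, whose `null_space` returns an orthonormal basis of the nullspace), using the rank-2 example matrix from Section 1.1:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]], dtype=float)
m, n = A.shape
r = np.linalg.matrix_rank(A)

print(r)                          # dim C(A)   = dim C(A^T) = r  = 2
print(null_space(A).shape[1])     # dim N(A)   = n - r           = 1
print(null_space(A.T).shape[1])   # dim N(A^T) = m - r           = 1
```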

The ranks of $AB$ and $A+B$ (spot-checked numerically after the list):

  1. Rank of $AB$ $\leq$ rank of $A$, rank of $AB$ $\leq$ rank of $B$
  2. Rank of $A+B$ $\leq$ (rank of $A$) + (rank of $B$)
  3. Rank of $A^T A$ = rank of $A A^T$ = rank of $A$ = rank of $A^T$.
  4. If $A$ is $(m \times r)$ and $B$ is $(r \times n)$, both with rank $r$, then $AB$ has rank $r$.
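A quick numerical spot-check of these rank facts (a NumPy sketch with random matrices; a single random instance only illustrates the statements, it does not prove them):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.integers(-2, 3, size=(3, 3)).astype(float)
B = rng.integers(-2, 3, size=(3, 3)).astype(float)
rank = np.linalg.matrix_rank

# 1. rank(AB) <= rank(A) and rank(AB) <= rank(B)
assert rank(A @ B) <= min(rank(A), rank(B))

# 2. rank(A + B) <= rank(A) + rank(B)
assert rank(A + B) <= rank(A) + rank(B)

# 3. rank(A^T A) = rank(A A^T) = rank(A) = rank(A^T)
assert rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T)

# 4. C is (m x r), D is (r x n), both of rank r  =>  CD has rank r
r = 2
C = rng.standard_normal((4, r))   # rank r with probability 1
D = rng.standard_normal((r, 5))   # rank r with probability 1
assert rank(C @ D) == r
```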

1.4 Elimination and A=LU

1.5 Orthogonal Matrices and Subspaces

1.6 Eigenvalues and Eigenvectors

1.7 Symmetric Positive Definite Matrices

1.8 Singular Values and Singular Vectors in SVD

1.9 Principal Components and the Best Low Rank Matrix

1.10 Rayleigh Quotients and Generalized Eigenvalues

1.11 Norms of Vectors and Functions and Matrices

1.12 Factoring Matrices and Tensors: Positive and Sparse

Pt. 2: Computations with Large Matrices

Pt. 3: Low Rank and Compressed Sensing

Pt. 4: Special Matrices

Pt. 5: Probability and Statistics

Pt. 6: Optimization

Pt. 7: Learning from Data