Notes on Linear Algebra & Learning from Data, Strang

note: in progress!

Pt. 1: Highlights of Linear Algebra

Basic problems studied in part 1:

1.1: Multiplication Ax Using Columns of A

We can interpret matrix multiplication by rows (traditional method) or columns:

$$
\begin{pmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 2x_1 + 3x_2 \\ 2x_1 + 4x_2 \\ 3x_1 + 7x_2 \end{pmatrix}
$$

can be interpreted as the inner products of the rows of $A$ with $x = (x_1, x_2)$. On the other hand,

$$
\begin{pmatrix} 2 & 3 \\ 2 & 4 \\ 3 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
x_1 \begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix} + x_2 \begin{pmatrix} 3 \\ 4 \\ 7 \end{pmatrix}
$$

is a combination of the columns $a_1$ and $a_2$. Both ways give the same result. The first is computational but unhelpful for intuition. With the vector approach, we understand $Ax$ as a linear combination of the columns of $A$. The combinations of the columns fill out the column space of $A$.
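A quick numerical check of these two pictures (a minimal NumPy sketch; the vector $x = (5, -1)$ is an arbitrary choice, not from the text): computing $Ax$ row by row via inner products and column by column as a combination of the columns gives the same result.

```python
import numpy as np

A = np.array([[2, 3],
              [2, 4],
              [3, 7]])
x = np.array([5, -1])

# Row picture: each entry of Ax is an inner product (row of A) . x
by_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column picture: Ax is x1*(column 1 of A) + x2*(column 2 of A)
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.array_equal(by_rows, by_columns)   # same result both ways
assert np.array_equal(by_rows, A @ x)        # and it matches the built-in product
print(by_rows)                               # [7 6 8]
```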

Independence: moving from left to right across the columns of $A$, we construct a matrix $C$ by taking each nonzero column of $A$ that is independent of the columns already in $C$ and appending it to $C$. For a matrix $A$ with $n$ columns, we end up with $C$ having $r \leq n$ columns, forming a basis for the column space of $A$. The number of columns $r$ in $C$ is the rank of $C$ and the dimension of the column space of $A$ and of $C$. Hence, the rank of a matrix is the dimension of its column space.
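A minimal sketch of this left-to-right construction (assuming NumPy; `numpy.linalg.matrix_rank` serves as a numerical stand-in for the exact independence test, and the $3 \times 3$ matrix is the example used in the next paragraph):

```python
import numpy as np

def independent_columns(A):
    """Scan the columns of A from left to right, keeping each column that is
    independent of the columns kept so far. Returns C with r <= n columns."""
    kept = []
    for j in range(A.shape[1]):
        candidate = kept + [A[:, j]]
        # Keep the column only if it raises the rank, i.e. it is nonzero and
        # independent of the columns collected so far.
        if np.linalg.matrix_rank(np.column_stack(candidate)) > len(kept):
            kept.append(A[:, j])
    return np.column_stack(kept)

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]])
C = independent_columns(A)
print(C)                          # the first two columns of A; r = 2
```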

We can connect $C$ to $A$ with a third matrix $R$ such that $A = CR$, with shapes

$$
(m \times n) = (m \times r)(r \times n).
$$

For example,

$$
A = \begin{pmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{pmatrix}
= \begin{pmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix}
= CR.
$$

Notice the multiplication from the column perspective: $C$ times the first column of $R$ is column 1 of $A$, and similarly for the second column. When $C$ multiplies the third column of $R$, we get 2(column 1) + 2(column 2), which is column 3 of $A$.

In fact, $R = \operatorname{rref}(A)$, the row-reduced echelon form of $A$ (without zero rows).
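A quick check of this example (a NumPy/SymPy sketch; SymPy's `Matrix.rref` is used only to confirm the claim about $R$): multiplying the factors reproduces $A$, and the reduced row echelon form of $A$ is $R$ plus a zero row.

```python
import numpy as np
from sympy import Matrix

C = np.array([[1, 3],
              [1, 2],
              [0, 1]])
R = np.array([[1, 0, 2],
              [0, 1, 2]])
A = C @ R
print(A)
# [[1 3 8]
#  [1 2 6]
#  [0 1 2]]

# rref(A) is R with a zero row appended; the pivot columns are columns 1 and 2
rref, pivots = Matrix(A.tolist()).rref()
print(rref)     # Matrix([[1, 0, 2], [0, 1, 2], [0, 0, 0]])
print(pivots)   # (0, 1)
```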

Note also, from the shapes of $C$ and $R$ above, that the number of independent columns equals the number of independent rows: column rank = row rank.

The SVD of $A$ is the special case of this $A = CR$ factorization in which the first factor has $r$ orthogonal columns and the second factor has $r$ orthogonal rows.

1.2: Matrix-Matrix Multiplication AB

Inner products produce each of the numbers in $AB = C$. For example, row 2 of $A$ and column 3 of $B$ give $c_{23}$ in $C$:

$$
\begin{pmatrix} \cdot & \cdot & \cdot \\ a_{21} & a_{22} & a_{23} \\ \cdot & \cdot & \cdot \end{pmatrix}
\begin{pmatrix} \cdot & \cdot & b_{13} \\ \cdot & \cdot & b_{23} \\ \cdot & \cdot & b_{33} \end{pmatrix}
=
\begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & c_{23} \\ \cdot & \cdot & \cdot \end{pmatrix}.
$$
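A one-entry check of this rule (a NumPy sketch with arbitrary random $3 \times 3$ matrices): the entry $c_{23}$ equals the inner product of row 2 of $A$ with column 3 of $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 3))
B = rng.integers(-5, 5, size=(3, 3))
C = A @ B

# c_23 (index [1, 2] with 0-based indexing) = (row 2 of A) . (column 3 of B)
assert C[1, 2] == A[1, :] @ B[:, 2]
```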

The other way to multiply $AB$ is columns of $A$ times rows of $B$.

Note that a column $u$ times a row $v^T$ produces a matrix of rank one, with all columns multiples of $u$ and all rows multiples of $v^T$. This is called the outer product. The column space is the line in the direction of $u$, and the row space is the line in the direction of $v$.

The full product $AB$ using columns of $A$ times rows of $B$: let $a_1, \dots, a_n$ be the $n$ columns of $A$. Then $B$ must have $n$ rows $b_1^*, \dots, b_n^*$. The product $AB$ is the sum of columns $a_k$ times rows $b_k^*$, i.e., the sum of rank 1 matrices

$$
AB = a_1 b_1^* + a_2 b_2^* + \cdots + a_n b_n^*.
$$
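Both points can be checked numerically (a minimal NumPy sketch; the $3 \times 2$ and $2 \times 4$ random matrices are arbitrary): each outer product $a_k b_k^*$ has rank at most one, and the sum of the outer products equals $AB$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-3, 4, size=(3, 2)).astype(float)
B = rng.integers(-3, 4, size=(2, 4)).astype(float)

# One rank-1 piece per column of A / row of B: (column k of A) times (row k of B)
pieces = [np.outer(A[:, k], B[k, :]) for k in range(A.shape[1])]
for piece in pieces:
    assert np.linalg.matrix_rank(piece) <= 1   # rank 1 (rank 0 only if a factor is zero)

assert np.allclose(sum(pieces), A @ B)         # the rank-1 pieces add up to AB
```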

The outer product approach is essential to data science because we want to break $A$ down into pieces (i.e., rank 1 matrices). A dominant theme in applied linear algebra is to factor $A$ into $CR$ and look at the pieces $c_k r_k^*$ of $A = CR$. Factoring takes longer than multiplying, especially if the pieces involve eigenvalues or singular values.

Five important factorizations, demonstrated numerically in the sketch after this list, are:

  1. $A = LU$ comes from elimination. Combinations of rows take $A$ to $U$ and $U$ back to $A$. $L$ is lower triangular and $U$ is upper triangular.
  2. $A = QR$ comes from orthogonalizing the columns of $A$ as in Gram-Schmidt. $Q$ has orthonormal columns ($Q^T Q = I$) and $R$ is upper triangular.
  3. $S = Q \Lambda Q^T$ comes from the eigenvalues of a symmetric matrix $S = S^T$. Eigenvalues on the diagonal of $\Lambda$. Orthonormal eigenvectors in the columns of $Q$.
  4. $A = X \Lambda X^{-1}$ is diagonalization when $A$ is $n \times n$ with $n$ independent eigenvectors. Eigenvalues of $A$ on the diagonal of $\Lambda$. Eigenvectors of $A$ in the columns of $X$.
  5. $A = U \Sigma V^T$ is the Singular Value Decomposition of any matrix $A$, with singular values in $\Sigma$ and orthonormal singular vectors in $U$ and $V$.
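A brief numerical tour of the five factorizations (a sketch using NumPy and SciPy; the $3 \times 3$ matrices are arbitrary random examples, and note that `scipy.linalg.lu` also returns a permutation factor, so it actually computes $A = PLU$):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
S = A + A.T                                  # a symmetric matrix for factorization 3

# 1. A = LU (SciPy returns P, L, U with A = P L U)
P, L, U = lu(A)
assert np.allclose(P @ L @ U, A)

# 2. A = QR: Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(A)
assert np.allclose(Q @ R, A)

# 3. S = Q Lambda Q^T for symmetric S
lam, Qs = np.linalg.eigh(S)
assert np.allclose(Qs @ np.diag(lam) @ Qs.T, S)

# 4. A = X Lambda X^{-1} (diagonalization; eigenvalues may be complex)
evals, X = np.linalg.eig(A)
assert np.allclose(X @ np.diag(evals) @ np.linalg.inv(X), A)

# 5. A = U Sigma V^T (singular value decomposition)
U_, sigma, Vt = np.linalg.svd(A)
assert np.allclose(U_ @ np.diag(sigma) @ Vt, A)
```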

1.3 The Four Fundamental Subspaces

Every $m \times n$ matrix $A$ leads to four subspaces: two subspaces of $\mathbb{R}^m$ and two more of $\mathbb{R}^n$ (a quick numerical check follows the list).

  1. Column space $C(A)$, dim $r$, subspace of $\mathbb{R}^m$
  2. Row space $C(A^T)$, dim $r$, subspace of $\mathbb{R}^n$
  3. Nullspace $N(A)$, dim $n - r$, subspace of $\mathbb{R}^n$
  4. Left nullspace $N(A^T)$, dim $m - r$, subspace of $\mathbb{R}^m$
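A numerical check of these dimensions (a sketch assuming SciPy, whose `null_space` returns an orthonormal basis of the nullspace), using the rank-2 example matrix from Section 1.1:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 3, 8],
              [1, 2, 6],
              [0, 1, 2]], dtype=float)
m, n = A.shape
r = np.linalg.matrix_rank(A)

print(r)                          # dim C(A)   = dim C(A^T) = r  = 2
print(null_space(A).shape[1])     # dim N(A)   = n - r           = 1
print(null_space(A.T).shape[1])   # dim N(A^T) = m - r           = 1
```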

The ranks of $AB$ and $A+B$ (spot-checked numerically after the list):

  1. Rank of $AB$ $\leq$ rank of $A$, rank of $AB$ $\leq$ rank of $B$
  2. Rank of $A+B$ $\leq$ (rank of $A$) + (rank of $B$)
  3. Rank of $A^T A$ = rank of $A A^T$ = rank of $A$ = rank of $A^T$.
  4. If $A$ is $(m \times r)$ and $B$ is $(r \times n)$, both with rank $r$, then $AB$ has rank $r$.
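A quick numerical spot-check of these rank facts (a NumPy sketch with random matrices; a single random instance only illustrates the statements, it does not prove them):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.integers(-2, 3, size=(3, 3)).astype(float)
B = rng.integers(-2, 3, size=(3, 3)).astype(float)
rank = np.linalg.matrix_rank

# 1. rank(AB) <= rank(A) and rank(AB) <= rank(B)
assert rank(A @ B) <= min(rank(A), rank(B))

# 2. rank(A + B) <= rank(A) + rank(B)
assert rank(A + B) <= rank(A) + rank(B)

# 3. rank(A^T A) = rank(A A^T) = rank(A) = rank(A^T)
assert rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T)

# 4. C is (m x r), D is (r x n), both of rank r  =>  CD has rank r
r = 2
C = rng.standard_normal((4, r))   # rank r with probability 1
D = rng.standard_normal((r, 5))   # rank r with probability 1
assert rank(C @ D) == r
```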

1.4 Elimination and A=LU

1.5 Orthogonal Matrices and Subspaces

1.6 Eigenvalues and Eigenvectors

1.7 Symmetric Positive Definite Matrices

1.8 Singular Values and Singular Vectors in SVD

1.9 Principal Components and the Best Low Rank Matrix

1.10 Rayleigh Quotients and Generalized Eigenvalues

1.11 Norms of Vectors and Functions and Matrices

1.12 Factoring Matrices and Tensors: Positive and Sparse

Pt. 2: Computations with Large Matrices

Pt. 3: Low Rank and Compressed Sensing

Pt. 4: Special Matrices

Pt. 5: Probability and Statistics

Pt. 6: Optimization

Pt. 7: Learning from Data