8. Dimension Reduction Algorithms

8.1. What is dimension reduction?

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of “uncorrelated” principal variables. It can be divided into feature selection and feature extraction. https://en.wikipedia.org/wiki/Dimensionality_reduction

8.2. Singular Value Decomposition (SVD)

Here I will recall the three types of SVD, since some authors confuse their definitions. The SVD is important for dimension reduction algorithms: the Truncated Singular Value Decomposition (tSVD) can be used to perform dimension reduction directly, and the Full Rank Singular Value Decomposition (SVD) can be applied to perform Principal Component Analysis (PCA), since PCA is a specific case of SVD.

  1. Full Rank Singular Value Decomposition (SVD)

Suppose {\bf X}\in\mathbb{R}^{n\times p}, (p<n), then

\underbracket{\bf X}_{n\times p} =\underbracket{\bf U}_{n\times n} \underbracket{\bf\Sigma}_{n\times p} \underbracket{{\bf V}^T}_{p\times p},

is called a full rank SVD of {\bf X} and

  • \sigma_i – singular values, and {\bf\Sigma}=diag(\sigma_1,\sigma_2, \cdots, \sigma_p)\in \mathbb{R}^{n\times p}
  • u_i – left singular vectors, {\bf U}=[u_1,u_2, \cdots, u_n] and {\bf U} is unitary.
  • v_i – right singular vectors, {\bf V}=[v_1,v_2, \cdots, v_p] and {\bf V} is unitary.
Figure: Singular Value Decomposition (_images/svd.png)
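
The following is a minimal NumPy sketch of the full rank SVD, assuming a small random matrix with n=6 > p=3; the matrix and all variable names are illustrative only.

   import numpy as np

   # A tall data matrix X with n > p, matching the (p < n) assumption above.
   n, p = 6, 3
   rng = np.random.default_rng(0)
   X = rng.standard_normal((n, p))

   # Full SVD: U is n x n, S holds the p singular values, Vt is p x p.
   U, S, Vt = np.linalg.svd(X, full_matrices=True)
   print(U.shape, S.shape, Vt.shape)      # (6, 6) (3,) (3, 3)

   # Rebuild the rectangular n x p Sigma and verify X = U Sigma V^T.
   Sigma = np.zeros((n, p))
   Sigma[:p, :p] = np.diag(S)
   print(np.allclose(X, U @ Sigma @ Vt))  # True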

  2. Reduced Singular Value Decomposition (rSVD)

Suppose {\bf X}\in\mathbb{R}^{n\times p}, (p<n), then

\underbracket{\bf X}_{n\times p} =\underbracket{\bf \hat{U}}_{n\times p} \underbracket{\bf\hat{\Sigma}}_{p\times p} \underbracket{{\bf \hat{V}}^T}_{p\times p},

is called a Reduced Singular Value Decomposition (rSVD) of {\bf X} and

  • \sigma_i – singular values, and {\bf\hat{\Sigma}}=diag(\sigma_1,\sigma_2, \cdots, \sigma_p)\in \mathbb{R}^{p\times p}
  • u_i – left singular vectors, {\bf \hat{U}}=[u_1,u_2, \cdots, u_p] is a column-orthonormal matrix.
  • v_i – right singular vectors, {\bf \hat{V}}=[v_1,v_2, \cdots, v_p] is a column-orthonormal matrix.
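
A minimal NumPy sketch of the reduced SVD follows, using the same illustrative random matrix as above; full_matrices=False is NumPy's name for the economy (reduced) factorization.

   import numpy as np

   n, p = 6, 3
   rng = np.random.default_rng(0)
   X = rng.standard_normal((n, p))

   # Reduced SVD: U_hat is n x p (column-orthonormal), Vt_hat is p x p.
   U_hat, S, Vt_hat = np.linalg.svd(X, full_matrices=False)
   print(U_hat.shape, Vt_hat.shape)                    # (6, 3) (3, 3)

   # Columns of U_hat are orthonormal, and the reconstruction is exact.
   print(np.allclose(U_hat.T @ U_hat, np.eye(p)))      # True
   print(np.allclose(X, U_hat @ np.diag(S) @ Vt_hat))  # True
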
  3. Truncated Singular Value Decomposition (tSVD)

Suppose {\bf X}\in\mathbb{R}^{n\times p}, (r<p), then

(1) \underbracket{\bf X}_{n\times p} =\underbracket{\bf \hat{U}}_{n\times r} \underbracket{\bf\hat{\Sigma}}_{r\times r} \underbracket{{\bf \hat{V}}^T}_{r\times p},

is called a Truncated Singular Value Decomposition (tSVD) of {\bf X} and

  • \sigma_i – singular values, and {\bf\hat{\Sigma}}=diag(\sigma_1,\sigma_2, \cdots, \sigma_r)\in \mathbb{R}^{r\times r}
  • u_i – left singular vectors, {\bf \hat{U}}=[u_1,u_2, \cdots, u_r] is a column-orthonormal matrix.
  • v_i – right singular vectors, {\bf \hat{V}}=[v_1,v_2, \cdots, v_r] is a column-orthonormal matrix.
Figure: Truncated Singular Value Decomposition (_images/tsvd.png)

The figure Truncated Singular Value Decomposition indicates that the dimension of {\bf \hat{U}} is smaller than that of {\bf X}. We can use this property to perform dimension reduction directly, as the sketch below shows. Usually, however, we use the SVD to compute the principal components; we will see the details in the next section.
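
To reduce dimension with the tSVD, one keeps only the r largest singular values and their vectors. A minimal NumPy sketch on an illustrative random matrix, slicing the reduced SVD (NumPy returns the singular values in descending order):

   import numpy as np

   n, p, r = 6, 4, 2
   rng = np.random.default_rng(0)
   X = rng.standard_normal((n, p))

   # Keep the r largest singular values/vectors from the reduced SVD.
   U, S, Vt = np.linalg.svd(X, full_matrices=False)
   U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]

   # The n x r score matrix U_r * S_r (= X @ Vt_r.T) is the reduced
   # representation; U_r @ diag(S_r) @ Vt_r is the best rank-r
   # approximation of X in the least-squares sense.
   X_reduced = U_r * S_r
   X_r = U_r @ np.diag(S_r) @ Vt_r
   print(X_reduced.shape, np.linalg.matrix_rank(X_r))  # (6, 2) 2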

8.3. Principal Component Analysis (PCA)

Usually, there are two ways to implement PCA: through an eigendecomposition of the sample covariance matrix, or through an SVD of the column-centered data matrix. Since PCA is a specific case of SVD, we take the SVD route here. Assume the columns of {\bf X} have been centered by subtracting their means; then the reduced SVD

(2) \underbracket{\bf X}_{n\times p} =\underbracket{\bf \hat{U}}_{n\times p} \underbracket{\bf\hat{\Sigma}}_{p\times p} \underbracket{{\bf \hat{V}}^T}_{p\times p},

gives the principal directions (loadings) as the columns of {\bf \hat{V}}, and the principal component scores as {\bf T}={\bf X}{\bf \hat{V}}={\bf \hat{U}}{\bf\hat{\Sigma}}. Keeping only the first r columns of {\bf T} gives an r-dimensional representation of the data.
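
A minimal NumPy sketch of PCA via the reduced SVD, on an illustrative random matrix; the centering step and the identity {\bf T}={\bf X}{\bf \hat{V}}={\bf \hat{U}}{\bf\hat{\Sigma}} are checked explicitly.

   import numpy as np

   rng = np.random.default_rng(0)
   X = rng.standard_normal((100, 5))

   # Step 1: center each column (PCA requires a mean-zero data matrix).
   Xc = X - X.mean(axis=0)

   # Step 2: reduced SVD of the centered matrix.
   U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

   # Rows of Vt are the principal directions; the scores are T = Xc V = U Sigma.
   T = U * S
   print(np.allclose(T, Xc @ Vt.T))   # True

   # Variance explained by the i-th component: sigma_i**2 / (n - 1).
   print(S**2 / (X.shape[0] - 1))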

8.4. Independent Component Analysis (ICA)

8.5. Nonnegative Matrix Factorization (NMF)

TO DO……