Statistics deals with finding order out of chaos. Oftentimes we find ourselves with more data than we know what to do with. More variables mean more information, but they also bring much more complexity: a covariance matrix has an entry for every pair of variables, so the work of computing it grows quadratically as variables are added. In hyperspectral analysis, where scientists deal with datasets of hundreds of highly redundant variables, working with all of them directly quickly becomes unfeasible. This is called the curse of dimensionality. So they use dimensionality reduction techniques to shrink the problem before computing things like covariance. That is, they boil the variables down to just a few new ones, each made up of a mix of all the rest. Think of it as describing the same data with far fewer numbers while keeping as much of its structure as possible. The techniques for achieving this fall under the umbrella of what is termed Spectral Theory. Spectral Theory applies eigenvector theory to reduce the dimensionality of a square matrix, be it one built from a dataset or from a set of functions.
Spectral Theory
In mathematics, spectral theory is an inclusive term for theories extending the eigenvector and eigenvalue theory of a single square matrix to a much broader theory of the structure of operators in a variety of mathematical spaces.
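As a refresher (my addition, not part of the quoted definition): for a single square matrix A, the starting point being generalized is the eigenvalue equation, and the set of eigenvalues of A is called its spectrum.

```latex
A \mathbf{v} = \lambda \mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}
```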
Principal Component Analysis
From Fritz’ Blog
Principal component analysis (PCA) is an algorithm that uses a statistical procedure to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. […] This is one of the primary methods for performing dimensionality reduction — this reduction ensures that the new dimension maintains the original variance in the data as best it can. […] That way we can visualize high-dimensional data in 2D or 3D, or use it in a machine learning algorithm for faster training and inference.
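To make that concrete, here is a minimal sketch using scikit-learn (my own example, not from Fritz' post); the Iris dataset and the choice of two components are arbitrary placeholders:

```python
# Minimal sketch of PCA as dimensionality reduction with scikit-learn.
# The Iris dataset and n_components=2 are arbitrary choices for illustration.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                    # 150 samples x 4 features
pca = PCA(n_components=2)               # keep the 2 directions with the most variance
X_2d = pca.fit_transform(X)             # project the data onto those directions

print(X_2d.shape)                       # (150, 2): ready to plot or feed to a model
print(pca.explained_variance_ratio_)    # fraction of variance kept by each component
```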
They also lay out the steps for performing PCA (a rough NumPy sketch of these steps follows the list below).
- Standardize (or normalize) the data.
- Calculate the covariance matrix from this standardized data (with dimension d).
- Obtain the Eigenvectors and Eigenvalues from the newly-calculated covariance matrix.
- Sort the Eigenvalues in descending order, and choose the k Eigenvectors that correspond to the k largest Eigenvalues, where k is the number of dimensions in the new feature subspace (k ≤ d).
- Construct the projection matrix W from the k selected Eigenvectors.
- Transform the original dataset X by simple multiplication with W to obtain a k-dimensional feature subspace Y.
- (optional) Calculate the explained variance: how much variance is captured by the PCA algorithm. Higher value = better.
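Putting those steps together, a rough NumPy sketch (my own, not from Fritz' post) might look like this:

```python
import numpy as np

def pca(X, k):
    """Rough sketch of the steps listed above: project X onto a k-dimensional subspace."""
    # 1. Standardize the data (zero mean, unit variance per feature).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized data (d x d).
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvectors and eigenvalues of the covariance matrix.
    #    eigh is used because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 4. Sort eigenvalues in descending order and keep the k largest.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 5. Projection matrix W from the k selected eigenvectors (d x k).
    W = eigvecs[:, :k]

    # 6. Transform: Y = X_std @ W is the k-dimensional feature subspace.
    Y = X_std @ W

    # 7. (optional) Explained variance captured by the kept components.
    explained = eigvals[:k].sum() / eigvals.sum()
    return Y, W, explained
```

Note that this version standardizes the features first, whereas scikit-learn's PCA only centers them, so comparing the two is only a fair sanity check if you standardize the input yourself (and even then each component may differ by a sign flip).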
*SVD of a 2x2 matrix (photo: centerspace)*
PCA has its own limitations. Mainly, building the covariance matrix means multiplying the whole dataset with itself (which is very cursed for large datasets), and it doesn’t work so well with non-linear correlations. Luckily, you can always apply kernel methods to turn those non-linear relationships into linear problems, but that’s a horror story for another time. There is a more general and more computationally efficient approach to dimensionality reduction called Singular Value Decomposition, which I’ll write about later. For a step-by-step implementation of PCA in Python, check out Nikita Kozodoi’s post.
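As a small teaser for that later post (my own sketch, not the author's): the right singular vectors of the centered data matrix are exactly the eigenvectors of its covariance matrix, so SVD gives you the same principal components without ever forming the d x d covariance matrix.

```python
import numpy as np

def pca_via_svd(X, k):
    """Sketch: same projection as PCA, obtained via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)           # center only; standardize first if needed
    # Rows of Vt are the principal directions; S**2 / (n - 1) are the eigenvalues
    # of the covariance matrix that plain PCA would have computed.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T              # k-dimensional feature subspace
```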