Principal Component Analysis (PCA)

This is a dimensionality reduction technique (or algorithm).

In order to proceed, let's define dimensionality reduction:

Wikipedia:

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.

So what does that ultimately mean? It means that our feature space is too big. Is there a way we can combine the columns so that we get new features that best represent the differences between data points? Yes! There are many, and PCA is just one of them.

Approach

We'd like to re-express a dataset in a smaller number of dimensions, defined by the principal components. When we re-express the data, PCA picks the directions along which the data varies the most, which means that the most important differences between data points become easier to see.
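To make the idea concrete, here is a minimal sketch of PCA using NumPy. It is one common way to do it (eigendecomposition of the covariance matrix), not the only one. The function name `pca`, its return values, and the toy dataset are made up for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top n_components principal components."""
    # Center each column so variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns of X).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance (largest eigenvalues).
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_variance = eigvals[order]
    # Re-express the data in the new, smaller set of dimensions.
    return X_centered @ components, components, explained_variance

# Hypothetical toy data: 200 points, 3 features, two of them strongly correlated.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

projected, components, explained_variance = pca(X, n_components=2)
print(projected.shape)  # (200, 2)
```

Each column of `components` is one principal component, and `explained_variance` tells us how much of the data's variance lies along it. In practice, a library implementation such as scikit-learn's `PCA` is usually a better choice than rolling your own.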

Resources

Jon Shlens's excellent tutorial here.
