Principal component analysis (PCA) is probably the most magical linear method in data science. Unfortunately, while it's always good to have a sense of wonder about mathematics, if a method seems too magical it usually means that there is something left to understand. After years of almost, but not quite fully, understanding PCA, here is my attempt to explain it fully, hopefully leaving some of the magic intact.

We will work from the outside in: we will view PCA first as a way of finding a smaller representation of a dataset. This is a typical machine learning problem: find a compressed representation of the data such that the reconstructions are as close to the original as possible. This is a simple view of PCA, and we'll be able to compute it with nothing more than gradient descent, with a few extra tricks for satisfying constraints.
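To make this concrete, here is a minimal sketch (in NumPy, with made-up toy data) of that idea: minimize the reconstruction error of a rank-k linear encoding by gradient descent, using a QR re-orthonormalization step as the "extra trick" that keeps the components orthonormal. This is one simple way to satisfy the constraint, not necessarily the scheme the rest of the text will develop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset with approximate rank-2 structure (hypothetical example).
B = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X = B + 0.05 * rng.normal(size=(200, 5))
X = X - X.mean(axis=0)  # PCA assumes centered data

k = 2
W, _ = np.linalg.qr(rng.normal(size=(5, k)))  # random orthonormal start

lr = 1e-4
for _ in range(500):
    E = X - X @ W @ W.T                   # reconstruction error
    grad = -2 * (X.T @ E + E.T @ X) @ W   # gradient of ||E||_F^2 w.r.t. W
    W = W - lr * grad                     # plain gradient descent step
    W, _ = np.linalg.qr(W)                # trick: re-orthonormalize the columns

# Sanity check: compare the learned subspace against the SVD solution.
# Singular values of W^T V near 1 mean the two subspaces coincide.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
overlap = np.linalg.svd(W.T @ Vt[:k].T, compute_uv=False)
print(overlap)
```

The final comparison works on subspaces rather than individual vectors, since gradient descent may recover the principal subspace with its basis rotated or sign-flipped.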