StatQuest: t-SNE, Clearly Explained

t-SNE is a popular method for making an easy-to-read graph from a complex dataset, but not many people know how it works. Here's the inside scoop.

Here's how to create a t-SNE graph in R (this is copied from the help file for Rtsne)…

library("Rtsne")
iris_unique <- unique(iris) # Remove duplicates
iris_matrix <- as.matrix(iris_unique[,1:4])
set.seed(42) # Set a seed if you want reproducible results
tsne_out <- Rtsne(iris_matrix) # Run TSNE
# Show the objects in the 2D tsne representation
plot(tsne_out$Y,col=iris_unique$Species)

This StatQuest is based on the original t-SNE manuscript, which is not super hard to read (especially if you understand the general idea of how it works):
https://lvdmaaten.github.io/publicati...

For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/

If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer

0:00 Awesome song and introduction
1:19 Overview of what t-SNE does
2:24 Overview of how t-SNE works
4:12 Step 1: Determine high-dimensional similarities
9:26 Step 2: Determine low-dimensional similarities
10:33 Step 3: Move points in low-d
11:05 Why the t-distribution is used instead of the normal distribution

Corrections:
6:17 I should have said that the blue points have twice the density of the purple points.
7:08 There should be a 0.05 in the denominator, not a 0.5.

#statquest #tsne