Cramming: Training a Language Model on a Single GPU in One Day (Research Paper Explanation)

This short video explains the cramming technique. It is based on the research paper "Cramming: Training a Language Model on a Single GPU in One Day" by Jonas Geiping and Tom Goldstein.