Cramming: Training a Language Model on a Single GPU in One Day (research paper explanation)
This is a short video that explains the cramming technique. It is based on the research paper "Cramming: Training a Language Model on a Single GPU in One Day"
by Jonas Geiping and Tom Goldstein.