Cramming: Training a Language Model on a Single GPU in One Day

In this video, we cover Cramming, a paper that asks how much performance we can get by training a BERT model for one day on a single GPU, and how that compares to the results of the original BERT implementation. The paper also discusses many tricks to achieve the same …

00:00 Introductions
04:00 Cramming

Links for Cramming:
Paper: https://arxiv.org/pdf/2212.14034.pdf
Code: https://github.com/JonasGeiping/cramming
Annotated paper used in video: https://drive.google.com/file/d/1ibwv...

We cover new papers each week. Join our group on Discord.

#Cramming #languagemodel #buckformoney #naturallanguageprocessing #nlp #deeplearning #ai