Google DeepMind's Gemma Models: Making Open Language Models at a Practical Size

In this talk, we introduce Gemma, Google's family of open language models, designed with a focus on practical size without sacrificing performance. We explore Gemma's architecture and training methodology, highlighting techniques for efficient scaling and resource optimization.

Kathleen Kenealy, Staff Research Engineer at @googledeepmind and Technical Lead on the Gemma team, presents a comprehensive evaluation of Gemma across various benchmarks. It demonstrates performance competitive with larger open models while requiring significantly fewer computational resources for both training and inference. Kathleen also discusses the implications of open-sourcing Gemma: fostering community-driven development and democratizing access to powerful language model technology. This work aims to bridge the gap between cutting-edge LLM capabilities and practical constraints.

Timestamps:
0:00 Introduction
0:58 Background behind the work
2:25 Gemma open models: a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models
4:42 Developing Gemma: pre-training and post-training
6:17 Pretraining Life Cycle: data selection, compliance filtering, quality filtering, ablations & training
10:36 Knowledge Distillation
13:07 Pretraining challenges
15:16 Post-training Process: supervised fine-tuning, RLHF, Model Merging
20:22 Results
20:45 Why does openness matter?

#artificialintelligence #artificialgeneralintelligence #ai #gemma #googledeepmind #google #llm #languagemodels #openlanguagemodel #machinelearning #ml #deeplearning #educationalvideos #science #technology #techtalk #techtalks

Social Links:
Newsletter: https://buzzrobot.substack.com/
X: https://x.com/sopharicks
Slack: https://join.slack.com/t/buzzrobot/sh...