Network-Accelerated Distributed Deep Learning - ACM SIGCOMM 2021 TUTORIAL

Distributed Deep Learning (DL) is an important workload that is becoming communication-bound as the scale and size of DL models increase. This ACM SIGCOMM 2021 tutorial presents a range of techniques that are effective at mitigating the network communication bottleneck and accelerating distributed training.