In recent years, Deep Neural Networks (DNNs) have rapidly come to play an important role in modern computing applications, both in research and in many aspects of daily life. The community has put significant effort into optimizing and scaling deep learning (DL) frameworks to speed up the DL design process, especially for training large DNNs on High Performance Computing (HPC) systems. With the steady growth of dataset and DNN model sizes, the common approach of data parallelism for training a DL model on an HPC system faces substantial scalability challenges: (1) maintaining application accuracy, (2) high communication overhead between computing elements such as CPUs, GPUs, and memories, and (3) insufficient memory capacity and bandwidth to store ever-larger datasets and DNN models on a single computing element. In this project, we aim to enable training DNNs on large-scale distributed HPC systems within hours. This will be achieved by integrating the following: (a) model/hybrid parallelism approaches to bypass the memory limits, (b) an architecture-aware approximate communication design to reduce the communication overhead, and (c) an HPC system design optimized for DL traffic patterns to increase network bandwidth and reduce network latency.
Speaker Bio:
Truong Thao Nguyen received his BE and ME degrees from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2011 and 2014, respectively, and his Ph.D. in Informatics from the Graduate University for Advanced Studies, Japan, in 2018. He is currently working at the AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), where he focuses on communication for HPC systems and beyond.
Recording: MS Teams recording