Distributed Training Framework

Stack: PyTorch, Horovod, NCCL, DeepSpeed

A scalable distributed training framework that supports both data and model parallelism, implementing optimization techniques for training large models efficiently.
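To illustrate the data-parallel side of the description: in synchronous data parallelism each worker computes gradients on its own shard of the batch, and the gradients are then averaged across workers (an allreduce) so every replica applies the same update. The sketch below shows this pattern in plain Python; it is an illustration only, not this framework's API, and a real deployment would use `torch.distributed` with the NCCL backend instead of these hypothetical helper functions.

```python
def shard_batch(batch, num_workers):
    """Split a batch into roughly equal contiguous shards, one per worker."""
    size = (len(batch) + num_workers - 1) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def local_gradient(shard, weight):
    """Toy gradient of mean squared error for the model y = weight * x."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(grads):
    """Average the per-worker gradients, mimicking an NCCL allreduce."""
    return sum(grads) / len(grads)

# One synchronous data-parallel step across 2 simulated workers,
# fitting y = 2x from a starting weight of 0 with learning rate 0.1.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
weight = 0.0
shards = shard_batch(data, num_workers=2)
grads = [local_gradient(s, weight) for s in shards]
weight -= 0.1 * allreduce_mean(grads)
print(weight)  # → 3.0
```

Because every worker receives the same averaged gradient, the replicas stay in lockstep; in practice the allreduce is the step NCCL accelerates over GPU interconnects.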

Features
