Welcome to RS_c, the central platform for the RecSys community. We provide curated lists of recommender-systems datasets, algorithms, books, conferences, and many more resources. Perhaps most importantly, we publish the latest recommender-system news. If you want your news to be reported on RS_c, read here.
HetSeq: Training BERT on a random assortment of GPUs [Yifan Ding et al.]
December 15, 2020
BERT has brought huge changes to how NLP is done, and has also had a notable impact on recommender systems (not always though*). However, training BERT may take weeks, if not months. Yifan Ding, Nicholas Botzer, and Tim Weninger claim to have found a solution for those working, for example, at universities with heterogeneous GPU infrastructure.
Unfortunately, most organizations not named Google or Microsoft do not have a thousand identical GPUs. Instead, small and medium organizations take a piecemeal approach to purchasing computer systems, resulting in a heterogeneous infrastructure that cannot be easily adapted to train large models. Under these circumstances, training even moderately sized models could take weeks or even months to complete.
To help remedy this situation we recently released a software package called HetSeq, which is adapted from the popular PyTorch package and provides the capability to train large neural network models on heterogeneous infrastructure.
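To give a rough idea of the machinery involved: HetSeq's actual API is not shown here, but it is built on PyTorch, whose standard `torch.distributed` primitives handle the kind of multi-process gradient synchronization that HetSeq extends to heterogeneous nodes. The following is a minimal, hypothetical sketch of one data-parallel training step using plain PyTorch (gloo backend, single process, CPU) — an illustration of the underlying mechanism, not HetSeq itself.

```python
# Hedged sketch: NOT HetSeq's API. A minimal PyTorch distributed
# data-parallel training step, illustrating the kind of setup that
# packages like HetSeq automate across heterogeneous GPU systems.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def run_one_step():
    # Single-process "cluster" for illustration; real heterogeneous
    # training would launch one process per GPU/node with distinct ranks.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = nn.Linear(10, 1)
    # DistributedDataParallel all-reduces gradients across workers
    # during backward(); with world_size=1 this is a no-op, but the
    # code path is the same one used on a real cluster.
    ddp = nn.parallel.DistributedDataParallel(model)
    opt = torch.optim.SGD(ddp.parameters(), lr=0.1)

    x = torch.randn(4, 10)
    y = torch.randn(4, 1)
    loss = nn.functional.mse_loss(ddp(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    dist.destroy_process_group()
    return loss.item()
```

On a real heterogeneous setup, each machine would run its own process with a unique rank and a shared `MASTER_ADDR`, and the per-machine launch details (device counts, addresses, synchronization) are exactly what a tool like HetSeq aims to abstract away.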
Experiments, details of which can be found in an article (available on arXiv) published at the 2021 AAAI/IAAI Conference, show that base-BERT can be trained in about a day over 8 different GPU systems, most of which we had to “borrow” from idle labs across Notre Dame.