HetSeq: Training BERT on a random assortment of GPUs [Yifan Ding et al.]

December 15, 2020

BERT has changed how NLP is done and has also had a notable impact on recommender systems (though not always*). However, training BERT can take weeks, if not months. Yifan Ding, Nicholas Botzer, and Tim Weninger say they have found a solution for those who work, for example, at universities with heterogeneous GPU infrastructure.

Unfortunately, most organizations not named Google or Microsoft do not have a thousand identical GPUs. Instead, small and medium organizations have a piecemeal approach to purchasing computer systems resulting in a heterogeneous infrastructure, which cannot be easily adapted to compute large models. Under these circumstances training even moderately-sized models could take weeks or even months to complete.
To help remedy this situation we recently released a software package called HetSeq, which is adapted from the popular PyTorch package and provides the capability to train large neural network models on heterogeneous infrastructure.
Experiments, details of which can be found in an article (available on arXiv) published at the 2021 AAAI/IAAI Conference, show that base-BERT can be trained in about a day over 8 different GPU systems, most of which we had to “borrow” from idle labs across Notre Dame.
https://towardsdatascience.com/training-bert-at-a-university-eedcf940c754

* In some recent work, we showed that language models like BERT do not always perform well:

Hassan, Hebatallah A. Mohamed, Giuseppe Sansonetti, Fabio Gasparetti, Alessandro Micarelli, and Joeran Beel. “BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation?” In 13th ACM Conference on Recommender Systems (RecSys), 2019.

Tags: BERT, Distributed GPUs, HetSeq, Nicholas Botzer, Tim Weninger, Yifan Ding

About The Author

Joeran Beel

I am the founder of Recommender-Systems.com and head of the Intelligent Systems Group (ISG) at the University of Siegen, Germany (https://isg.beel.org). We conduct research on recommender systems (RecSys), personalization, and information retrieval (IR), as well as on automated machine learning (AutoML), meta-learning, and algorithm selection. Domains we are particularly interested in include smart places, eHealth, manufacturing (Industry 4.0), mobility, visual computing, and digital libraries. We founded or maintain, among others, LensKit-Auto, Darwin & Goliath, Mr. DLib, and Docear, each with thousands of users; we contributed to TensorFlow, JabRef, and others; and we developed the first prototypes of automated recommender systems (AutoSurprise and Auto-CaseRec) and Federated Meta Learning (FMLearn Server and Client).
