Working on Optimizers and Model Architecture Scaling Schemes That Would Allow Predictable and Efficient Training of Transformer Models Containing Hundreds of Billions of Parameters
1 faculty member on GradNova is working on optimizers and model architecture scaling schemes that would allow predictable and efficient training of transformer models containing hundreds of billions of parameters.
I