Working on Optimizers and Model Architecture Scaling Schemes for Predictable and Efficient Training of Transformer Models with Hundreds of Billions of Parameters
Research Faculty | GradNova