Research Interests
Computational techniques for learning from data that scale well with available compute.
Currently working on optimizers and model architecture scaling schemes that allow predictable, efficient training of Transformer models with hundreds of billions of parameters.