Dustin Tran

Member of Technical Staff at xAI
dustintran@x.ai
Blog

Update: I joined xAI, where I lead the post-training team. I was the model captain for Grok 4.1, a new RLHF and agentic post-training recipe building on Grok 4. In just a few months, we climbed from nothing to #2-3 on LMArena and #1 on Search Arena and other tool-use benchmarks.

Previously, I was a senior staff research scientist at Google DeepMind. I am the co-creator of Gemini-0801, Google's first #1 on LMSYS. I was also the eval expert behind 2.5 models such as those that reached #1 on WebDev Arena and HLE. More broadly, I was a core contributor to Gemini 1, 1.5, 2, and 2.5, working across fundamentals in RL, evals, and data, and I co-led their papers and results.

Prior to Gemini, my most notable works are in infrastructure (Mesh TensorFlow, Tensor2Tensor, TensorFlow Probability, Edward), modeling (Image Transformer, Vision Transformer), and evaluation (Uncertainty Baselines, Measuring Calibration). I completed my Ph.D. at Columbia University advised by David Blei and Andrew Gelman.

Curriculum Vitae

Publications

Some of my work is available as preprints on arXiv.

2025

Grok 4.1
xAI team

Blog

Grok 4.1 Fast
xAI team

Blog

Grok 4 Fast
xAI team

Blog

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini team

Paper Blog

Gemma 3 Technical Report
Gemma team

Paper

2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini team

Paper

Long-form factuality in large language models
Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Zixia Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc Le
Neural Information Processing Systems, 2024

Paper

2023

Gemini: A Family of Highly Capable Multimodal Models
Gemini team

Paper

Larger language models do in-context learning differently
Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

Paper

Scaling vision transformers to 22 billion parameters
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
International Conference on Machine Learning, 2023

Paper

A simple zero-shot prompt weighting technique to improve prompt ensembling in text-image models
James Urquhart Allingham, Jie Ren, Michael W. Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan
International Conference on Machine Learning, 2023

Paper

A brief tour of deep learning from a statistical perspective
Eric Nalisnick, Padhraic Smyth, Dustin Tran
Annual Review of Statistics and Its Application, 2023

Paper

2022

Plex: Towards reliability using pretrained large model extensions
Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan

Paper Blog Code

Simple and principled uncertainty estimation with deterministic deep learning via distance awareness
Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
Journal of Machine Learning Research, 2022

Paper Code

Sparse MoEs meet efficient ensembles
James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton
Transactions on Machine Learning Research, 2022

Paper

Deep classifiers with label noise modeling and distance awareness
Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
Transactions on Machine Learning Research, 2022

Paper

2021

Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran

Paper Blog Code

Revisiting the calibration of modern neural networks
Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
Neural Information Processing Systems, 2021

Paper Code Video

Benchmarking Bayesian deep learning on diabetic retinopathy detection tasks
Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Michael W. Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal
Neural Information Processing Systems, 2021

Paper

Soft calibration objectives for neural networks
Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs
Neural Information Processing Systems, 2021

Paper Video

Sampling the variational posterior with local refinement
Marton Havasi, Jasper Snoek, Dustin Tran, Jonathan Gordon, José Miguel Hernández-Lobato
Entropy, 2021

Paper

Combining ensembles and data augmentation can harm your calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran
International Conference on Learning Representations, 2021

Paper Code

Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
International Conference on Learning Representations, 2021

Paper Code

2020

Hyperparameter ensembles for robustness and uncertainty quantification
Integrate over both weights and hyperparameters!
Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton
Neural Information Processing Systems, 2020

Paper Code

Simple and principled uncertainty estimation with deterministic deep learning via distance awareness
Leverage spectral normalization and Gaussian processes.
Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
Neural Information Processing Systems, 2020

Paper Code

On the discrepancy between density estimation and sequence generation
Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho

Paper

Demonstrating principled uncertainty modeling for recommender ecosystems with RecSim NG
A platform for simulating multi-agent recommender systems using probabilistic programming.
Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier
RecSys, 2020

Paper Blog Code

Efficient and scalable Bayesian neural nets with rank-1 factors
Mixture posteriors, Cauchy priors, rank-1 parameterization.
Michael Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-an Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
International Conference on Machine Learning, 2020

Paper Code Video

BatchEnsemble: An alternative approach to efficient ensemble and lifelong learning
Efficient ensembles for uncertainty and lifelong learning.
Yeming Wen, Dustin Tran, Jimmy Ba
International Conference on Learning Representations, 2020

Paper Code Video

Analyzing the role of model uncertainty in electronic health records
Where parameter uncertainty affects clinical decision-making.
Michael Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew Dai
ACM Conference on Health, Inference, and Learning, 2020

Paper

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data
How to distribute inference with massive data sets and how to combine inferences from many data sets.
Andrew Gelman, Aki Vehtari, Pasi Jylänki, Tuomas Sivula, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, Christian Robert
Journal of Machine Learning Research, 21(17):1–53, 2020

Paper

2019

Measuring calibration in deep learning
Jeremy Nixon, Michael Dusenberry, Linchuan Zhang, Ghassen Jerfel, Dustin Tran

Paper

NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport
Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, Srinivas Vasudevan

Paper

Bayesian Layers: A module for neural network uncertainty
A neural net-stylized primitive for distributions over functions.
Dustin Tran, Michael Dusenberry, Mark van der Wilk, Danijar Hafner
Neural Information Processing Systems, 2019

Paper Poster Code

Discrete flows: Invertible generative models for discrete data
How to model with discrete invertible functions.
Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole
Neural Information Processing Systems, 2019

Paper Poster Code

Noise contrastive priors for functional uncertainty
A prior for neural networks in data space.
Danijar Hafner, Dustin Tran, Alex Irpan, Timothy Lillicrap, James Davidson
Uncertainty in Artificial Intelligence, 2019

Paper Poster Code

2018

Simple, distributed, and accelerated probabilistic programming
Probabilistic programs on TPUs.
Dustin Tran, Matthew D. Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous
Neural Information Processing Systems, 2018

Paper Poster Code

Autoconj: Recognizing and exploiting conjugacy without a domain-specific language
The autointegrate analog of autodiff.
Matthew D. Hoffman, Matthew Johnson, Dustin Tran
Neural Information Processing Systems, 2018

Paper Poster Code

Mesh-TensorFlow: Deep learning for supercomputers
Model parallelism made easier.
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Neural Information Processing Systems, 2018

Paper Poster Code

Image Transformer
An image autoregressive model using only attention.
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
International Conference on Machine Learning, 2018

Paper Poster Code

Implicit causal models for genome-wide association studies
Generative models applied to causality in genomics.
Dustin Tran, David M. Blei
International Conference on Learning Representations, 2018

Paper Poster Video Slides

Flipout: Efficient pseudo-independent weight perturbations on mini-batches
How to make weight perturbations in evolution strategies and variational BNNs as mini-batch-friendly as activation perturbations in dropout and batch norm.
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse
International Conference on Learning Representations, 2018

Paper Code

2017

TensorFlow Distributions
Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

Paper Poster Code

Hierarchical implicit models and likelihood-free variational inference
Combining the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks.
Dustin Tran, Rajesh Ranganath, David M. Blei
Neural Information Processing Systems, 2017

Paper Poster Blog Article

Variational inference via $\chi$-upper bound minimization
Overdispersed approximations and upper bounding the model evidence.
Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei
Neural Information Processing Systems, 2017

Paper Code

Comment, "Fast approximate inference for arbitrarily large semiparametric regression models via message passing"
The role of message passing in automated inference.
Dustin Tran, David M. Blei
Journal of the American Statistical Association, 112(517):156–158, 2017

Paper Blog Article

Automatic differentiation variational inference
An automated tool for black box variational inference, available in Stan.
Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei
Journal of Machine Learning Research, 18(14):1–45, 2017

Paper Code Slides

Deep probabilistic programming
How to build a language with rich compositionality for modeling and inference.
Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei
International Conference on Learning Representations, 2017

Paper Website Poster Slides

2016

Edward: A library for probabilistic modeling, inference, and criticism
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei

Paper Website Slides

Model criticism for Bayesian causal inference
Dustin Tran, Francisco J. R. Ruiz, Susan Athey, David M. Blei

Paper

Operator variational inference
How to formalize computational and statistical tradeoffs in variational inference.
Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei
Neural Information Processing Systems, 2016

Paper Poster

Hierarchical variational models
A Bayesian formalism for constructing expressive variational families.
Rajesh Ranganath, Dustin Tran, David M. Blei
International Conference on Machine Learning, 2016

Paper Poster

Spectral M-estimation with application to hidden Markov models
Applying M-estimation for sample efficiency and robustness in moment-based estimators.
Dustin Tran, Minjae Kim, Finale Doshi-Velez
Artificial Intelligence and Statistics, 2016

Paper

Towards stability and optimality in stochastic gradient descent
A stochastic gradient method combining numerical stability and statistical efficiency.
Panos Toulis, Dustin Tran, Edoardo M. Airoldi
Artificial Intelligence and Statistics, 2016

Paper Poster Code

The variational Gaussian process
A powerful variational model that can universally approximate any posterior.
Dustin Tran, Rajesh Ranganath, David M. Blei
International Conference on Learning Representations, 2016

Paper Slides

2015

Copula variational inference
Posterior approximations using copulas, which find meaningful dependence between latent variables.
Dustin Tran, David M. Blei, Edoardo M. Airoldi
Neural Information Processing Systems, 2015

Paper Poster

Stochastic gradient descent methods for estimation with large data sets
Dustin Tran, Panos Toulis, Edoardo M. Airoldi

Paper Code