Research Scientist at Google DeepMind
trandustin@google.com
I am a senior staff research scientist at Google DeepMind, where I lead evaluation for Gemini / Bard. I also drove workstreams in frontier evaluation and post-training research that brought Gemini to #1 on LMSYS.
My most notable works are in infrastructure (Mesh TensorFlow, Tensor2Tensor, TensorFlow Probability, Edward), modeling (Image Transformer, Automatic Differentiation Variational Inference), and evaluation (Plex, Uncertainty Baselines, Measuring Calibration).
I completed my Ph.D. at Columbia University advised by David Blei and Andrew Gelman.
Some of my work is available as preprints on arXiv.
Gemini: A Family of Highly Capable Multimodal Models

Larger language models do in-context learning differently
Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

Plex: Towards reliability using pretrained large model extensions
Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan

Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning
Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran

On the discrepancy between density estimation and sequence generation
Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho

Measuring calibration in deep learning
Jeremy Nixon, Michael Dusenberry, Linchuan Zhang, Ghassen Jerfel, Dustin Tran
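
For readers landing here from this line of calibration work, the expected calibration error (ECE) that these papers analyze can be sketched in a few lines of numpy. The function name and the equal-width binning below are illustrative choices, not the paper's exact estimator:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the bin-weighted gap
    between per-bin accuracy and per-bin mean confidence."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()       # accuracy within the bin
            conf = confidences[mask].mean()  # average confidence within the bin
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

A perfectly calibrated model (e.g. 80% confidence with 80% accuracy) scores zero; overconfidence raises the gap.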

NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport
Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, Srinivas Vasudevan

TensorFlow Distributions
Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

Edward: A library for probabilistic modeling, inference, and criticism
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei

Model criticism for Bayesian causal inference
Dustin Tran, Francisco J. R. Ruiz, Susan Athey, David M. Blei

Stochastic gradient descent methods for estimation with large data sets
Dustin Tran, Panos Toulis, Edoardo M. Airoldi

Scaling vision transformers to 22 billion parameters
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
International Conference on Machine Learning, 2023

A simple zero-shot prompt weighting technique to improve prompt ensembling in text-image models
James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan
International Conference on Machine Learning, 2023

A brief tour of deep learning from a statistical perspective
Eric Nalisnick, Padhraic Smyth, Dustin Tran
Annual Review of Statistics and Its Application, 2023

Simple and principled uncertainty estimation with deterministic deep learning via distance awareness
Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
Journal of Machine Learning Research, 2022

Sparse MoEs meet efficient ensembles
James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton
Transactions on Machine Learning Research, 2022

Deep classifiers with label noise modeling and distance awareness
Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
Transactions on Machine Learning Research, 2022

Revisiting the calibration of modern neural networks
Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic
Neural Information Processing Systems, 2021

Benchmarking Bayesian deep learning on diabetic retinopathy detection tasks
Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Michael W Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal
Neural Information Processing Systems, 2021

Soft calibration objectives for neural networks
Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs
Neural Information Processing Systems, 2021

Sampling the variational posterior with local refinement
Marton Havasi, Jasper Snoek, Dustin Tran, Jonathan Gordon, José Miguel Hernández-Lobato
Entropy, 2021

Combining ensembles and data augmentation can harm your calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran
International Conference on Learning Representations, 2021

Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran
International Conference on Learning Representations, 2021

Hyperparameter ensembles for robustness and uncertainty quantification
Integrate over both weights and hyperparameters!
Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton
Neural Information Processing Systems, 2020

Simple and principled uncertainty estimation with deterministic deep learning via distance awareness
Leverage spectral normalization and Gaussian processes.
Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
Neural Information Processing Systems, 2020

Demonstrating principled uncertainty modeling for recommender ecosystems with RecSim NG
A platform for simulating multi-agent recommender systems using probabilistic programming.
Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier
RecSys, 2020

Efficient and scalable Bayesian neural nets with rank-1 factors
Mixture posteriors, Cauchy priors, rank-1 parameterization.
Michael Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-an Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
International Conference on Machine Learning, 2020

BatchEnsemble: An alternative approach to efficient ensemble and lifelong learning
Efficient ensembles for uncertainty and lifelong learning.
Yeming Wen, Dustin Tran, Jimmy Ba
International Conference on Learning Representations, 2020
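
The core BatchEnsemble idea is small enough to sketch: each ensemble member shares one slow weight matrix and owns only rank-1 "fast" factors, so all members are evaluated with a single shared matmul instead of materializing per-member weights. A minimal numpy sketch (shapes and the function name are illustrative):

```python
import numpy as np

def batch_ensemble_dense(x, W, r, s):
    """Dense layer where member i uses weight W * outer(r_i, s_i).
    Uses the identity x @ (W * outer(r_i, s_i)) == ((x * r_i) @ W) * s_i,
    so one matmul serves every member.
    x: (batch, d_in); W: (d_in, d_out); r: (members, d_in); s: (members, d_out).
    Returns: (members, batch, d_out)."""
    return ((x[None] * r[:, None, :]) @ W) * s[:, None, :]
```

The elementwise identity is what makes the ensemble nearly free in memory: only the rank-1 vectors are per-member.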

Analyzing the role of model uncertainty in electronic health records
Where parameter uncertainty affects clinical decision-making.
Michael Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew Dai
ACM Conference on Health, Inference, and Learning, 2020

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data
How to distribute inference with massive data sets and how to combine inferences from many data sets.
Andrew Gelman, Aki Vehtari, Pasi Jylänki, Tuomas Sivula, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, Christian Robert
Journal of Machine Learning Research, 21(17):1–53, 2020

Bayesian Layers: A module for neural network uncertainty
A neural net-stylized primitive for distributions over functions.
Dustin Tran, Michael Dusenberry, Mark van der Wilk, Danijar Hafner
Neural Information Processing Systems, 2019

Discrete flows: Invertible generative models for discrete data
How to model with discrete invertible functions.
Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole
Neural Information Processing Systems, 2019

Noise contrastive priors for functional uncertainty
A prior for neural networks in data space.
Danijar Hafner, Dustin Tran, Alex Irpan, Timothy Lillicrap, James Davidson
Uncertainty in Artificial Intelligence, 2019

Simple, distributed, and accelerated probabilistic programming
Probabilistic programs on TPUs.
Dustin Tran, Matthew D. Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous
Neural Information Processing Systems, 2018

Autoconj: Recognizing and exploiting conjugacy without a domain-specific language
The autointegrate analog of autodiff.
Matthew D. Hoffman, Matthew Johnson, Dustin Tran
Neural Information Processing Systems, 2018

Mesh-TensorFlow: Deep learning for supercomputers
Model parallelism made easier.
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman
Neural Information Processing Systems, 2018

Image Transformer
An image autoregressive model using only attention.
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
International Conference on Machine Learning, 2018

Implicit causal models for genome-wide association studies
Generative models applied to causality in genomics.
Dustin Tran, David M. Blei
International Conference on Learning Representations, 2018

Flipout: Efficient pseudo-independent weight perturbations on mini-batches
How to make weight perturbations in evolution strategies and variational BNNs as mini-batch-friendly as activation perturbations in dropout and batch norm.
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse
International Conference on Learning Representations, 2018
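
The flipout trick itself is compact: a single sampled weight perturbation is decorrelated across examples in a mini-batch by elementwise random sign vectors, so no per-example weight matrix is ever materialized. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def flipout_perturbation(x, delta_w, rng):
    """Per-example perturbed outputs using one shared perturbation delta_w.
    Example n effectively sees delta_w * outer(s_n, r_n) with random
    sign vectors s_n, r_n, via ((x * s) @ delta_w) * r.
    x: (batch, d_in); delta_w: (d_in, d_out)."""
    batch, d_in = x.shape
    d_out = delta_w.shape[1]
    s = rng.choice([-1.0, 1.0], size=(batch, d_in))
    r = rng.choice([-1.0, 1.0], size=(batch, d_out))
    return ((x * s) @ delta_w) * r, s, r
```

The sign flips keep each example's perturbation zero-mean and pairwise decorrelated while costing only one extra matmul, which is what makes the variance reduction mini-batch-friendly.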

Hierarchical implicit models and likelihood-free variational inference
Combining the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks.
Dustin Tran, Rajesh Ranganath, David M. Blei
Neural Information Processing Systems, 2017

Variational inference via $\chi$-upper bound minimization
Overdispersed approximations and upper bounding the model evidence.
Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei
Neural Information Processing Systems, 2017

Comment on "Fast approximate inference for arbitrarily large semiparametric regression models via message passing"
The role of message passing in automated inference.
Dustin Tran, David M. Blei
Journal of the American Statistical Association, 112(517):156–158, 2017

Automatic differentiation variational inference
An automated tool for black box variational inference, available in Stan.
Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei
Journal of Machine Learning Research, 18(14):1–45, 2017

Deep probabilistic programming
How to build a language with rich compositionality for modeling and inference.
Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei
International Conference on Learning Representations, 2017

Operator variational inference
How to formalize computational and statistical tradeoffs in variational inference.
Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei
Neural Information Processing Systems, 2016

Hierarchical variational models
A Bayesian formalism for constructing expressive variational families.
Rajesh Ranganath, Dustin Tran, David M. Blei
International Conference on Machine Learning, 2016

Spectral M-estimation with application to hidden Markov models
Applying M-estimation for sample efficiency and robustness in moment-based estimators.
Dustin Tran, Minjae Kim, Finale Doshi-Velez
Artificial Intelligence and Statistics, 2016

Towards stability and optimality in stochastic gradient descent
A stochastic gradient method combining numerical stability and statistical efficiency.
Panos Toulis, Dustin Tran, Edoardo M. Airoldi
Artificial Intelligence and Statistics, 2016

The variational Gaussian process
A powerful variational model that can universally approximate any posterior.
Dustin Tran, Rajesh Ranganath, David M. Blei
International Conference on Learning Representations, 2016

Copula variational inference
Posterior approximations using copulas, which find meaningful dependence between latent variables.
Dustin Tran, David M. Blei, Edoardo M. Airoldi
Neural Information Processing Systems, 2015