As you may (or may not) know, I’ve been busy lately spearheading Edward, an open-source library for probabilistic modeling. It’s meant to help bridge the gap between what I view as two dichotomous approaches to statistical learning: one approach develops complex models in order to achieve the best results on a specific task; the other adheres to simple models in order to understand every component of the analysis both empirically and theoretically. The former starts at the end goal; the latter starts at the foundation.
There are many names you can attach to each of these approaches, and those names certainly imply contrasting motivations rooted in culture, and thus contrasting views and contrasting applications. But ultimately, the goal is still the same. The approaches are not orthogonal, but a lack of awareness connecting the two makes them seem to be. A neural network is a powerful approach for modeling non-linear functions (I mean this not tongue-in-cheek; it’s difficult to summarize many decades of innovation in a sentence). A Bayesian linear model is a powerful approach for incorporating parameter uncertainty during supervised learning, and for providing a basis on which to validate our models.
How do we assess model fit for a complex neural network architecture, and generalize to any setting, whether it be small data, simulation-based tasks, or even causal inference? How do we build more expressive models when our tools tell us the generalized linear model fits okay but is not nearly as fine-grained as we’d like it to be, and how do we let our inferences scale to data that no longer fits in memory? If we use the many innovations of deep learning in concert with statistical analysis, or conversely, a century of statistical foundations in concert with deep learning, we might just achieve something quite grand. (And I mean this one only a little tongue-in-cheek.)
Edward, broadly speaking, tries to combine efforts from both approaches. It’s a software library I’ve always wanted to develop but never had the right resources for until now. Fast and distributed computation can be done using TensorFlow as a rich symbolic framework. Neural networks can be easily constructed with high-level libraries like Keras. Flexible probability models can be specified using modeling languages such as Stan. And inference can be done using fast approximations via a built-in variational inference engine, with criticism techniques for both point prediction and distribution-based model assessments.
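To give a flavor of how these pieces fit together, here is a rough sketch of a Bayesian linear regression written in Edward and fit with the built-in variational inference engine. It follows the style of Edward’s getting-started examples; exact names (the `loc`/`scale` arguments to `Normal`, `ed.KLqp`, `ed.dot`) have shifted across versions, so treat this as illustrative rather than definitive.

```python
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal

# Toy data: 40 points, 5 features, generated from a known linear model.
N, D = 40, 5
w_true = np.random.randn(D).astype(np.float32)
X_train = np.random.randn(N, D).astype(np.float32)
y_train = X_train.dot(w_true) + 0.1 * np.random.randn(N).astype(np.float32)

# Model: Bayesian linear regression with standard normal priors.
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

# Variational approximation: fully factorized normals over the weights.
qw = Normal(loc=tf.Variable(tf.zeros(D)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(D))))
qb = Normal(loc=tf.Variable(tf.zeros(1)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

# Inference: maximize the ELBO, i.e., minimize KL(q || p) to the posterior.
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_iter=500)
```

Because the model is just a TensorFlow graph, the same pattern extends to neural network likelihoods: swap `ed.dot(X, w) + b` for, say, a Keras network whose weights are random variables.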
I gave an Edward talk a few days ago at Google Brain, which I quite positively butchered due to a lack of sleep and preparation. (If only I had known I would get a sizeable cast of the TensorFlow core developers, Geoff Hinton, Jeff Dean, and Kevin Murphy in one room!) But I think the work spoke for itself, and there was a lot of excitement about why Bayesian deep learning might be the right thing.
There are a number of developments Edward needs before it can make even small steps toward connecting these two approaches. I finally got around to refactoring the code for variational auto-encoders, designing the criticism API for posterior predictive checks, and, with much help from others, explaining how to build neural network-based probabilistic models; many open problems remain. We’re nowhere close to getting statisticians, machine learners, and deep learners to agree with one another. But I think Edward is making progress.
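As a small illustration of the criticism API mentioned above, here is roughly what a posterior predictive check looks like, continuing the regression sketch from earlier. The calls `ed.copy`, `ed.evaluate`, and `ed.ppc` reflect the criticism module as I understand it; exact signatures may differ across Edward versions.

```python
# Posterior predictive: replace priors with their fitted approximations.
y_post = ed.copy(y, {w: qw, b: qb})

# Point-based evaluation of predictive accuracy.
print(ed.evaluate('mean_squared_error', data={X: X_train, y_post: y_train}))

# Posterior predictive check: compare the observed mean of y against the
# distribution of that statistic under data replicated from the model.
ed.ppc(lambda xs, zs: tf.reduce_mean(xs[y_post]),
       data={X: X_train, y_post: y_train})
```

`ed.ppc` returns the test statistic computed on replicated data sets alongside its value on the observed data, which is the distribution-based model assessment described above.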