The joy and martyrdom of trying to be a Bayesian

Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use generalized linear mixed-effect models.

A complete workflow from the initial data massage to model fitting and the output of the results can easily fit on one page of an R script. Even if more is needed, it rarely takes more than a second to run.
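For the record, such a one-page workflow might look like this - a hypothetical sketch, with the file and variable names made up:

    # the entire 'innocent world' workflow: read, fit, report
    d  <- read.csv("mydata.csv")                  # hypothetical data set
    m1 <- lm(y ~ x, data = d)                     # plain linear regression
    m2 <- glm(y ~ x, family = poisson, data = d)  # odd histogram? use glm()
    summary(m2)                                   # done, in under a second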

Now here is a warning: Do not go beyond that if you plan to be happy. Stay with lm() and glm(), and use mixed-effect models only when the referees bully you too much. If the referees of your paper start to use buzzwords such as 'hierarchical', 'prior' or 'uncertainty', withdraw and promptly send the manuscript to another journal. You will publish faster and more, and you will get cited more because more people will actually understand your results.

I was a fool to go beyond that. Initially, I just wanted to understand what mixed-effect models REALLY are. Maybe you know the confusion: What the hell is the difference between random and fixed effects? How on earth am I supposed to ever confidently specify a correct mixed-effect model structure using the bloody formula representation in the lme4 and nlme libraries in R? Why is it so difficult to get expert advice on that? Why are all the relevant textbooks written for statisticians and not for humans? How should I interpret the results?
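For those who have not met it, the formula representation in question looks like this - a sketch reusing the made-up data frame d from above, plus a hypothetical group variable:

    library(lme4)
    # 'random intercept for each group, fixed effect of x' ... I think
    m <- lmer(y ~ x + (1 | group), data = d)
    # and it only gets worse: (1 + x | group), nested and crossed terms, ...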

I ended up crunching through book after book. And I found myself studying probability distributions, figuring out what exactly their role is in statistical models. I came to understand the difference between data and models, I finally got a grip on likelihood, and I realized that there are at least two definitions of probability itself.

At a certain moment I had a satori. There was a little click and it all unfolded backwards. And I started to see things with new eyes. It was like being on a trip. I saw t-test, I saw ANOVA, I saw linear regression, Poisson log-linear regression, logistic regression, and finally I saw what some call mixed-effect models. But I saw them as arbitrary and modifiable categories, building blocks, templates. They were all just probability distributions connected by algebra.

The world of my ideas started to beg for an unrestricted intercourse with the world of the data on my screen. I felt liberated, able to translate any hypothesis into an exact formal representation, confident that these representations could be properly parameterized and fitted to the data, because there is MCMC.
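To illustrate what I mean - a hypothetical sketch with made-up data, not a model from any of my manuscripts - here is the familiar Poisson log-linear regression written explicitly as a probability distribution connected to its predictor by algebra, and parameterized by MCMC through JAGS and the rjags package:

    library(rjags)

    # the model: a distribution (dpois) connected to a + b*x by algebra
    model_string <- "
    model {
      for (i in 1:N) {
        y[i] ~ dpois(lambda[i])          # the distribution
        log(lambda[i]) <- a + b * x[i]   # the algebra
      }
      a ~ dnorm(0, 1.0E-6)               # vague priors
      b ~ dnorm(0, 1.0E-6)
    }"

    dat <- list(x = c(1, 2, 3, 4, 5),    # made-up data
                y = c(2, 3, 7, 8, 14), N = 5)
    m <- jags.model(textConnection(model_string), data = dat, n.chains = 3)
    update(m, 1000)                                     # burn-in
    post <- coda.samples(m, c("a", "b"), n.iter = 5000)
    summary(post)                                       # posteriors pop out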

Once I was able to see probability as the intensity of belief, I no longer saw any controversy in the usage of informative priors. So straightforward! P-values, test statistics, null hypotheses, randomization tests, standard errors and the bootstrap ended up in the garbage - not for being incorrect, they just seemed too ad hoc. Machine learning, neural networks, regression trees, random forests, copulas and data mining seemed like primitive (and dangerous) black boxes used by those who had not yet seen the light.
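(As an aside on those informative priors: in the JAGS notation above, making a prior informative is literally a one-line change. The numbers below are made up.)

    b ~ dnorm(0, 1.0E-6)   # 'objective': essentially flat
    b ~ dnorm(0.7, 100)    # informative: prior belief that b is near 0.7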

It was the time of the joy of boundless possibilities.

This was all several manuscripts, four conferences, heaps of papers and thousands of lines of code ago. Almost two years of reality ago.

Recently, I have realized that:
- I spend days massaging the data so that they fit the OpenBUGS or JAGS requirements.
- I spend weeks to months thinking about how exactly I should implement my models.
- I spend days to weeks trying to make OpenBUGS or JAGS run without crashing; JAGS at least gives me a hint of what went wrong, while OpenBUGS is almost impossible to debug.
- It costs me more effort to explain my models and results to people. In manuscripts where I used to write "I fitted a generalized linear model (Poisson family, log link) to the data using glm() in R", I now have a page describing the model structure, an extra paragraph describing how my chains converged (see the sketch after this list), and I plague my manuscripts with equations. Even if the model is not that complex, it puts readers off, including my co-authors.
- Referees who have never used latent variables and hierarchical models have a hard time seeing through it, and I have to spend a lot of energy and time explaining my methods in responses to referees.
- Even a simple re-run or correction of my analyses can take days or weeks.
- As a consequence, the publishing process is slower, dissemination of my results is less effective, and I expect to be less cited.
- Oh, and the usage of informative priors seems suspicious to almost everybody.
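That convergence paragraph typically boils down to diagnostics like these - a sketch using the coda package, where post stands for MCMC output such as that returned by coda.samples above:

    library(coda)
    gelman.diag(post)     # potential scale reduction factor: should be near 1
    traceplot(post)       # do the chains mix, or do they wander?
    effectiveSize(post)   # how many independent samples we effectively have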

But don't get me wrong. I still love it! The joy of seeing my little posterior distributions popping out is enormous. I still think that it is all worth it: it is the price that I pay for having everything exactly defined (hopefully) and transparent (hopefully), and with the uncertainty properly quantified (hopefully). And since I have always been an idealist, it comforts me that I have at least ideological superiority. Pragmatically and practically speaking, it has been a martyrdom.

I guess that all emerging technologies and paradigms are like that. The starts are clumsy and bumpy. When computers appeared, they were one big pain to work with: you had to translate your code into holes in a punched card, you mailed (not e-mailed!) it to a computing center, and in several weeks you would receive an envelope with a punched card representing the results (perhaps a least-squares solution of a simple linear regression). Imagine debugging that! And you know what computers are today. With Bayesian hierarchical modelling it seems similar. Stan is a promising next step. I believe that the martyrdom is only temporary, and that it will pay off in the long run.
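For a taste of that next step, here is the same made-up Poisson regression from above in Stan, via the rstan package - a sketch, assuming the dat list defined earlier:

    library(rstan)

    stan_code <- "
    data {
      int<lower=0> N;
      vector[N] x;
      int<lower=0> y[N];
    }
    parameters {
      real a;
      real b;
    }
    model {
      a ~ normal(0, 1000);                  // vague priors again
      b ~ normal(0, 1000);
      for (i in 1:N)
        y[i] ~ poisson(exp(a + b * x[i]));  // same distribution, same algebra
    }"

    fit <- stan(model_code = stan_code, data = dat, chains = 3, iter = 2000)
    print(fit)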

19 thoughts on “The joy and martyrdom of trying to be a Bayesian”

  1. I am having a very similar experience to yours. I've submitted two papers with fully Bayesian analyses, but have only got mild blowback so far. The reactions to seeing a Bayesian model range from "Why are you fitting a Bayesian model? Just because you can?" (i.e., I have to justify why I am not doing the standard thing) to "Well, the Bayesian model tells you the same thing as the frequentist one" (what they mean is that the conclusion is the same).

  2. Thanks for the fair warning... I found your post very funny and wise. I'm about to start down that road. Kruschke's Bayesian book with the cute puppies on the cover is on my desk, waiting expectantly to be READ and USED, instead of just being skimmed and admired. Even this friendly treatment looks pretty scary, but I have to go that way, because, like you, I know that it is the right thing to do.... Sigh.

    • I am using that very same book too; it got me intrigued by Bayesian statistics and made me decide to chase this debate further on the internet. I have always had these questions in my head, but now it seems like I got the answers in Bayesian statistics :)

  3. Thanks for the well-written story of your stats journey! Even a novice such as myself (who is somewhere between lm() and glm()) could appreciate it.

    Something in the back of my mind thinks that it's only a matter of time before the frequentist approach is passé, and that the oodles of effort spent learning its intricate details will have been an ultimate waste of time. (I'm also suspicious of this opinion, coming from myself, because it's very possible I'm just being lazy.)

    My strategy may be to keep my frequentist analyses as simple as possible until the explanation of the Bayesian approach has evolved enough that even I can understand it.

  4. Thank you for your post. It was like finally seeing many random thoughts that had once appeared to me written down in a nice and comprehensible way. I share most of your opinions, and I felt the same "enlightenment" you talk about some time ago... after 5 hard years of studying statistics. Bayesian statistics is ideologically powerful for the simple reason that it gives you much more flexibility to "formally explain" the phenomena you are interested in. Studying, implementing and communicating it is a pain. This was probably the same for classical methods too, before they were nicely implemented in statistical packages, equipped with ad-hoc hypothesis tests and reference books explaining the theory behind them. You are probably right when you say Bayesian methods will meet common consent and become easier to work with, sooner or later.

  5. Hello, I am a computer science student and have some idea of machine learning and such, but not much background in statistics... Is there something like an R tutorial for working with probability distributions? Maybe one presenting common scenarios and how you would model and test them, and so on? Cheers!

  6. I guess that I am at the beginning of your journey. I am experimenting with mixed effects, but I have not really found the rabbit hole to the world of Bayesian enlightenment.

    Do you have the map? I am still in search of the publication that will show me the benefits of Bayesian statistics. I would really like to experience that satori.

  7. As you were crunching through book after book studying probability distributions and their role in statistical models, did you find a recommendable book on these topics? I also want to learn more about the mathematics behind the models, but my problem is similar to yours: either the books are too simple (not explaining how things work) or they are written for statisticians. How did you manage it in the end? Thanks for your help.

  8. I hear and feel your pain! I also find myself spending far too much time in JAGS and R, though isn't it nice to understand what is actually going on rather than blindly using lm() (as I used to do)...

    One question: you list stuff that is "ad hoc", and among it you list copulas. I recently learned about copulas, and my first reaction was that they would fit nicely into a Bayesian framework. I would be very interested in what makes them ad hoc :)

  9. Thank you so much, Petr, for this post. Yes, it is an uphill slog. And, yes, there are pitfalls and quirks and warts having to do with using this new technology.

    On the other hand, I like to think that the extra work, the extra care, the extra agony compared to a frequentist approach are simply the kind of internal struggle that's really necessary to get at the truth about something. The most damning thing about a frequentist approach, I find, is that it's easy to seem reasonable and analytical and derive a result, and yet fool yourself into thinking that there's really something to it. Sure, it's possible to fool yourself with Bayesian methods, but it is, I think, much harder.

    That is in part because Bayesians are approaching these problems knowing that they don't know, and trying to quantify how much they don't know.

    It can be frustrating because some people expect these answers to be cranked out, and, if you are Bayesian, you'll get the answer they want, but you aren't true to yourself until you also crank out your uncertainty about the answer.

    Not all applications are that way, of course. In particular, there are hard applications, such as Internet traffic modeling, where there are few given principles and laws. Nevertheless, people want to be able to make certain decisions. In such a world, the frequentist often flails, trying this thing and that, but can never say how widely applicable their findings are. The Bayesian can help decide. The user might not be happy about how poorly constrained the basis for the decision is, or what their downside risks are - for the Bayesian can often tell them that. But, given the information at hand, the Bayesian can offer a decision. That's powerful.

  10. I love it! Great post. I am currently at the "It was the time of the joy of boundless possibilities." stage and regularly describing a day's work as "mind-blowing..."

    I worry about my future a little, but I am optimistic that we (meaning scientists using applied Bayesian stats) will learn to express ourselves more clearly and confidently. Is the detail always necessary for most of our intended audience? Perhaps simply have it ready for those interested in the finer details. An applied science article using classical analysis does not usually report the details of model diagnostics, for example.

  11. This is a great post and very much mirrors my experiences and thoughts. I am in the interesting position of being a professional consultant who performs Bayesian analyses using JAGS and R on a diversity of datasets, and who has to produce analytic results on fixed and sometimes small budgets with hard deadlines. As a result I have run a lot of analyses and learnt a great deal about workflow, efficiency, and presentation to non-Bayesian and non-statistical clients - actually it's much easier when they are non-statistical ;) As you identify, the next step is software, because while packages such as rjags are functional, they are relatively primitive. Your blog might inspire me to put my thoughts about the future onto paper (of the online type).
