This post is inspired by a question that Dylan Craven raised during my Bayesian stats course.
1 Objective
My aim here is to demonstrate that, in a Bayesian setting, one can make powerful inference about within-group ANOVA means even for groups with sample size n = 1, i.e. groups with only a single measurement. The main reason this is possible is the assumption of constant variance (sigma^2) among groups: groups with sufficiently large n sharpen the estimate of sigma, which in turn narrows the credible interval of the mean mu in the group with n = 1. Conversely, when the assumption is relaxed and sigma is allowed to vary among groups, groups with a sample size of 1 are no longer that useful.
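The intuition can be sketched numerically. Assuming a flat prior and a pooled standard deviation sigma that is (hypothetically) well estimated from the other groups, the posterior of a group mean based on a single observation is approximately Normal(y1, sigma^2), so its 95% credible interval is finite and usable. The numbers below are made up purely for illustration:

```python
# Sketch: approximate 95% credible interval for a group mean with n = 1,
# assuming a flat prior and a known (pooled) sigma borrowed from the
# other groups. All values are hypothetical.
sigma = 3.0   # pooled standard deviation, assumed known
y1 = 45.0     # the single observation in the n = 1 group
lo, hi = y1 - 1.96 * sigma, y1 + 1.96 * sigma
print(round(lo, 2), round(hi, 2))  # a finite interval centred on y1
```

Without the pooled sigma, the same single observation carries no information about its own variance, which is exactly what Section 4 demonstrates.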
2 The data
I will use modified artificially generated data from the example in Marc Kéry's Introduction to WinBUGS for Ecologists, page 119 (Chapter 9, ANOVA). The (artificial) data are supposed to describe snout-vent lengths in five populations of the Smooth snake (Coronella austriaca); I modified the data so that the first population is represented by only a single observation, as indicated by the arrow in the figure below.
Loading the data from the web:
snakes <- read.csv("http://www.petrkeil.com/wp-content/uploads/2017/02/snakes_lengths.csv")
Plotting the data:
par(mfrow=c(1,2), mai=c(0.8,0.8, 0.1, 0.1))
plot(snout.vent ~ population, data=snakes, ylab="Length [cm]")
arrows(x1=1.2, y1=58, x0=2, y0=61, col="red", lwd=2, angle=25, length=0.2)
boxplot(snout.vent ~ population, data=snakes, ylab="Length [cm]", xlab="population", col="grey")
3 Fixed-effects ANOVA in JAGS
First, I will model the data using a traditional fixed-effects ANOVA model. For a given snake i in population x_i, the model can be written as:

y_i ~ Normal(mu_x[i], sigma^2)

where mu_j is the mean of the j-th population (j = 1, ..., 5), x_i is the population of measurement i, and i identifies the individual measurements.
I am interested in a simple question: is there a statistically significant difference between populations 1 and 2? Will I be able to infer the difference even when population 1 has a sample size of 1?
I will fit the model using MCMC in JAGS. Hence, I will prepare the data in a list format:
snake.data <- list(y=snakes$snout.vent,
x=snakes$population,
N=nrow(snakes),
N.pop=5)
Loading the library that enables R to communicate with JAGS:
library(R2jags)
JAGS Model definition:
cat("
model
{
# priors
sigma ~ dunif(0,100) # (you may want to use a more proper gamma prior)
tau <- 1/(sigma*sigma)
for(j in 1:N.pop)
{
mu[j] ~ dnorm(0, 0.0001)
}
# likelihood
for(i in 1:N)
{
y[i] ~ dnorm(mu[x[i]], tau)
}
# the difference between populations 1 and 2:
delta12 <- mu[1] - mu[2]
}
", file="fixed_anova.txt")
And here I fit the model:
model.fit.1 <- jags(data = snake.data,
model.file = "fixed_anova.txt",
parameters.to.save = c("mu", "delta12"),
n.chains = 3,
n.iter = 20000,
n.burnin = 10000,
DIC = FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 41
## Unobserved stochastic nodes: 6
## Total graph size: 107
##
## Initializing model
Plotting parameter estimates with mcmcplots:
library(mcmcplots)
plot(snakes$snout.vent ~ snakes$population, ylim=c(35, 65), col="grey",
xlab="Population", ylab="Body length")
caterplot(model.fit.1, parms="mu", horizontal=FALSE,
reorder=FALSE, add=TRUE, labels=FALSE, cex=2, col="red")
The red dots and bars show posterior densities of mu. Grey dots are the data.
So what is the difference between population 1 and 2?
denplot(model.fit.1, parms="delta12", style="plain", mar=c(5,5,2,1), main="",
xlab="Difference between means of populations 1 and 2",
ylab="Posterior density")
There is a clear non-zero difference (the posterior density does not overlap zero). So we can conclude that there is a statistically significant difference between populations 1 and 2, and we can conclude this despite having only n = 1 for population 1.
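A numerical summary of "does not overlap zero" is the posterior probability that the difference is negative, computed directly from the MCMC draws. The sketch below uses simulated draws standing in for the real JAGS samples of delta12 (the location and spread are made up); in the real analysis the draws would be extracted from model.fit.1:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws of delta12 = mu[1] - mu[2];
# stand-ins for the samples that JAGS would return.
draws = rng.normal(loc=-8.0, scale=3.0, size=10_000)
p_negative = (draws < 0).mean()  # posterior probability that mu[1] < mu[2]
print(p_negative > 0.95)
```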
4 Relaxing the assumption of constant variance
We can have a look at what happens when the assumption of constant sigma is relaxed. In other words, every population now has its own sigma_j, and the model is:

y_i ~ Normal(mu_x[i], sigma_x[i]^2)

Here is the JAGS definition of such a model:
cat("
model
{
# priors
for(j in 1:N.pop)
{
mu[j] ~ dnorm(0, 0.0001)T(0,) # Note that I truncate the priors here
sigma[j] ~ dunif(0,100) # again, you may want to use proper gamma prior here
tau[j] <- 1/(sigma[j]*sigma[j])
}
# likelihood
for(i in 1:N)
{
y[i] ~ dnorm(mu[x[i]], tau[x[i]])
}
# the difference between populations 1 and 2:
delta12 <- mu[1] - mu[2]
}
", file="fixed_anova_relaxed.txt")
Let’s fit the model:
model.fit.2 <- jags(data = snake.data,
model.file = "fixed_anova_relaxed.txt",
parameters.to.save = c("mu", "delta12"),
n.chains = 3,
n.iter = 20000,
n.burnin = 10000,
DIC = FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 41
## Unobserved stochastic nodes: 10
## Total graph size: 140
##
## Initializing model
Here are the parameter estimates plotted with mcmcplots:
library(mcmcplots)
plot(snakes$snout.vent ~ snakes$population, ylim=c(0, 100), col="grey",
xlab="Population", ylab="Body length")
caterplot(model.fit.2, parms="mu", horizontal=FALSE, reorder=FALSE, add=TRUE,
labels=FALSE, cex=2, col="red")
Clearly, the posterior density of mu[1] is essentially the prior. In other words, the single observation has little power on its own to override the prior.
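To see how wide "essentially the prior" is: JAGS parameterizes the normal distribution by precision (tau = 1/sd^2), so the prior dnorm(0, 0.0001) used in the model has a standard deviation of 1/sqrt(0.0001), far wider than any plausible snake length:

```python
import math

# JAGS uses precision, not variance: dnorm(mean, tau) with tau = 1/sd^2.
# The prior dnorm(0, 0.0001) therefore has standard deviation 1/sqrt(0.0001).
tau = 0.0001
prior_sd = 1 / math.sqrt(tau)
print(prior_sd)  # vastly wider than the observed body lengths
```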
We can plot the posterior density of the difference delta12:
denplot(model.fit.2, parms="delta12", style="plain", mar=c(5,5,2,1), main="",
xlab="Difference between means of populations 1 and 2",
ylab="Posterior density")
And we see that it is all over the place: the uncertainty is huge, and no clear inference is possible.
5 Conclusion
I have shown that one can make powerful inference about population means even if the sample size for a population is very small (even n = 1). This is possible when all of the following conditions are met:
- A Bayesian approach to ANOVA must be used.
- We must have sufficient data from the other populations.
- We must be confident that the assumption of constant variance among the groups is justified.
Hi Petr,
Perhaps you can help me as a Bayesian novice: how reliable is Bayesian inference for very large datasets?
For traditional ANOVA, groups with large sample sizes tend to always be significantly different even when these differences are very small (because the degrees of freedom are so high). In these instances, inference is not very helpful and we have to fall back on subjective comparisons of whether an effect size is practically, as opposed to statistically, different. Does Bayesian ANOVA tend to be like this?
In other words, how sensitive is the posterior distribution of the difference between means to sample size? Does it also become more narrowly distributed when sample sizes are very large?
Hi Falko,
To clarify upfront: effect size and statistical significance (whatever you understand by that) are different things, and should ideally both be reported; I don't think anyone sane reports just the significance, or just the difference.
Concerning large sample sizes in general: neither in the frequentist nor in the Bayesian approach is there a problem with having a small but statistically significant difference between groups (detectable thanks to your large sample size); it is just not very useful (or interesting), as you mention. Actually, I don't really see any disadvantages of having large datasets, in either the Bayesian or the frequentist approach (unless there is a systematic measurement bias that amplifies with more data).
Finally: yes, posterior distributions of the means will get narrower with increasing sample size. I'd argue that this is a good thing; the more credibility the better. In this respect, it is good to be aware that having narrow posteriors of the means is a completely different thing from the width of your prediction intervals (given by the sample size and by the variance parameter sigma^2). In other words, you can have a highly significant (but small) difference between groups, but when you simulate (predict) from your model, the predictions are still widely spread and all over the place.
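This distinction can be sketched numerically: under a flat prior with an (assumed) known sigma, the posterior sd of a group mean shrinks as sigma/sqrt(n), while the posterior predictive sd stays close to sigma no matter how large n gets. The sigma value below is made up:

```python
import math

sigma = 3.0  # hypothetical residual standard deviation
for n in (10, 1000):
    post_sd = sigma / math.sqrt(n)              # sd of the posterior of the mean
    pred_sd = math.sqrt(sigma**2 + post_sd**2)  # sd of the posterior predictive
    print(n, round(post_sd, 3), round(pred_sd, 3))
```

The posterior of the mean collapses toward zero width as n grows, but the predictive sd never drops below sigma.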
Petr