This post is inspired by a question by Dylan Craven that he raised during my Bayesian stats course.
1 Objective
My aim here is to demonstrate that, in Bayesian setting, one can make powerful inference about withingroup ANOVA means even for groups with sample size , i.e. groups with only a single measurement. The main reason this is possible is the assumption of constant variance () among groups: groups with sufficiently large increase credibility of the estimate, which limits the credible interval of in the group with . Conversely, when the assumption is relaxed and is allowed to vary, groups with sample size of 1 are no longer that useful.
2 The data
I will use modified artificiallygenerated data from the example from Marc Kery’s Introduction to WinBUGS for Ecologists, page 119 (Chapter 9  ANOVA). The (artificial) data are supposed to describe snoutvent lengths in five populations of Smooth snake (Coronella austriaca); I modified the data so that the first population is only represented by a single observation, as indicated by the arrow in the figure below.
Loading the data from the web:
snakes < read.csv("http://www.petrkeil.com/wpcontent/uploads/2017/02/snakes_lengths.csv")
Plotting the data:
par(mfrow=c(1,2), mai=c(0.8,0.8, 0.1, 0.1))
plot(snout.vent ~ population, data=snakes, ylab="Length [cm]")
arrows(x1=1.2, y1=58, x0=2, y0=61, col="red", lwd=2, angle=25, length=0.2)
boxplot(snout.vent ~ population, data=snakes, ylab="Length [cm]", xlab="population", col="grey")
3 Fixedeffects ANOVA in JAGS
First, I will model the data using a traditional fixedeffect ANOVA model. For a given snake in population the model can be written as:
Where is a mean of th population (), and identifies individual measurements.
I am interested in a simple question: Is there a statistically significant difference between population 1 and 2? Will I be able to infere the difference even when population 1 has a sample size of 1?
I will fit the model using MCMC in JAGS. Hence, I will prepare the data in a list format:
snake.data < list(y=snakes$snout.vent,
x=snakes$population,
N=nrow(snakes),
N.pop=5)
Loading the library that enables R to communicate with JAGS:
library(R2jags)
JAGS Model definition:
cat("
model
{
# priors
sigma ~ dunif(0,100) # (you may want to use a more proper gamma prior)
tau < 1/(sigma*sigma)
for(j in 1:N.pop)
{
mu[j] ~ dnorm(0, 0.0001)
}
# likelihood
for(i in 1:N)
{
y[i] ~ dnorm(mu[x[i]], tau)
}
# the difference between populations 1 and 2:
delta12 < mu[1]  mu[2]
}
", file="fixed_anova.txt")
And here I fit the model:
model.fit.1 < jags(data = snake.data,
model.file = "fixed_anova.txt",
parameters.to.save = c("mu", "delta12"),
n.chains = 3,
n.iter = 20000,
n.burnin = 10000,
DIC = FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 41
## Unobserved stochastic nodes: 6
## Total graph size: 107
##
## Initializing model
Plotting parameter estimates with mcmcplots
:
library(mcmcplots)
plot(snakes$snout.vent ~ snakes$population, ylim=c(35, 65), col="grey",
xlab="Population", ylab="Body length")
caterplot(model.fit.1, parms="mu", horizontal=FALSE,
reorder=FALSE, add=TRUE, labels=FALSE, cex=2, col="red")
The red dots and bars show posterior densities of . Grey dots are the data.
So what is the difference between population 1 and 2?
denplot(model.fit.1, parms="delta12", style="plain", mar=c(5,5,2,1), main="",
xlab="Difference between means of populations 1 and 2",
ylab="Posterior density")
There is a clear nonzero difference (the posterior density does not overlap zero). So we can conclude that there is a statistically significant difference between populations 1 and 2, and we can conclude this despite having only for population 1.
4 Relaxing the assumption of constant variance
We can have a look what happens when the assumption of constant is relaxed. In other words, now every population has its own , and the model is:
Here is JAGS definition of such model:
cat("
model
{
# priors
for(j in 1:N.pop)
{
mu[j] ~ dnorm(0, 0.0001)T(0,) # Note that I truncate the priors here
sigma[j] ~ dunif(0,100) # again, you may want to use proper gamma prior here
tau[j] < 1/(sigma[j]*sigma[j])
}
# likelihood
for(i in 1:N)
{
y[i] ~ dnorm(mu[x[i]], tau[x[i]])
}
# the difference between populations 1 and 2:
delta12 < mu[1]  mu[2]
}
", file="fixed_anova_relaxed.txt")
Let’s fit the model:
model.fit.2 < jags(data = snake.data,
model.file = "fixed_anova_relaxed.txt",
parameters.to.save = c("mu", "delta12"),
n.chains = 3,
n.iter = 20000,
n.burnin = 10000,
DIC = FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 41
## Unobserved stochastic nodes: 10
## Total graph size: 140
##
## Initializing model
Here are parameter estimates plotted with mcmcplots
library(mcmcplots)
plot(snakes$snout.vent ~ snakes$population, ylim=c(0, 100), col="grey",
xlab="Population", ylab="Body length")
caterplot(model.fit.2, parms="mu", horizontal=FALSE, reorder=FALSE, add=TRUE,
labels=FALSE, cex=2, col="red")
Clearly, the posterior density of is essentially the priror. In other words, the single observation has little power on its own to overdrive the prior.
We can plot the posterior density of the difference
denplot(model.fit.2, parms="delta12", style="plain", mar=c(5,5,2,1), main="",
xlab="Difference between means of populations 1 and 2",
ylab="Posterior density")
And we see that it is all over the place. Huge uncertainty, no clear inference is possible.
5 Conclusion
I have shown that one can make powerful inference about population means even if the sample size for the population is super small (e.g. 1). This is possible when all of the following conditions are met:

Bayesian approach to ANOVA must be used.

We must have sufficient data from other populations.

We must be confident that the assumption of constant variance among the groups is justified.