Using R for parallelizing OpenBUGS on a single Windows PC

It seems that most of the R-parallelizing business takes place on Linux clusters, and that makes sense: why would you want to parallelize R on just a few processors (2 or 4) of a Windows laptop when the whole thing would only get 2-4x faster? This used to be evident from the selection of R libraries - the easy ones (foreach, snow) were Linux-oriented, and I always found it difficult to parallelize R on Windows. One could use something called MPI, but its documentation always seemed too complicated for my non-IT brain. And that was when I discovered that the embarrassingly parallel snow package had been adjusted for Windows.

There is one case where running embarrassingly parallel R makes utter sense on a single Windows machine: parallelizing the MCMC chains of a Bayesian model fitted in OpenBUGS or WinBUGS. One usually runs several MCMC chains of iterations, and the iterations within a chain are nearly impossible to parallelize because of the within-chain dependence. But the chains themselves can easily be left running next to each other simultaneously, as there is no between-chain dependence. Even a 2-4 fold speed-up is desirable in this case, as it can facilitate debugging of the BUGS code and make the overall workflow more pleasant. The problem is that, so far, OpenBUGS does not offer the option to parallelize the chains, although the developers state that it is on the way.

Here I present a simple way to parallelize MCMC chains in OpenBUGS using the snow and snowfall packages in R. I got the main idea from a similar blogpost by Awaypku. The trick is to create a separate working directory for each of the cores one wants to use. Then several independent OpenBUGS sessions are invoked on several CPUs (the sfLibrary function in the snowfall package loads R2OpenBUGS on each of them), their results are stored in the separate directories and, when the MCMC sampling is over, the resulting CODA chains are retrieved back into R. I am aware that this code may become obsolete once the OpenBUGS developers release the parallel version of OpenBUGS (God knows when). But meanwhile, my code may be handy.

I use an example of a simple Bayesian model in which I estimate the mean and variance of a normally distributed random variable.

Here is the R code of the whole thing:

# 1. loading the libraries for parallel processing
library(snow)
library(snowfall)

# 2. setting the number of CPUs to be 3
sfInit(parallel=TRUE, cpus=3)

# 3. and assigning the R2OpenBUGS library to each CPU
sfLibrary(R2OpenBUGS)

# 4. generation of artificial data
# which is a normally distributed random variable
# with mean of 12 and SD of 5
x.data <- list(x.data = rnorm(1000, 12, 5), N=1000)

# 5. creating separate directory for each CPU process
folder1 <- paste(getwd(), "/chain1", sep="")
folder2 <- paste(getwd(), "/chain2", sep="")
folder3 <- paste(getwd(), "/chain3", sep="")
dir.create(folder1); dir.create(folder2); dir.create(folder3)

# 6. sinking the model into a file in each directory
for (folder in c(folder1, folder2, folder3))
{
  sink(paste(folder, "/normal.txt", sep=""))
  cat("
    model
    {
      # 6a. priors
      sigma ~ dunif(0, 100)
      mu ~ dnorm(0,0.0001)
      tau <- 1/(sigma*sigma)

      # 6b. likelihood
      for(i in 1:N)
      {
        x.data[i]~dnorm(mu, tau)
      }
    }
  ")
  sink()
}

# 7. defining the function that will run MCMC on each CPU
# Arguments:
# chain - will be 1, 2 or 3
# x.data - the data list
# params - parameters to be monitored
parallel.bugs <- function(chain, x.data, params)
{
  # 7a. defining directory for each CPU
  sub.folder <- paste(getwd(),"/chain", chain, sep="")

  # 7b. specifying the initial MCMC values
  inits <- function()
  {
     # one draw per parameter; note that in R the first argument of
     # runif()/rnorm() is the number of draws, and rnorm() takes the
     # sd (here 100 = sqrt(1/0.0001)), not the precision
     list(sigma=runif(1, 0, 100), mu=rnorm(1, 0, 100))
  }

  # 7c. calling OpenBugs
  # (you may need to change the OpenBUGS.pgm directory)
  bugs(data=x.data, inits=inits, parameters.to.save=params,
             n.iter=2000, n.chains=1, n.burnin=1000, n.thin=1,
             model.file="normal.txt", debug=FALSE, codaPkg=TRUE,
             OpenBUGS.pgm="C:/Program Files (x86)/OpenBUGS/OpenBUGS321/OpenBUGS.exe",
             working.directory=sub.folder)
}

# 8. setting the parameters to be monitored
params <- c("mu", "sigma")

# 9. calling the sfLapply function that will run
# parallel.bugs on each of the 3 CPUs
sfLapply(1:3, fun=parallel.bugs, x.data=x.data, params=params)

# 10. locating position of each CODA chain
chain1 <- paste(folder1, "/CODAchain1.txt", sep="")
chain2 <- paste(folder2, "/CODAchain1.txt", sep="")
chain3 <- paste(folder3, "/CODAchain1.txt", sep="")

# 11. and, finally, getting the results
a <- read.bugs(c(chain1, chain2, chain3))
plot(a)

# 12. stopping the cluster
sfStop()


How about the real speed gain? When I run the 3 MCMC chains serially using the ordinary lapply() function, the 2000 iterations take 21.21 s. Using the 3-chain option within a single OpenBUGS session takes 16.94 s. And finally, my parallel code does the whole thing in 8.67 s. The speed advantage will probably be even larger on bigger datasets and longer chains. Not bad.
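If you want to reproduce the timing comparison, system.time() does the job. This is only a sketch: it assumes the objects defined in the code above (parallel.bugs, x.data, params), an initialized snowfall cluster and a working OpenBUGS installation.

```r
# serial baseline: the same three chains run one after another
ser.time <- system.time(
  lapply(1:3, parallel.bugs, x.data = x.data, params = params)
)

# parallel version: three chains on three CPUs simultaneously
par.time <- system.time(
  sfLapply(1:3, fun = parallel.bugs, x.data = x.data, params = params)
)

ser.time["elapsed"]  # roughly 21 s in my runs
par.time["elapsed"]  # roughly 9 s
```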

2 thoughts on “Using R for parallelizing OpenBUGS on a single Windows PC”

  1. Thanks for sharing the approach and including code. However, I notice that init values aren't actually generated in the inits1.txt file in each directory. I'm not exactly sure why...

  2. The thing with the inits probably doesn't work as intended because the 7b inits function is an R function. In R, the first argument of the r* random-number functions is the number of draws, so it should be 1. Also, if this is supposed to be a wide distribution, note that rnorm in R takes the standard deviation (not the precision) as its third argument. So a better way to specify the initial values (line 57 in the above code) would be:

    list(sigma=runif(1, 0, 100), mu=rnorm(1, 0, sqrt(1/0.0001)))
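The argument-order pitfall is easy to check in plain R, where the first argument of runif()/rnorm() is the number of draws:

```r
# runif(min, max) is actually interpreted as runif(n, min, max):
draws0 <- runif(0, 100)     # n = 0 draws -> an empty numeric vector
draws1 <- runif(1, 0, 100)  # one draw from Uniform(0, 100)
length(draws0)  # 0
length(draws1)  # 1
```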
