Category Archives: R
Bayesian ANOVA: Powerful inference with within-group sample size of 1
Contents: 1 Objective; 2 The data; 3 Fixed-effects ANOVA in JAGS; 4 Relaxing the assumption of constant variance; 5 Conclusion. This post is inspired by a question that Dylan Craven raised during my Bayesian stats course. 1 Objective: My aim here is to demonstrate that, in a Bayesian setting, one can make powerful inference about… Read More »
Reproducible art with R
This is my tribute to the fantastic R package spatstat. All the artwork was 100% done in R, the source code is here. Click the images for hi-res (6000 x 4000) versions. License: This is a public domain work. Feel free to do absolutely whatever you want with the code or the images, there are… Read More »
Logarithmic axes with linear gridlines in basic R plots
I like Mathematica’s and Matlab’s log-log plots with logarithmic axes and linear tickmarks (and gridlines). In a way, they let you see both multiplication and addition in a single figure. They also make it easier to visually match data points to their exact values. I haven’t found a simple ‘one-liner’ that’d do such plots in R. In… Read More »
Survival analysis: basic terms, the exponential model, censoring, examples in R and JAGS
I have put together some basic material on survival analysis. It is available as an .html document with highlighted syntax here, a printer-ready .pdf document here, and a GitHub repository with all the source files here. My main motivation was that I wanted to learn the basics myself; also, it's tricky to find simple examples of survival models fitted in… Read More »
Simple template for scientific manuscripts in R markdown
I've made a really simple template for the classical manuscript format for R markdown and knitr. Here are the resulting .pdf and .html. The template contains the four usual components of any scientific manuscript: equations (using LaTeX syntax); a table with a caption (done with knitr's kable function, but you can also use xtable); a figure with a caption; citations… Read More »
GAM splines now easy in JAGS and OpenBUGS. An example on 2D spatial data
Last week I met Simon Wood, creator of the mgcv package, which is THE tool for fitting Generalized Additive Models (GAM) in R. Simon drew my attention to the function jagam, which he has just added to mgcv. The function makes it possible to translate the ‘spline’ or ‘smooth’ component of a GAM model formula into BUGS code, meaning that… Read More »
12 nifty tips for scientists who use computers
Simple things are good. Here is a list of 12 things that I find simple and useful, yet not many of my colleagues use them. The list is R-biased. Knitr: an intuitive tool to integrate R and text to make reports with fancy fonts, figures, syntax-highlighted R code and equations. If you use RStudio, then… Read More »
Bayesian PCA
Authors: Jan Smycka, Petr Keil This post introduces the experimental R package bPCA, which I developed with Jan Smycka, who actually came up with the idea. We do not guarantee that the idea itself is correct, and there certainly are bugs; we invite anyone to prove us wrong, or to contribute. Rationale of bPCA Here is… Read More »
Tailoring univariate probability distributions
This post shows how to build a custom univariate distribution in R from scratch, so that you end up with the essential functions: a probability density function, cumulative distribution function, quantile function and random number generator. In the beginning all you need is an equation for the probability density function, from which everything else can… Read More »
A suggestion to Windows-based users of R: It may be time to relocate
Do you remember the time when you switched from graphical statistical software to R? I did it eight years ago, and I had a hard time doing even a simple regression analysis without constantly searching for help; it was a pain. In desperation I frequently cheated and went back to Statistica for the familiar window-ish feeling.… Read More »
Spatial autocorrelation of errors in JAGS
At the core of kriging, Generalized Least Squares (GLS) and geostatistics lies the multivariate normal (MVN) distribution, a generalization of the normal distribution to two or more dimensions, with the option of non-zero covariances between the dimensions (i.e. autocorrelation). In this post I will show: (i) how to use exponential decay and the multivariate normal distribution to simulate… Read More »
Poisson regression fitted by glm(), maximum likelihood, and MCMC
The goal of this post is to demonstrate how a simple statistical model (Poisson log-linear regression) can be fitted using three different approaches. I want to demonstrate that both frequentists and Bayesians use the same models, and that it is the fitting procedure and the inference that differ. This is also for those who understand… Read More »
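As a rough illustration of two of the three approaches named above, here is a minimal sketch on simulated data. The sample size, the true coefficients and all variable names are hypothetical, chosen only for illustration, and are not taken from the post. It fits the same Poisson log-linear model once with glm() and once by minimizing the negative log-likelihood with optim(); the MCMC fit is left out.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
n <- 100
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))

# Approach 1: glm() with a Poisson family and log link
fit_glm <- glm(y ~ x, family = poisson(link = "log"))

# Approach 2: maximum likelihood "by hand";
# the negative log-likelihood of the same model, minimized numerically
negloglik <- function(beta) {
  lambda <- exp(beta[1] + beta[2] * x)
  -sum(dpois(y, lambda = lambda, log = TRUE))
}
fit_ml <- optim(par = c(0, 0), fn = negloglik, method = "BFGS")

# The two sets of estimates should agree closely
rbind(glm = coef(fit_glm), optim = fit_ml$par)
```

Both fits target the same likelihood, so any difference between the two rows comes only from the numerical optimizer.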
The joy and martyrdom of trying to be a Bayesian
Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use generalized linear mixed effect models.… Read More »
Spatial correlograms in R: a mini overview
Spatial correlograms are great for examining patterns of spatial autocorrelation in your data or model residuals. They show how correlated pairs of spatial observations are as the distance (lag) between them increases; they are plots of some index of autocorrelation (Moran's I or Geary's c) against distance. Although correlograms are not as fundamental… Read More »
Beware: 2 is not always 2 in R
This post is minimalistic. Consider this: Now let's have a look at what's inside x: But is it really true? Here you go. A colleague of mine once lost an entire day to this before we realized what was going on. So, how to find out what REALLY is the value of x? Try:
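The code snippets of the original post are not reproduced in this excerpt. The following is a minimal sketch of the kind of surprise described, assuming it comes from floating-point rounding; the `sqrt(2)^2` example is an assumption made here, not necessarily the one used in the post.

```r
# A value that prints as 2 but is not exactly 2 (assumed example)
x <- sqrt(2)^2

x        # printed with the default 7 significant digits, this shows 2
x == 2   # FALSE: the stored double differs slightly from 2

# Ask for more significant digits to see the stored value
print(x, digits = 17)
sprintf("%.20f", x)

# Compare with a numerical tolerance instead of ==
isTRUE(all.equal(x, 2))   # TRUE
```

Printing with more digits reveals the stored value, and `all.equal()` is the usual way to compare doubles that should be "equal enough".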
AIC & BIC vs. Crossvalidation
Model selection is a process of seeking the model in a set of candidate models that gives the best balance between model fit and complexity (Burnham & Anderson 2002). I have always used AIC for that. But you can also do that by crossvalidation. Specifically, Stone (1977) showed that the AIC and leave-one-out crossvalidation… Read More »
Gridding data for multi-scale macroecological analyses
These are materials for the first practical lesson of the Spatial Scale in Ecology course. All of the data and code are available here. The class was a single 1.5 h session. R code for the session is also at the end of this post. The following advice and ideas are mostly my personal opinions; they were… Read More »
Not all proportion data are binomial outcomes
It really is trivial. Not every proportion is a frequency. There are things that have values bounded between 0 and 1 and yet they are neither probabilities nor frequencies. Why do I even bother to write this? Because some kinds of proportions should be treated as unbounded continuous variables, and should be analyzed using appropriate statistical… Read More »
Predictors, responses and residuals: What really needs to be normally distributed?
Introduction Many scientists are concerned about normality or non-normality of variables in statistical analyses. The following and similar sentiments are often expressed, published or taught: "If you want to do statistics, then everything needs to be normally distributed." "We normalized our data in order to meet the assumption of normality." "We log-transformed our data as… Read More »
Data-driven science is a failure of imagination
Professor Hans Rosling certainly is a remarkable figure. I recommend watching his performances. Especially the BBC's "Joy of Stats" is exemplary. Rosling sells passion for data, visual clarity and a great deal of comedy. He represents the data-driven paradigm in science. What is it? And is it as exciting and promising as the documentary suggests? Data-driven scientists… Read More »
Fast Conway's game of life in R
Here I demonstrate a simple way to code Conway's game of life (GoL) in R and to produce the animation above. Cellular automata in R are usually painfully slow if you iterate through all grid cells in an array. A couple of years ago my friend Martin Weiser came up with an idea to avoid the… Read More »
The simplest Species Distribution Model in OpenBUGS & R
This post demonstrates the simplest Species Distribution Model based on logistic regression for presence/absence data. I heavily simplified the example from Kéry (2010): Introduction to WinBUGS for Ecologists, Chapter 20.
Using R for parallelizing OpenBUGS on a single Windows PC
It seems that most of the R-parallelizing business takes place on Linux clusters. And it makes sense. Why would you want to parallelize R on just a few processors (2 or 4) of a Windows laptop PC when the whole thing would be only 2-4x faster? This used to be evident from the selection of… Read More »
Linear regression in OpenBUGS
I always wondered why it is so difficult to find an OpenBUGS example of simple linear regression on the Web. Curiously, such an example is even missing from the OpenBUGS help. The only nice example so far is in the book by Marc Kéry. I have simplified the code. You need to have OpenBUGS (or WinBUGS)… Read More »