Spatial correlograms are great for examining patterns of spatial autocorrelation in your data or model residuals. They show how correlated pairs of spatial observations are as the distance (lag) between them increases: they are plots of some index of autocorrelation (Moran's I or Geary's *c*) against distance. Although correlograms are not as fundamental as variograms (a keystone concept of geostatistics), they are very useful as an exploratory and descriptive tool. For this purpose they actually provide richer information than variograms.
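For reference, the two indices can be written for a single distance class $d$ as below. These are the standard textbook definitions; the symbols $w_{ij}(d)$, $W(d)$ and $z_i$ are notation assumed here for illustration, and individual packages may differ in details such as how the weights are standardised.

$$
I(d) = \frac{n}{W(d)} \cdot \frac{\sum_{i}\sum_{j \neq i} w_{ij}(d)\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i}(z_i - \bar{z})^2},
\qquad
c(d) = \frac{n-1}{2\,W(d)} \cdot \frac{\sum_{i}\sum_{j \neq i} w_{ij}(d)\,(z_i - z_j)^2}{\sum_{i}(z_i - \bar{z})^2}
$$

Here $n$ is the number of observations, $z_i$ is the value at location $i$, $w_{ij}(d) = 1$ if the distance between $i$ and $j$ falls into class $d$ (and 0 otherwise), and $W(d) = \sum_i \sum_{j \neq i} w_{ij}(d)$. With no spatial autocorrelation, $I$ is expected to be close to $-1/(n-1)$ and $c$ close to 1.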

**Figure 1** | This is what a simple correlogram looks like (as provided by the `spdep` package).

Here is a little overview of the available R packages and functions (I also list some related stuff that could be handy):

`ade4`

- This package has function `gearymoran` that calculates Moran's I and Geary's c. Does not plot correlograms.

`ape`

- Moran's I test (function `Moran.I`) for spatial and phylogenetic autocorrelation (based on normal approximation, not on randomizations = fast). Does not plot correlograms.

`ncf`

- Provides functions `correlog` and `spline.correlog`. Plots correlograms. Does randomization tests.

`pgirmess`

- Has function `correlog` that calculates the correlogram. It uses normal approximation to test significance.

`raster`

- Simple function `Moran`. Works on rasters. You need to specify a simple neighborhood matrix. Does not plot correlograms.

`spdep`

- `sp.correlogram`, `moran`, `moran.plot`, `moran.test`, `moran.mc`. This is the most comprehensive package, and also the most difficult to work with. Does everything, has steep learning curve.

`spatial`

- If I understand it correctly, this package first needs you to fit a trend surface (by kriging) and you can then calculate correlogram of this fitted surface. I haven't gone deeper into it.

`mpmcorrelogram`

- I include it as a curiosity. It calculates Multivariate Mantel Correlograms (Oden & Sokal 1986, Legendre & Legendre 2003).
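To give a feel for the single-statistic functions in the list, here is a minimal sketch of `Moran.I` from `ape` with inverse-distance weights. The toy data and the choice of weights are assumptions made for illustration only; any other weight matrix of matching dimensions could be used.

```r
library(ape)

# Toy data (hypothetical): 50 random points with a west-east gradient plus noise
set.seed(1)
n  <- 50
xy <- cbind(runif(n), runif(n))
z  <- xy[, 1] + rnorm(n, sd = 0.2)

# Inverse-distance weights; the diagonal (self-pairs) is set to zero
w <- 1 / as.matrix(dist(xy))
diag(w) <- 0

# Returns a list with the observed and expected I, its sd and a p-value
res <- Moran.I(z, w)
res
```

This gives one global value of Moran's I, not a correlogram; for a correlogram the weight matrix would have to be rebuilt for each distance class.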

**Comparison of three methods**

Next, I compare three packages that do empirical spatial correlograms using a single function. These are `ncf`, `spdep` and `pgirmess`. First, I generated artificial, spatially autocorrelated data. I used Principal Coordinates analysis of Neighbourhood Matrix (PCNM) implemented in the `vegan` package to generate the data.

```r
# packages used for the data generation
library(raster)
library(colorRamps) # for some crispy colors
library(vegan)      # will be used for PCNM

# empty matrix and spatial coordinates of its cells
side <- 30
my.mat <- matrix(NA, nrow=side, ncol=side)
x.coord <- rep(1:side, each=side)
y.coord <- rep(1:side, times=side)
xy <- data.frame(x.coord, y.coord)

# all pairwise euclidean distances between the cells
xy.dist <- dist(xy)

# PCNM axes of the dist. matrix (from 'vegan' package)
pcnm.axes <- pcnm(xy.dist)$vectors

# using 8th PCNM axis as my artificial z variable
z.value <- pcnm.axes[,8]*200 + rnorm(side*side, 0, 1)

# plotting the artificial spatial data
my.mat[] <- z.value
r <- raster(my.mat)
plot(r, axes=FALSE, col=matlab.like(20))
```

**Figure 2** | Artificial spatial pattern used for comparison of the three packages.

And here is how the correlograms are done in each of the packages. So far I had the best experiences with `correlog` in the **ncf** package. It is user-friendly and simple, you just need x- and y- coordinates of your data and a vector of z- values. You specify the increment of the distance classes, and you can test significance within each distance class by a randomization test. However, note that the randomizations can take some time, and with large data (megapixel rasters) it may be nearly impossible to calculate the correlogram.

```r
library(ncf)
ncf.cor <- correlog(x.coord, y.coord, z.value, increment=2, resamp=500)
```
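The object returned by `correlog` can be plotted directly or unpacked into a table. Below is a minimal, self-contained sketch on a small toy grid (hypothetical data, not the pattern from Figure 2); the number of resamples is kept low only so that it runs quickly.

```r
library(ncf)

# Toy data (hypothetical): a wavy gradient plus noise on a 15 x 15 grid
set.seed(1)
side <- 15
x <- rep(1:side, each = side)
y <- rep(1:side, times = side)
z <- sin(x / 3) + rnorm(side * side, sd = 0.3)

# quiet=TRUE suppresses the resampling progress counter
fit <- correlog(x, y, z, increment = 2, resamp = 100, quiet = TRUE)

# One row per distance class
res <- data.frame(
  distance = as.vector(fit$mean.of.class), # mean distance within the class
  n.pairs  = as.vector(fit$n),             # number of pairs in the class
  moran    = as.vector(fit$correlation),   # autocorrelation estimate
  p.value  = as.vector(fit$p)              # from the randomization test
)
head(res)

# Default plot method for the correlogram
plot(fit)
```

Increasing `resamp` gives finer p-values at the cost of run time, which is the trade-off mentioned above.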

Somewhat similar is function `correlog` in package **pgirmess**. Again, simple and user-friendly. The advantage is that it does not test the significance by a randomization test, but it uses a normal approximation. Hence, it is faster than `ncf`. The package actually relies on some functions from `spdep`, but it is much simpler than `spdep`. You just need to specify the x- and y- coordinates, the z- values, the method and the number of distance classes:

```r
library(pgirmess)
pgi.cor <- correlog(coords=xy, z=z.value, method="Moran", nbclass=21)
```

The third package is **spdep** - a real juggernaut. In comparison with the previous two packages, `spdep` offers great flexibility and more options. If you have complex neighborhood (connectivity) structures or some really weird data you may want to go for this. But there is a cost: `spdep` is not user friendly. If you are not familiar with its classes (e.g. the 'nb' neighborhood class) then you will probably spend hours going through the help (I did). Here is a solution that I came up with for my data:

```r
library(spdep)

# 'nb' - neighbourhood of each cell
r.nb <- dnearneigh(as.matrix(xy), d1=0.5, d2=1.5)

# 'nb' - an alternative way to specify the neighbourhood
# r.nb <- cell2nb(nrow=side, ncol=side, type="queen")

sp.cor <- sp.correlogram(r.nb, z.value, order=15,
                         method="I", randomisation=FALSE)
```

Unfortunately, I was unable to get Moran's I for longer lags than 15. It just complained that there are not enough observations at these distances. Well, the previous packages could cope with that. Also, note that I have set `randomisation=FALSE`. In `spdep` this means that the variance of Moran's I is calculated under the assumption of normality rather than under the randomisation assumption. Both variants are analytical, so neither runs an actual permutation test; for that, `spdep` has `moran.mc`.
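To see what the `randomisation` argument and a true permutation test do in `spdep`, here is a minimal sketch for a single (first-order) lag on a small toy grid. The data are hypothetical, and the row-standardised weights (`style = "W"`) are just one possible choice.

```r
library(spdep)

# Toy data (hypothetical): a west-east gradient plus noise on a 10 x 10 grid
set.seed(1)
side <- 10
xy <- as.matrix(expand.grid(x = as.numeric(1:side), y = as.numeric(1:side)))
z  <- xy[, "x"] + rnorm(side * side)

# Queen-type neighbours on a unit grid, converted to row-standardised weights
nb <- dnearneigh(xy, d1 = 0.5, d2 = 1.5)
lw <- nb2listw(nb, style = "W")

# Analytical tests: same estimate of I, different variance assumptions
t.norm <- moran.test(z, lw, randomisation = FALSE)
t.rand <- moran.test(z, lw, randomisation = TRUE)

# Permutation (Monte Carlo) test: z is shuffled nsim times
t.perm <- moran.mc(z, lw, nsim = 499)

c(normality     = t.norm$p.value,
  randomisation = t.rand$p.value,
  permutation   = t.perm$p.value)
```

The two `moran.test` calls return the same Moran's I and differ only in the variance used for the p-value, while `moran.mc` is the one whose run time grows with `nsim`.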

And here is how it looks when I plot all of the correlograms in one plot:

**Figure 3** | Correlograms of the spatial pattern from Fig. 2 produced by `ncf` (function `correlog`), `spdep` (function `sp.correlogram`) and `pgirmess` (function `correlog`).

They are not completely identical. I guess that this is because the packages use different approaches to define the spatial lags, and also they may have different concepts of spatial connectivity. Personally, I am actually happy that they are similar at all. I was expecting larger discrepancies.
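To make the point about lag definitions concrete, here is a minimal base-R sketch of one possible definition: equal-width distance classes, where every pair of points whose distance falls into a class counts as connected with weight 1. The function name `moran.by.class` and the toy data are hypothetical, and this is not meant to reproduce the internals of any of the three packages.

```r
# Moran's I per equal-width distance class, base R only.
# Pairs at distance exactly 0 (self-pairs, duplicated coordinates) are dropped.
moran.by.class <- function(x, y, z, increment) {
  n     <- length(z)
  d     <- as.vector(as.matrix(dist(cbind(x, y)))) # all ordered pairs
  dev   <- z - mean(z)
  cross <- as.vector(outer(dev, dev))              # (z_i - mean) * (z_j - mean)

  # Classes are (0, inc], (inc, 2*inc], ...; d == 0 falls outside and becomes NA
  breaks <- seq(0, max(d) + increment, by = increment)
  cls    <- cut(d, breaks = breaks, labels = FALSE)
  keep   <- !is.na(cls)

  W   <- tapply(rep(1, sum(keep)), cls[keep], sum) # ordered pairs per class
  num <- tapply(cross[keep], cls[keep], sum)
  data.frame(
    distance = as.vector(tapply(d[keep], cls[keep], mean)),
    n.pairs  = as.vector(W) / 2,                   # unordered pairs
    moran    = as.vector((n / W) * num / sum(dev^2))
  )
}

# Toy data (hypothetical): a diagonal gradient plus noise on a 12 x 12 grid
set.seed(42)
side <- 12
gx <- rep(1:side, each = side)
gy <- rep(1:side, times = side)
gz <- gx + gy + rnorm(side^2)

cg <- moran.by.class(gx, gy, gz, increment = 2)
plot(cg$distance, cg$moran, type = "b", xlab = "distance", ylab = "Moran's I")
abline(h = 0, lty = 2)
```

Changing `increment`, or replacing the equal-width `breaks` with quantiles of the distances (equal numbers of pairs per class), shifts the points of the correlogram, which is one plausible source of the differences between packages.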

There are alternatives to the three packages tested above (see the list above), but they will all require a bit of coding. Packages **raster**, **ape** and **ade4** offer functions to calculate Moran's I. You just need to find a way to run these over different spatial lags in order to produce the correlograms. I suspect that especially the `Moran` function in `raster` will run fast even over large datasets.

If you know about other packages, if you spot a crucial mistake here, or if you feel that I have forgotten to mention something really essential, please bombard me with comments and complaints. Thank you!

Hey thanks for the nice overview. Here's another package that you may not know about:

Ecodist:

http://cran.r-project.org/web/packages/ecodist/

Associated paper:

http://www.jstatsoft.org/v22/i07/paper

Cheers Dan, this is helpful! Will certainly make this post more complete.

Thanks for this post Petr Keil...!

But did you happen to consider using Oden's Ipop method for spatial autocorrelation? I had trouble finding R packages that offer this method, back when my friend was asking for an R function for it.

Hi Petr,

Many thanks for your review. It has been very helpful to me for deciding which R package to choose to get Moran's I correlograms. Now, "spdep" without doubt. If you were unable to get Moran's I for lags longer than 15 with this package, this might be because you used equal distance lags. I prefer to use an equal number of pairs for each lag (thus the last lags remain meaningful), and by doing this I have found no problem at all!

Cheers,

Lola FC

Hi Lola, thanks! That is indeed a great and helpful clarification! One idea I have just had on that too: In reality the long-distance lags (and the spatial autocorrelations across these distances) are somehow difficult to interpret anyway. So I guess that it does not matter that much what happens at these distances.

I am new in this field, and exploring ways of making spatial correlograms.

Could you tell me how to use an equal number of pairs for each lag?

Johan

Hi Petr,

I have a doubt... (well many, but only one for you...): if the data is a count, let's say "abundance of species x", and is definitely not normally distributed, I would think that the normal approximation will not be suitable. But is Moran's I assessed with randomizations not affected at all by the non-normality in the data?

So for this kind of data, the randomization testing of the index would be the correct one?

thank you very much

Hi Petr,

Great overview!

Is there some kind of correlogram for binary data? For instance, using join count analysis. I'm thinking of correlograms of vectors of species presence/absence data.

thank you very much...

cheers

Hi Petr,

regarding the limited distance of your spdep correlogram. Is this not simply because you defined the neighbourhood (r.nb) with an upper bound of d2=1.5? If you increase this to e.g. d2=5, wouldn't you then be able to plot the correlogram for larger distances?

Best

Daniel

Hi Petr

Even though my comment might come a bit late, I think it might help others... At least I was extensively checking this page while trying to calculate a Spatial Correlogram for a larger raster layer.

I found the package SpatialPack to be useful. It allows to calculate a modified t.test, which, as a side effect, calculates a correlogram. I found this one to be the fastest and most stable solution so far, especially if you want to calculate Spatial Correlograms for two layers at the same time. However, there is no significance check. In the example above, the code would be as follows:

```r
library(SpatialPack)
data <- summary(modified.ttest(z.value, z.value, coords = xy, nclass = 21))
plot(x = data$coef[,1], y = data$coef[,4], type = "l")
```

Many thanks

Beni

Hi Beni, never too late 🙂

Thanks, this is really useful!

Thanks for the great post. I have used ncf package with the argument latlon=TRUE. What would the distance units of the resulting plot be in this case ?

Thank you for the useful comments and a great post. I have used the ncf package for my analysis, however am not sure if there is an adjustment for multiple testing? any idea will be appreciated.

Are there any R packages or functions that calculate a correlogram from network distances instead of xy coordinates?

Hi!

I am struggling with these autocorrelation tests: is there any suggestion on how to generate correlograms with specific lag distances (in meters) taken from some coordinates in the field? I have some points in a grid and I would like to plot Moran's index to test for autocorrelation (in a specific z variable, measured at each point) at 50, 150, and 200 meters radius around all points. Any suggestion? I don't know how to do it. Thanks.

Hi guys

How can we interpret the results?

Thank you for this nice post.

For the record, I added EcoGenetics to the comparison (instead of spdep) for my real data (30 coordinates, distance range 0.5m to 11m, z is species abundance), and forced it to 15 distance bins (for ncf, I had to give an increment leading to 15 bins). The results were roughly comparable, apart from slightly different distance bin means. I find it curious that pgirmess doubled the number of comparisons (as if the distance matrix wasn't split along the diagonal):

```
> sum(unlist(df.eco$Z.size))
[1] 435
> sum(unlist(df.ncf$n))
[1] 435
> sum(unlist(df.pgir$n))
[1] 870
```

Anyway, here is the plot. Do you have an idea (if you even look at this post anymore^^) how to force the three packages to use the exact same distances?

https://s9.postimg.cc/k49xid03z/Package_comparison.png

Hello,

thank you for the post. I would like to ask you about the interpretation of the plots. One would assume no spatial correlation if you get a horizontal line at 0 (y-axis). However, if you have a slowly declining trend towards the zero that doesn't go above the y = 0.2, what can you conclude? The correlation is always lower than 0.2, which is a low correlation, but there is this declining trend towards zero as distance increases. Any suggestion?

Cheers!

Diego