Yesterday, Professor Ronald Smith gave a talk at Yale about how to predict future climate. One of his central subjects was ensemble forecasting. Here I dissect it a bit.
Climatologists and ecologists build "predictive models". Once they have a model, they use it to predict the future, e.g.: How will global temperature rise? Or: Where will species shift their geographic distributions? It is considered better not to rely on the predictions of a single model, so usually more than one model is fitted and their predictions are combined. This practice is called ensemble forecasting or ensemble modelling.
"To do more than one model" can have several meanings: (1) It can mean that one always uses exactly the same model, but each time with different parameter values. This is a form of sensitivity analysis. (2) One can run several formally different models with completely different variables and sets of parameters, but with the same philosophical background (e.g. parametric statistical models). (3) Sometimes people combine formally different forecasting approaches that have completely different (or absent) philosophical backgrounds. The third option is common in species distribution modelling (SDM), which often combines statistical models, regression trees, MaxEnt, machine learning approaches and so on.
The principle of sensitivity analysis is to acknowledge uncertainty in the parameter estimates of a model when the model is used for forecasting (prediction, forward simulation). Uncertainty about model parameters is expressed by their probability distributions, which are usually estimated during model fitting (or come as prior knowledge). The parameter values used in sensitivity analysis should be drawn from these distributions, so that more probable values are used more often. The combined or "ensembled" predictions are then interpretable as probability distributions.
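To make this concrete, here is a minimal sketch in Python. It assumes a toy linear model (temperature = intercept + slope × years ahead) and assumes that the parameter uncertainty can be summarised by a multivariate normal distribution. The function name `forecast_ensemble` and all the numbers are made up for illustration; they do not come from the talk or from any real data.

```python
import numpy as np

def forecast_ensemble(param_mean, param_cov, years, n_draws=1000, seed=0):
    """Draw parameter sets from their (assumed normal) distribution and
    run the toy model once per draw. Returns an array of shape
    (n_draws, len(years))."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(param_mean, param_cov, size=n_draws)
    intercepts = draws[:, 0][:, None]
    slopes = draws[:, 1][:, None]
    # Hypothetical model: temperature = intercept + slope * years ahead
    return intercepts + slopes * np.asarray(years)[None, :]

# Illustrative (invented) parameter estimates and their covariance
param_mean = np.array([14.0, 0.02])
param_cov = np.array([[0.04, 0.0],
                      [0.0, 0.000025]])
years = np.arange(0, 51, 10)

predictions = forecast_ensemble(param_mean, param_cov, years)

# The ensemble is a sample from the predictive distribution, so it can be
# summarised by quantiles for each forecast year.
lower, median, upper = np.percentile(predictions, [2.5, 50, 97.5], axis=0)
```

Because the parameter values are drawn in proportion to their probability, the spread between `lower` and `upper` can be read as a 95% interval of the prediction, conditional on the model itself being right.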
Ensembling predictions of parametric models that have different sets of parameters and use different variables is what statisticians call model averaging (Burnham & Anderson 2002). I suggest that the predictions of each model in the ensemble should be given a weight proportional to our belief that the model is right. This belief can be quantified by AIC weights, posterior model probabilities and similar measures. I admit that AIC (or a similar measure) is not always possible to calculate exactly. However, I argue that some measure of our relative belief in each model should always be used, even if it is estimated by a rough guess. Or perhaps by cross-validation.
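As a rough sketch of how such weighting could look, the snippet below computes Akaike weights in the usual way (each model's AIC difference from the best model, Δ, is turned into exp(−Δ/2) and normalised so that the weights sum to one) and then uses them to average the predictions. The AIC values and predictions are invented, and the function names `akaike_weights` and `weighted_ensemble` are my own.

```python
import numpy as np

def akaike_weights(aic_values):
    """Convert AIC values to Akaike weights that sum to 1."""
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()              # difference from the best model
    relative_likelihood = np.exp(-0.5 * delta)
    return relative_likelihood / relative_likelihood.sum()

def weighted_ensemble(predictions, weights):
    """Weighted average over models (rows) for each prediction (column)."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.average(predictions, axis=0, weights=weights)

# Invented example: three models, each predicting two quantities
aic_values = [100.0, 102.0, 110.0]
predictions = [[1.0, 2.0],
               [1.5, 2.5],
               [3.0, 4.0]]

weights = akaike_weights(aic_values)
ensemble = weighted_ensemble(predictions, weights)
```

The same `weighted_ensemble` function works with any other set of weights, for example posterior model probabilities or weights derived from cross-validation, as long as they express relative belief in each model.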
Ensembling predictions of completely different techniques and algorithms with various philosophical backgrounds seems potentially problematic. I don't think many people have any idea what is going on inside "black-boxy" machines such as random forests, MaxEnt or neural networks. Many of these techniques give no clear answer as to why they predict what they predict - they just pop something out. Unfortunately, if there is no answer to that why, if there is no interpretation of the model, then I can see no way to include it in an ensemble with other models, because its weight cannot be calculated. How can we express belief in a predictive technique that we are unable to reasonably interpret?
Another issue: If we make an ensemble prediction based on a set of techniques, and we simply average their predictions, or seek an intersection of their predictions (or something similar), then we implicitly assume that each of the forecasting techniques is equally likely to be correct. But what if they are all wrong? Clearly, using more than one model does not in itself guarantee more reliable results. I think this brings us back to the quantification of relative support for each model, which should be based on AIC or something similar. And again, you can only calculate AIC for models that you can explain.
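A tiny numerical illustration of both points, with invented numbers: the simple average is identical to a weighted average with equal weights, and if all models err in the same direction, averaging them does not remove the error.

```python
import numpy as np

# Invented example: the true value and three models that all overshoot it
truth = 2.0
model_predictions = np.array([3.1, 3.4, 2.9])

# A plain average is a weighted average with equal weights
simple_mean = model_predictions.mean()
equal_weights = np.full(model_predictions.size, 1.0 / model_predictions.size)
weighted_mean = np.dot(equal_weights, model_predictions)

# When all models are biased the same way, the ensemble is biased too
ensemble_error = abs(simple_mean - truth)
best_single_error = np.abs(model_predictions - truth).min()
```

In this made-up case the ensemble mean is further from the truth than the best single model, so the averaging has added nothing except the hidden assumption of equal weights.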