For many years, meteorological models have been run with perturbed initial
conditions or parameters to produce ensemble forecasts that are used as a
proxy for the uncertainty of the forecasts. However, the ensembles are usually
both biased (the mean is systematically too high or too low compared with
the observed weather) and have dispersion errors (the ensemble variance
indicates too low or too high confidence in the forecast compared with the
observed weather). The ensembles are therefore commonly post-processed to
correct for these shortcomings. Here we look at one of these techniques,
referred to as Ensemble Model Output Statistics (EMOS) (Gneiting et al.,
2005). Originally, the post-processing parameters were identified as a fixed
set of parameters for a region. The application of our work is the European
Flood Awareness System (

Ensemble modelling has a long history in meteorology and is increasingly used in hydrology, mainly with meteorological ensembles as forcing. By perturbing the initial conditions or parameters of the model, an ensemble of forecasts is produced, under the assumption that it is a proxy for the uncertainty of the forecast. However, even if the perturbations are sampled from a probability distribution of the conditions or parameters, the resulting ensembles are frequently both biased (the mean is systematically too low or too high) and wrongly dispersed (the ensemble variance indicates too low or too high confidence in the forecast, compared with the subsequent observations).

It is therefore common to post-process the forecasts. Two methods are frequently used in meteorology: Bayesian Model Averaging (Raftery et al., 2005), which mainly focuses on calibration, and optimization based on Ensemble Model Output Statistics (Gneiting et al., 2005), referred to as EMOS. The EMOS method is usually calibrated by minimizing the Continuous Ranked Probability Score (CRPS), a score that penalizes both bias and dispersion errors.
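For a Gaussian predictive distribution N(μ, σ²), the CRPS has a closed form (Gneiting et al., 2005), which is what makes CRPS minimization practical. The sketch below assumes a normal predictive distribution and uses SciPy; it illustrates the score and is not the implementation used in this study.

```python
import numpy as np
from scipy.stats import norm


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2) for observation y.

    Lower is better. The score grows both with bias (|y - mu|) and with a
    sigma that is too small or too large for the actual error.
    """
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0) + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))


# Same observation (0.0), three forecasts: unbiased, biased, and over-dispersed
print(crps_normal(0.0, 1.0, 0.0))  # about 0.234
print(crps_normal(2.0, 1.0, 0.0))  # bias is penalized
print(crps_normal(0.0, 3.0, 0.0))  # excess spread is penalized
```

As σ approaches zero, the CRPS reduces to the absolute error |y − μ|, so the score generalizes the mean absolute error to probabilistic forecasts.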

Here we focus mainly on the EMOS method. In the original contributions in meteorology, it was common to fit a regional set of parameters for the post-processing. Samples were drawn from the post-processed distributions for each location to generate a post-processed ensemble. These samples were spatially independent, but Berrocal et al. (2007) extended the methodology to use the spatial structure of the errors to generate a spatially structured covariance matrix, which can be used to generate spatially consistent samples, based on the Geostatistical Output Perturbation technique (Gel et al., 2004). Their method still uses the same set of weights for all locations.
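The idea behind geostatistical output perturbation can be illustrated by drawing spatially correlated samples around post-processed means from a multivariate normal distribution. A minimal sketch, assuming an exponential covariance C(h) = sill · exp(−h / corr_len) with hypothetical parameter values and a Cholesky factorization; Gel et al. (2004) and Berrocal et al. (2007) describe how the covariance is actually estimated from forecast errors.

```python
import numpy as np


def gop_samples(coords, mean, sill, corr_len, n_samples, rng=None):
    """Draw spatially correlated samples around post-processed means.

    coords: (n, 2) station coordinates; mean: (n,) predictive means.
    sill and corr_len are hypothetical parameters of an assumed
    exponential covariance C(h) = sill * exp(-h / corr_len).
    Returns an array of shape (n_samples, n).
    """
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sill * np.exp(-d / corr_len)
    # A tiny diagonal jitter keeps the Cholesky factorization numerically stable
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(mean)))
    z = rng.standard_normal((n_samples, len(mean)))
    # Each row is mean + L z, so its covariance is L L^T = cov
    return mean + z @ chol.T


coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
samples = gop_samples(coords, np.zeros(3), sill=1.0, corr_len=2.0, n_samples=20000, rng=42)
print(np.corrcoef(samples.T).round(2))  # near stations correlated, distant ones not
```

With independent sampling per location (the approach before Berrocal et al., 2007), the off-diagonal correlations would all be close to zero.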

The application of these types of post-processing techniques in hydrology started later. Hemri et al. (2013) developed a method for post-processing runoff forecasts for individual stations, using the methods of Berrocal et al. (2007) to incorporate the correlation between lead times, which is likely to be higher for runoff than for meteorological variables. Engeland and Steinsland (2014) presented a method that fits different weights to different locations and lead times, but still assumes the same number of forecasts for all lead times.

Our application of ensemble forecasting is the European Flood Awareness
System (EFAS,

The previous applications in hydrology did not consider forecasts outside the calibration points, similar to what Berrocal et al. (2007) did for meteorological applications. In this paper we present a method that makes it easier to make predictions outside calibration points, and also to simulate possible discharge.

The analyses in this paper are based on a combination of deterministic and
ensemble meteorological forecasts from ECMWF, DWD, COSMO-LEPS and the UK Met
Office. We use forecasts from a period of almost two years
(8 January 2012–31 December 2013). For each day, the forecasts have lead
times of up to 10 days.

ECMWF: The European Centre for Medium-Range Weather Forecasts produces forecasts for the next 10 days. The ECMWF forecast is an ensemble with 51 members, in addition to a deterministic forecast.

DWD: The German Weather Service produces a deterministic forecast for the next 7 days.

COSMO-LEPS: The COSMO consortium produces an ensemble forecast with 16 members.

UK-MET: The UK Met Office produces an ensemble of 24 members.

As observations, we use simulated runoff at 701 stations. The runoff has been simulated from interpolated observed values, using the same LISFLOOD model setup as for the forecasts. There are some additional stations in the original data set, but these were discarded from the analyses because the runoff appeared unreasonably high compared with the estimated basin size, or because some of the forecasts were not available for all lead times and models.

We use simulated values instead of real observations as the reference, as these will have the same model errors as the forecasts, such as boundary errors and routing errors.

The simulated and forecasted runoff data are divided by catchment area in
1000 km²

The post-processing method we apply in this paper is based on the
Ensemble Model Output Statistics method (Gneiting et al., 2005). Briefly,
the idea is that the mean and variance of a range of forecasts might be
biased and wrongly dispersed, so we seek a weighted mean of the ensemble,
whereas the variance is assumed to follow a regression equation. As we
have a combination of deterministic and ensemble forecasts, we use the
deterministic forecasts and the mean of each ensemble forecast for the bias
correction, i.e., for a particular station
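A minimal sketch of fitting such a model at one station by minimizing the mean CRPS, assuming the standard form of Gneiting et al. (2005): μ = a + Σ b_k X_k and σ² = c + d S², where X_k are the deterministic forecasts and ensemble means and S² is a single ensemble variance. This section's own equations are not reproduced here; the function names, the squaring of c and d to keep the variance positive, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) for observations y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))


def fit_emos(X, s2, y):
    """Fit EMOS parameters for one station by minimum mean CRPS.

    X: (n_days, m) forecasts entering the weighted mean.
    s2: (n_days,) ensemble variance; y: (n_days,) reference runoff.
    Returns (a, b, c, d) with mu = a + X @ b and sigma^2 = c + d * s2.
    """
    m = X.shape[1]

    def unpack(p):
        # c and d are squared so the predictive variance is always positive
        return p[0], p[1:m + 1], p[m + 1] ** 2, p[m + 2] ** 2

    def mean_crps(p):
        a, b, c, d = unpack(p)
        sigma = np.sqrt(c + d * s2 + 1e-12)
        return crps_normal(a + X @ b, sigma, y).mean()

    # Start from equal weights and unit variance coefficients
    p0 = np.concatenate([[0.0], np.full(m, 1.0 / m), [1.0, 1.0]])
    res = minimize(mean_crps, p0, method="L-BFGS-B")
    return unpack(res.x)


# Synthetic example: two forecasts with known weights 0.6 and 0.3
rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(500, 2))
s2 = rng.uniform(0.5, 2.0, size=500)
y = 1.0 + 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, np.sqrt(0.2 + 0.5 * s2))
a, b, c, d = fit_emos(X, s2, y)
print(a, b, c, d)
```

Fitting the weights and the variance coefficients jointly under the CRPS is what lets EMOS correct bias and dispersion at the same time.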

Our region of interest is Europe, so we do not expect one set of weights to
be sufficient for the whole modelling domain. However, we have the ensembles
for all grid cells along a river, and would like to make post-processed
forecasts also for locations other than the calibration locations. The
solution is to interpolate the weights along the river network, to obtain
unbiased predictions for each pixel where a prediction is wanted. For this we
use top-kriging (Skøien et al., 2006, 2014). Top-kriging is a
geostatistical method for interpolation between areas of different spatial
support, such as observations along a river network. The method is described
in detail in these references and is only summarized here for river-related
applications:

A sample variogram is estimated from the observations at each gauge, treating each value as a spatial average when the variable is a spatial aggregate, such as runoff and most runoff statistics. The centre of the upstream contributing area is used to compute the distances. Sample variogram values are binned according to the sizes of the catchments, not only distance.

A variogram model is found by jointly fitting regularized variogram values to the binned sample variogram values.

A covariance matrix of expected semivariances between observation catchments and between observation catchments and prediction catchments is found from the variogram model, based on the size and location of the catchments.

Interpolation and cross-validation are performed as in standard kriging, based on the covariance matrices.
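The central step of top-kriging, regularizing a point variogram over catchment areas and then solving a kriging system, can be sketched as follows. Assumptions: catchments are discretized into point sets (here hypothetical rectangles), the point variogram is exponential with hypothetical parameters, and the regularized semivariance is γ(A,B) = γ̄(A,B) − ½[γ̄(A,A) + γ̄(B,B)], where γ̄ is the average point semivariance between the discretization points. The variogram fitting and the handling of nested catchments in the full method (Skøien et al., 2006, 2014) are omitted.

```python
import numpy as np


def point_variogram(h, sill=1.0, corr_len=20.0):
    """Assumed exponential point variogram without nugget."""
    return sill * (1.0 - np.exp(-h / corr_len))


def mean_gamma(P, Q):
    """Average point semivariance between two sets of discretization points."""
    h = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return point_variogram(h).mean()


def regularized_gamma(A, B):
    """Regularized semivariance between catchments A and B (point sets)."""
    return mean_gamma(A, B) - 0.5 * (mean_gamma(A, A) + mean_gamma(B, B))


def rectangle(x0, y0, width, height, n=6):
    """Hypothetical catchment discretized as an n x n grid of points."""
    xs = np.linspace(x0, x0 + width, n)
    ys = np.linspace(y0, y0 + height, n)
    return np.array([(x, y) for x in xs for y in ys])


def ordinary_kriging(obs_areas, values, target_area):
    """Ordinary kriging of a catchment value; returns (prediction, variance, weights)."""
    n = len(obs_areas)
    G = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            G[i, j] = regularized_gamma(obs_areas[i], obs_areas[j])
    # Lagrange row/column enforce that the weights sum to one
    G[:n, n] = 1.0
    G[n, :n] = 1.0
    g0 = np.array([regularized_gamma(a, target_area) for a in obs_areas] + [1.0])
    sol = np.linalg.solve(G, g0)
    w, lagrange = sol[:n], sol[n]
    return w @ values, w @ g0[:n] + lagrange, w


obs = [rectangle(0, 0, 10, 10), rectangle(30, 0, 10, 10), rectangle(0, 30, 20, 20)]
values = np.array([1.2, 0.8, 1.0])  # e.g. one EMOS weight at three gauges
pred, var, w = ordinary_kriging(obs, values, rectangle(10, 10, 10, 10))
print(pred, var, w)
```

Applied to each EMOS parameter in turn, this gives parameter values for any pixel along the network.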

Left and central panels: development of CRPS errors and spatial errors from the first to the last iteration for a lead time of one day. Right: comparison of CRPS and spatial error after the last iteration. Solid line represents

However, the fitting method in Eqs. (4) and (5) can give poorly correlated
weights for neighbouring locations if two or more of the forecasts are highly
correlated. For example, if we only had two forecasts and they were equal at
a certain location, any combination of weights with the same sum would give
the same error. To enforce some correlation between the weights and variance
coefficients (together referred to as parameters below) at neighbouring
locations, we use an iterative procedure that introduces a spatial penalty as
a function of the modelled semivariance between two locations and the
differences in all m parameters:

The calibration is done station by station. In the first iteration, no
spatial penalty is added, as many neighbouring stations have not yet been
calibrated. In the second iteration,
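The identifiability problem described above, and how a spatial penalty resolves it, can be illustrated with synthetic data. This is a sketch under assumptions: the penalty used below (squared parameter differences to a neighbouring station, divided by their modelled semivariance, with a hypothetical weight `lam`) is illustrative only and is not the penalty equation of this paper.

```python
import numpy as np
from scipy.stats import norm


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) for observations y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))


rng = np.random.default_rng(1)
x1 = rng.gamma(2.0, 1.0, size=200)   # synthetic forecast
x2 = x1.copy()                       # a second forecast identical to the first
obs = 0.9 * x1 + rng.normal(0.0, 0.3, size=200)


def mean_crps(w, a=0.0, sigma=0.3):
    """Mean CRPS of the weighted-mean forecast with weights w (fixed a and sigma)."""
    return crps_normal(a + w[0] * x1 + w[1] * x2, sigma, obs).mean()


def penalized(w, w_neighbour, gamma, lam=1.0):
    """Mean CRPS plus a hypothetical spatial penalty towards a neighbour's weights."""
    return mean_crps(w) + lam * np.sum((w - w_neighbour) ** 2) / gamma


w_a = np.array([0.2, 0.7])
w_b = np.array([0.7, 0.2])  # same sum, so the same weighted mean
w_nb = np.array([0.25, 0.65])  # weights already calibrated at a neighbour
print(mean_crps(w_a), mean_crps(w_b))  # identical: CRPS cannot choose
print(penalized(w_a, w_nb, 1.0), penalized(w_b, w_nb, 1.0))  # penalty prefers w_a
```

Because the unpenalized score is flat along the line of constant weight sum, any small penalty towards neighbouring parameters selects the spatially consistent solution at almost no cost in CRPS.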

Cross-validation

A common use of post-processing in meteorology is to create simulations of the variable of interest. This can also be done with the post-processing presented here, based on the calibrated parameters and the semivariogram above. The simulation method is based on Sequential Gaussian Simulation (Deutsch and Journel, 1998), combined with kriging with uncertain data (KUD) (de Marsily, 1986; Merz and Blöschl, 2005).

We start with the weighted mean and uncertainties for each calibration location.

In a random order, we visit all calibration locations and prediction
locations, and perform the following steps for each of them:

For a new location, we predict the mean and the kriging variance, using the weighted means at the calibration locations and the previously simulated values as observations. For the KUD prediction, we use the weighted ensemble variance at the calibration locations.

Sample a value from the predictive distribution (traditionally assumed to be Gaussian) with the kriging prediction as mean and the kriging variance as variance. Add this value to the set of observed/simulated values; in the subsequent simulations it has zero uncertainty in the KUD prediction.

If the simulated location coincides with a calibration location, replace the weighted mean there with the simulated value.
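The simulation steps above can be sketched as follows. This is a simplified illustration, not the implementation used here: locations are treated as points with an assumed exponential covariance (top-kriging would instead use regularized covariances between catchments), ordinary kriging is used, and KUD is implemented by adding each datum's error variance to the diagonal of the data covariance matrix. Function names and parameter values are hypothetical.

```python
import numpy as np


def exp_cov(d, sill, corr_len):
    """Assumed exponential covariance model."""
    return sill * np.exp(-d / corr_len)


def ok_kud(xy_data, z_data, err_var, xy0, sill, corr_len):
    """Ordinary kriging with uncertain data (error variances on the data diagonal)."""
    n = len(z_data)
    d = np.linalg.norm(xy_data[:, None, :] - xy_data[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d, sill, corr_len) + np.diag(err_var)
    A[n, n] = 0.0
    c0 = exp_cov(np.linalg.norm(xy_data - xy0, axis=1), sill, corr_len)
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    w, lagrange = sol[:n], sol[n]
    # Kriging variance of the target value; clipped to guard against round-off
    return w @ z_data, max(sill - w @ c0 - lagrange, 0.0)


def sgs_kud(xy_cal, mu_cal, var_cal, xy_pred, sill=1.0, corr_len=10.0, rng=None):
    """Sequential Gaussian simulation conditioned on uncertain calibration means."""
    rng = np.random.default_rng(rng)
    xy_all = np.vstack([xy_cal, xy_pred])
    n_cal = len(xy_cal)
    # Conditioning data: weighted means with their predictive variances
    data_xy, data_z, data_err = list(xy_cal), list(mu_cal), list(var_cal)
    sim = np.empty(len(xy_all))
    for k in rng.permutation(len(xy_all)):  # random visiting order
        mean, var = ok_kud(np.array(data_xy), np.array(data_z),
                           np.array(data_err), xy_all[k], sill, corr_len)
        sim[k] = rng.normal(mean, np.sqrt(var))
        if k < n_cal:
            # Calibration location: replace the weighted mean, now error-free
            data_z[k], data_err[k] = sim[k], 0.0
        else:
            # Prediction location: add as a new error-free datum
            data_xy.append(xy_all[k])
            data_z.append(sim[k])
            data_err.append(0.0)
    return sim[:n_cal], sim[n_cal:]


xy_cal = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 7.0], [8.0, 8.0]])
mu_cal = np.array([1.0, 1.5, 0.8, 1.2])
xy_pred = np.array([[2.0, 2.0], [6.0, 3.0], [3.0, 9.0]])
sim_cal, sim_pred = sgs_kud(xy_cal, mu_cal, np.full(4, 0.2), xy_pred, rng=1)
print(sim_cal, sim_pred)
```

With zero predictive variance at the calibration locations, the simulation reproduces the weighted means there exactly, which is a useful sanity check of the KUD step.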

A second issue is that runoff values are non-negative, whereas random
sampling from a Gaussian distribution can give negative values. We
therefore instead assume a lognormal distribution in this case,
log-transforming the predictive mean and variance with
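The transformation formula itself is not reproduced in this section. One common choice, assumed here purely for illustration, is moment matching: pick the lognormal whose arithmetic mean and variance equal the predictive mean m and variance v, which gives σ²_log = ln(1 + v/m²) and μ_log = ln m − σ²_log/2.

```python
import numpy as np


def lognormal_params(mean, var):
    """Log-space parameters of a lognormal with the given arithmetic mean and variance.

    Moment matching (an assumed choice): sigma2_log = ln(1 + var / mean**2),
    mu_log = ln(mean) - sigma2_log / 2. Requires mean > 0.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    sigma2_log = np.log1p(var / mean**2)
    mu_log = np.log(mean) - 0.5 * sigma2_log
    return mu_log, sigma2_log


def sample_lognormal(mean, var, size, rng=None):
    """Draw strictly positive samples with the requested mean and variance."""
    rng = np.random.default_rng(rng)
    mu_log, sigma2_log = lognormal_params(mean, var)
    return np.exp(rng.normal(mu_log, np.sqrt(sigma2_log), size))


samples = sample_lognormal(2.0, 3.0, size=10000, rng=0)
print(samples.min() > 0, samples.mean(), samples.var())
```

Note that applying this marginal transform location by location does not by itself give a consistent spatial model, which matches the caveat on the lognormal transform raised in the discussion.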

The initial fitting of the parameters is done without the spatial penalty.
We can therefore easily see how much the penalty increases the CRPS error
and how much it reduces the spatial errors, as in Fig. 1. The CRPS error
increases for all catchments, but only marginally: the largest increase is
less than 25 %, and 75 % of the catchments have an increase of less than
5 %. The spatial penalty decreases considerably over the iterations; for
55 % of the catchments the spatial error is reduced to less than

Predicted (dots) and simulated specific runoff (pixels) for one day lead time for a region on both sides of the German/Polish border (red line).

Predicted (dots) and simulated specific runoff (pixels) for 10 days lead time for a region on both sides of the German/Polish border (red line).

The errors also span a large range: both the CRPS and the spatial errors range from approximately 1 to 1000. We have not analysed the reason for this, but it is likely not related to catchment area: first, the runoff has been divided by catchment area, and second, plots (not shown) of both the errors and the ratios between errors against catchment area show no strong relationships.

Table 1 gives an overview of

With the fitted parameters, we have an estimate of the predictive mean and uncertainty for each of the calibration locations. From these, we can simulate the specific runoff for each pixel along the river network. Figures 2 and 3 show the results from 4 simulations for the first and the 10th forecast day, respectively, based on forecasts from 17 February for a region on the German-Polish border. The forecast indicates a flood event towards the end of this period (10th forecast day), so the predictions and the simulations are considerably higher in Fig. 3 than in Fig. 2. The dots show the predictive mean at the calibration locations based on the fitted EMOS parameters, whereas the pixels show the simulated values based on the variogram and the predictive uncertainty at the calibration locations. The simulations are relatively close to the predicted mean near the calibration locations, whereas the deviations between simulations can be considerably larger in the smaller tributaries far from the calibration locations.

We have used the EMOS method for post-processing of runoff predictions from an ensemble forecasting system. The results indicate that top-kriging can be used to interpolate the EMOS parameters along the river network, as long as the parameters have been fitted with a method that enforces some degree of spatial continuity between the parameters.

We have also shown that top-kriging can be used to simulate runoff at uncalibrated locations, using the variogram and the post-processed predictive distributions at the calibration locations. These simulations are a different approach from interpolating the EMOS parameters to create uncorrelated predictive distributions for each location along the river network. To our knowledge, such simulations have not yet been used in hydrological forecasting, and their possible uses need further analysis. One important aspect is that the uncertainty of the smaller tributaries will not only reflect meteorological uncertainty, as in ensemble modelling with a hydrological model with a single parameter setup, but will also include the modelling uncertainty.

Some issues in the analyses presented here still need to be improved. First of all, much of the theory is developed for normally distributed variables, whereas runoff usually does not follow a normal distribution. In the near future we will analyse the possibility of using transformations to work with variables closer to normal. We did use a lognormal transform for the simulations; however, the way it was done is not well founded in geostatistical theory and will need further improvement. Some of the simulated values in the tributaries are extremely large, which may describe the statistical uncertainty well, but probably less so the meteorological uncertainty. Further comparisons between the simulations here and the results for each pixel from a distributed ensemble model will be necessary. Possible limitations related to the choice of models and to scale have not yet been analysed; however, we see no reason why the methodology could not also be applied to other ensemble models and hydrological models than the ones included in EFAS.

This work has been carried out within the framework of the European Flood Awareness System (EFAS) which is part of the Copernicus Emergency Management Services. We acknowledge the national hydrological services for providing observational data and ECMWF for the forecast data.