Variational data assimilation with the YAO platform for hydrological forecasting

In this study data assimilation based on variational assimilation was implemented with the HBV hydrological model using the YAO platform of University Pierre and Marie Curie (France). The principle of the variational assimilation is to consider the model state variables as control variables and optimise them by minimizing a cost function measuring the disagreement between observations and model simulations. The variational assimilation is used for the hydrological forecasting. In this case four state variables of the rainfall– runoff model HBV (those related to soil water content in the water balance tank and to water contents in rooting tanks) are considered as control variables. They were updated through the 4D-VAR procedure using daily discharge incoming information. The Serein basin in France was studied and a high level of forecasting accuracy was obtained with variational assimilation allowing flood anticipation.


INTRODUCTION
Floods represent one of the most destructive hazards to human lives and properties.To mitigate such a hazard, lots of research has focused on improving real-time monitoring and flood forecasting (Droegemeier et al. 2000).Numerous techniques were developed to improve hydrological forecasting, especially high flood prediction.Data assimilation was found to be one of the main tools to achieve this task (Seo et al. 2003, 2009, Ouachani et al. 2007, Lee et al. 2012).The 4D-Var method (variational data assimilation) is used in geophysics, applied to physically-based numerical models, in particular in meteorology (Schlatter 2000) and oceanography (Nodet 2007).It consists of estimating the control parameters of a direct numerical model (the hydrological model in our case), by minimizing a cost function quantifying the mismatching between the forecast values and actual observations.Gradient methods are adopted for the minimisation process.At University Paris VI, the research team of LOCEAN has developed a computation platform named YAO implementing the 4D-Var technique (Nardi et al. 2009).In this study, it is proposed to apply variational assimilation using the YAO framework to perform hydrological forecasting.Thus, the rainfall-runoff model HBV is adopted as a forecasting tool whose state variables are updated by 4D-Var to reduce the forecasting uncertainty.

The HBV Rainfall-Runoff Model
The HBV model is one of the most successful conceptual rainfall-runoff models that has been applied in more than 30 countries.In the current study, a lumped modelling is adopted.The main model outputs are daily mean flows (m 3 /s) as well as daily actual evapotranspiration (mm/day).The principal water balance components of the HBV model are snow accumulation and melt, actual evapotranspiration and infiltration evaluation, soil humidity evolution, and water transfer through soil (Fig. 1).A further description of the HBV model version used in this study can be found in Dakhlaoui and Bargaoui (2013).The model includes several state variables evolving continuously, which decide the flows values.They are mainly the soil moisture level (SW), the water level in the rooting tanks (STW1, STW2, and STW3) and the depth of the snow accumulated (SD).

Variational assimilation
Variational assimilation (4D-VAR) (Le Dimet et al. 1986) considers a physical phenomenon described in space by one, two or three dimensions and its time evolution.It thus requires the knowledge of a direct dynamical model M, which describes the time evolution of the physical phenomenon.M allows connecting the geophysical variables studied with observations.By varying some geophysical variables (control variables), assimilation seeks to infer the physical variables that led to the observation values.The basic idea is to determine the minimum of a cost function J that measures the misfits between the observations and the model estimations.Due to the complexity related to the nonlinearity of this function, the desired minimum is classically obtained by using gradient methods, which implies the use of the tangent linear and the adjoint models of M. The latter and the former are derived from the equations of the direct model M. The adjoint model estimates changes in the control variables in response to a disturbance of the output values calculated by M (here HBV equations).It is therefore necessary to proceed backwards to tangent linear calculations, which means using the transpose of the Jacobian matrix.When observations are available, the adjoint allows minimizing of function J, and finding the values of the control variables (in our case: SW, STW1, STW2 and STW3).

YAO
YAO provides a framework helping the implementation of the adjoint model using programming based on a general formalism decomposition of complex systems into a modular graph (Nardi et al. 2009, http://www.locean-ipsl.upmc.fr/~yao/).The graph is composed of modules connected by nodes and representing the numerical model.Each module is composed of an elementary function specific to the dynamic model, which is differentiable.YAO compiles and generates an executable that can compute the direct model M, the tangent linear model M and the adjoint model M t .An interface with a quasi Newton optimizer called M1QN3 (Gilbert and Le Marechal 1989) is used to minimize the cost function.YAO, using forward and backward integrations of M and its adjoint, yields the derivatives.

DATA
The variational assimilation was applied to predict the floods of the Serein (France).It is a tributary of the Seine (Fig. 2).It is mainly situated in a rural zone.It presents an elongated form with an area of 1120 km 2 .The streamflow gauge is situated at Chabelis and began to work in 1955.The daily mean flow is about 7.9 m 3 /s which is the equivalent of 255 mm/year.The annual mean rainfall is about 850 mm.

METHODOLOGY OF HYDROLOGICAL FORECASTING 4D-VAR
The 4D-Var was considered to assimilate outflows and to predict floods.Four model state variables are considered a priori as control variables: the soil moisture (SW) and the levels of water in the rooting tanks (STW1, STW2 and STW3) (Fig. 1).In fact, several studies proved the importance of such variables (Zehe et al. 2005, Lee et al. 2012).
Forecasting tests were performed under perfectly known rainfall.The control variables are updated every time step by the assimilation of the hydrological data of the n previous time steps (days).The forecasting is applied for the pth time step (day).Several combinations of assimilation windows width (n) and forecast lead time (p) were tested.
The effect of these decision variables on the quality of forecasting was studied.The Nash criterion (Nash and Sutcliffe 1970) was considered to evaluate the forecasting performance.In addition, we have evaluated the precision of the forecasting procedure to predict the peak flows.To that purpose, three objectives were targeted: reconstitute the occurrence, the amplitude and the timing of peaks.The first objective is quantified using a binary criterion.In effect, when the forecasting procedure generates a peak that matches to an observed one (regardless of the magnitude), this criterion takes the value "yes", and if no peak is generated this criterion takes the value "no".We note that this criterion is estimated with an objective manner by the observer (expert hydrologist).For the amplitude, we compare the amplitude of the matched observed and simulated peaks.We quantify the relative error between the observed and simulated peak flows.For the timing, we evaluate the delay (in days) between the observed and the simulated peak.The evaluation is assumed negative in the case of premature forecast and positive for delayed predicted peaks.

Calibration of HBV model
Firstly the HBV model was calibrated with the SCE-UA-KNN (Dakhlaoui et al. 2012).The calibration period was from January 1998 to December 2002.The validation period was from January 2008 to March 2010.The time steps of simulation and prediction were the day.The 14 model parameters (Fig. 1) were considered simultaneously in the calibration process.It is concluded that satisfactory model performances are achieved; the Nash-Sutcliffe criterion is about 0.85.For the validation period, the model conserves a high level of Nash criterion value (0.85).However, there is some difficulty for the model to reconstitute peak flows, which justify the potentiality for data assimilation to improve this situation.

Hydrological forecasting
The model parameters applied to run forecasting were those obtained by calibration in the previous section.Two periods were selected for the forecasting experiences, the first In Fig. 3, the assimilation period was considered equal to n = 3 days and the forecast lead time to p = 1 day.It results in an important improvement in the forecasting accuracy compared to the raw model.In effect, the Nash of the forecasted flow is increased to 0.93 (0.70 for the raw model).In addition, four of the five peak flows were detected by the forecasting procedure.Particularly the peak of 5 February 2009, which was not detected by the raw model, is predicted in the present experience (Table 1).We also note that the forecasted recession flows are underestimated compared to the observed ones.In effect, the recession is more rapid in forecasting than in realty.
For the second observation period, with a forecast lead time of n = 1 day and an assimilation period of p = 3 days (Fig. 4), the Nash of predicted flow is about 0.95.All peak flows are detected, but with a relative error of up to 12%.We also find a very important improvement of the  Occurrence: yes if the flood is simulated, no else, Amplitude: relative error of simulation of the peaks and Timing: the difference of the timing of the forecasted and observed peak.reconstitution of the most important peak flow, as well as the one ranked at second position; they were not detected without the updating process.However, a tendency to underestimate the recession time was monitored.In effect, the predicted recession segments are lower than the observed ones, especially for the second half of the observed period.
When the assimilation period is decreased to p = 2 days (n = 1 day), a spectacular improvement in prediction performance is noted (Fig. 5).The Nash of predicted flow reaches 0.98.We observed a quasi-perfect prediction of peak flow amplitude, timing and occurrence.The rising and recession segments of predicted flows match better to observations.
For the forecast lead time (p = 2 days), augmenting the assimilation window to n = 3 days, we note an important decline of the prediction accuracy (Table 2).In effect, the variational assimilation results in a lower Nash (= 0.81) than in the case without state variables updating (Nash = 0.87).
There is an important discordance between the predicted peak flows and the observed ones, as well as in amplitude, timing and occurrence.Although the variational assimilation could estimate the peaks amplitude of the most important floods, it presents a lag of 1 day in the timing of peaks.For the other secondary events, the variational assimilation has difficulty in catching up with the observations, and sometimes completely misses the observed.
In this series of tests with Serein, we note that the accuracy of prediction decreases along the observed period.Generally the assimilation improves forecasting compared to HBV model simulation (raw model).The forecast lead time of p = 1 day and the assimilation period n of two days give the best results.In fact, the time differences and similarities among the forecasts may be seen best in this case.For other combinations of assimilation windows and forecast lead times, we  can deduce an over forecast and an under forecast some time severe.The worst results were obtained with n = 3 days assimilation and p = 2 days (Table 2).

CONCLUSION
According to this experience, the 4D-Var method and YAO as a tool are powerful for developing assimilation procedures.The methodology was to select the most sensitive state variables for updating with data assimilation.Several assimilation periods and forecast lead times were evaluated.
In the case of the Serein (1120 km 2 ) using the daily time step, soil moisture content (SW) and rooting tanks levels (STW1, STW2 and STW3) were adopted as control variables (four variables).A high level of forecasting accuracy was obtained with either 1 or 2 days anticipation of floods.The assimilation periods were 2 or 3 days.The forecasting accuracy was also evaluated with respect to both occurrence and timing of flow peaks.Of course, more and more experiments on other basins have to be achieved to confirm the success of the present experiments.

Fig. 2
Fig. 2 Location of the Serein catchment in the Seine basin.
from 17 January 2009 to 27 February 2009 and the second from 9 March 2008 to 14 May 2008.The former contains five peaks, while the latter contains nine peaks, the highest values occurred on 26 January 2009 and on 24 March 2008, respectively.Figures 3, 4 and 5 show examples of hydrographs of the variational assimilation forecasts against the observed flows and the simulated runoff using the HBV model (called raw model simulation).

Fig. 3
Fig. 3 Evolution of streamflow predicted (fine dished line) compared to those obtained with HBV simulation (bold dished line) and observed streamflow (continuous line) at the outlet of the Serein for the observed period 17 January 2009 to 27 February 2009 with 3 days assimilation and 1 day forecast lead time.SW, STW1, STW2 and STW3 are updated with assimilation (HBV without assimilation Nash = 0.70, Assimilated flow Nash = 0.93.Table 1 Comparison of peak flows (occurrence, amplitude and timing) of stream flow forecasted with 3 assimilation and 1 day forecast lead time, to those obtained with HBV simulation with the observed stream flow at the outlet of the Serein, for the observed period of 17 January 2009 to 4 March 2009.

Fig. 4 Fig. 5
Fig. 4 Evolution of streamflow predicted (fine dished line) compared to those obtained with HBV simulation (bold dished line) and observed stream flow (continuous line) at the outlet of the Serein for the observed period 9 March 2008-14 May 2008, with 3 days assimilation and 1 day forecast lead time.SW, STW1, STW2 and STW3 are updated with assimilation.HBV without assimilation Nash = 0.87, Assimilated flow Nash = 0.95.

Table 2
Performance of forecasting with different combinations of assimilation windows and forecast lead times, for the Serein.