the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Spatially distributed calibration of a hydrological model with variational optimization constrained by physiographic maps for flash flood forecasting in France
Maxime JayAllemand
Julie Demargne
PierreAndré Garambois
Pierre Javelle
Igor Gejadze
François Colleoni
Didier Organde
Patrick Arnaud
Catherine Fouchier
This contribution presents a regionalization approach to estimate spatially distributed hydrologic parameters based on: (i) the SMASH (Spatially distributed Modelling and ASsimilation for Hydrology) hydrological modeling and assimilation platform (JayAllemand, 2020; JayAllemand et al., 2020) underlying the French national flash flood forecasting system Vigicrues Flash (Javelle et al., 2019); (ii) the variational assimilation algorithm from (JayAllemand et al., 2020), adapted to high dimensional inverse problems; (iii) spatial constraints added to the optimization problem, based on masks derived from physiographic maps (e.g., land cover, terrain slope); (iv) multisite global optimization, which targets multiple independent watersheds. This method gives a regional estimation of the spatially distributed parameters over the whole modeled area. This study uses a distributed rainfallrunoff model with 4 parameters to calibrate, with a spatial resolution of 1×1 km^{2} and a 15 min time step. Performances of the calibrated hydrological model and the parameters robustness are evaluated on two French study areas with 20 catchments in each, in spatiotemporal extrapolation based on crossvalidation experiments over a 12year period. Several spatial regularization strategies are tested to better constrain the high dimensional optimization problem. The model parameters are calibrated based on the NashSutcliffe Efficiency (NSE) computed for multiple calibration basins in the study area. Results are discussed based on the NashSutcliffe Efficiency and the KlingGupta Efficiency criteria obtained on calibration and validation catchments for two subperiods of 6 years. Further work aims to improve the global search of prior parameter sets and to better balance the adjoint sensitivity with respect to the spatial constraints resolution and catchment characteristics. This will ensure a better consistency of simulated fluxes variabilities and enhance the applicability of the regionalization method at higher spatial scales and over larger domains.
 Article
(4233 KB)  Fulltext XML
 BibTeX
 EndNote
UPH6; UPH19; UPH20; SDG13; distributed modeling; conceptualization; parameter optimization; ungauged basins; flash floods
The estimation of storage and fluxes in surface hydrology is an essential scientific question related to major socioeconomic issues as extreme floods and droughts forecasting, especially with the undergoing climate change. In order to better predict flash floods and reduce their potentially devastating impact, warning systems have been developed or are currently under development (Gourley et al., 2017; Javelle et al., 2016). In this context, advanced spatially distributed modeling tools are critically needed to generate reliable and skillful local forecasts. These models take into account the spatial variability of the catchment properties and of the forcing inputs (e.g., rainfall) to produce discharge predictions at ungauged locations. Nevertheless, hydrological modeling remains a challenging task because of limited observations of physical processes and modeling uncertainties, and internal storage fluxes are tinged with uncertainties.
Whatever their status and complexity, hydrological models are most often calibrated and validated using integrative discharge time series at the outlet of a catchment (e.g. Sebben et al., 2013). However, due to a potentially significant number of cells or subcatchments involved in distributed models, the calibration process has to deal with overparameterization and equifinality (uniqueness) issues. In particular, given the spatial sparsity of constraining discharge observations, hydrological modeling is faced with the challenge of producing predictions at ungauged locations based on the regionalization of the hydrological model parameters. Despite the overparameterization problem in spatially distributed modeling, the contribution of e.g. JayAllemand et al. (2020) and JayAllemand (2020) showed promising results for estimating the spatial variability of the distributed parameters within a given catchment using only downstream discharge observations. However, providing better spatial constrains on the estimated parameters patterns, within and outside the calibration catchments in a regionalization perspective, remains a challenge.
Different regionalization strategies have been proposed in hydrology. Some methods search for relations between the model parameters and physical descriptors (Seibert, 1999). Others methods explore regionalization schemes based on the spatial or physical proximity (e.g. Odry and Arnaud, 2017; Parajka et al., 2005). These methods have many requirements: (1) reliable parameters have to be identified on gauged catchment; (2) hydrological relations between gauged and ungauged catchments must exist; (3) model parameters should have a physical meaning. Unfortunately, these requirements are not always satisfied. To better constrain regionalization over large domains, promising regional calibration results of Samaniego et al. (2010), Poncelet (2016), Mizukami et al. (2017), and Beck et al. (2020) are based on the Multiscale Parameter Regionalization (MPR) upscaling and preregionalization scheme of Samaniego et al. (2010), which enables to account for various physiographic and climatic descriptors. It consists in imposing a mapping between fine scale descriptor maps and model parameter maps prior to regional optimization. Nevertheless, in all those studies, the mapping parameters are optimized as spatially uniform with classical optimization algorithms limited to low dimensional inverse problems.
In a regionalization perspective, we extend the distributed calibration approach developped by JayAllemand et al. (2020), which is adapted to high dimensional hydrological inverse problems. Our regional calibration method uses the following elements: (1) a distributed hydrological model; (2) a variational algorithm; (3) the adjoint of the forward model; (4) physical descriptors to constrain the spatial variability of the parameters; (5) a regularization function, required for illposed inverse problems (Tikhonov, 1963), adapted to constrain the distributed hydrological model with physical descriptors. Finally, the major difference with others regional calibration approaches lies in the fact that no prior guess based on hypothetical physical meanings of the parameters is required. The descriptors are only used to define classified physiographic maps to constrain the parameter patterns through the control setting or the regularization function.
SMASH is a computational software framework dedicated to Spatially distributed Modeling and data ASsimilation for Hydrology at high resolution, capable to tackle high dimensional inverse problems with variational data assimilation. It underlies the French flash flood warning system called Vigicrues Flash (Javelle et al., 2019).
2.1 SMASH Model
Let Ω⊂ℝ^{2} be a 2D spatial domain, containing single to multipe nested and nonnested catchments, and t>0 be the physical time. A regular lattice 𝒯_{Ω} covers Ω and D(x) is the drainage plan obtained from terrain elevation processing, with the only condition that a unique point in Ω has the highest drainage area, i.e. the outlet of the largest catchment in Ω. The hydrological model is a dynamic operator ℳ mapping the observed input fields of rainfall and evapotranspiration, denoted $\mathit{P}\left(x,{t}^{\prime}\right)$ and $\mathit{E}\left(x,{t}^{\prime}\right)$, $\forall \left(x,{t}^{\prime}\right)\in \mathrm{\Omega}\times (\mathrm{0},t)$ onto the discharge field Q(x,t) such that:
with h(x,t), the N_{s}dimensional vector of model states (2D fields), and θ, the N_{p}dimensional vector of model parameters (2D fields); θ is also called the control vector in the optimization context.
In this study, we use a parsimonious 4parameters gridded GRlike hydrological model schematized in Fig. 1. For a given pixel i of coordinates x∈Ω, the proposed model includes three reservoirs 𝒫, 𝒯_{r} and 𝒯_{l} of capacity c_{p}, c_{tr} and c_{tl}, to simulate respectively the production of runoff and its transfer within a cell (with two transfer reservoirs). Their stages are respectively denoted h_{p}, h_{tr} and h_{tl}. Given known flow directions, classically obtained from a Digital Elevation Model, the celltocell routing is done with a nonlinear routing reservoir ℛ of capacity c_{r}. The spatial resolution is set to Δx=1 km and the simulation time step to Δt=15 min corresponding to the spacetime resolution of input rainfall and evaporation data, which are assumed to be constant over a given time step Δt. The flow operators, mostly consisting in first order ordinary differential equations, are analytically integrated over a time step, enabling a simple computation of the forward model defined in Eq. (1).
2.2 Calibration algorithm
In order to better constrain discharge estimates at ungauged locations, the calibration aims to estimate spatially distributed parameters within the calibration catchments as well as outside them. Hence, the parameter vector $\mathit{\theta}\left(x\right)\equiv {\left({c}_{\mathrm{p}}\left(x\right),{c}_{\mathrm{tr}}\left(x\right),{c}_{\mathrm{r}}\left(x\right),{c}_{\mathrm{tl}}\left(x\right)\right)}^{\mathrm{T}}$ is of size N_{c}×N_{x}, where N_{c} denotes the number of parameters fields and N_{x} the total number of cells in Omega Ω, including cells in ungauged areas. Considering tens of cells or more over a given hydrological domain Ω with discharge observations being available only on a limited number of points (i.e. outlets of the gauged catchments), the regional calibration of a spatially distributed θ is a high dimensional and difficult inverse problem.
The adjointbased variational calibration algorithm developed by e.g. (JayAllemand, 2020; JayAllemand et al., 2020) and adapted to such hydrological inverse problems, is used in this study and summarized here and in Fig. 1. Given the simulated and observed discharges at gauged cells ${x}_{k}\in \mathrm{\Omega},k\in \mathrm{1},..,{N}_{\mathrm{g}}$, respectively denoted Q_{k}(t) and ${Q}_{k}^{*}\left(t\right)$, we define the objective function as:
where the observation cost function is ${J}_{\mathrm{obs}}={\sum}_{k=\mathrm{1}}^{{N}_{\mathrm{g}}}{j}_{k}^{*}$. In this study, ${j}_{k}^{*}$ is classically defined as 1− NSE, with NSE the NashSutcliffe efficiency (Nash and Sutcliffe, 1970), a quadratic metric that measures the misfit between simulated and observed discharges. Note that the simulated discharge depends on the control vector θ via the hydrological model (Eq. 1). The second term in Eq. (2) is a regularization term weighted by α. Given the illposedness of the hydrological calibration problem, classified maps of different physiographic descriptors are introduced to spatially constrain the variability of model parameters fields. For each parameter field of θ, one classified physiographic map is considered to either reduce the control vector into semidistributed controls or compute a regularization term by class as detailed hereafter.
In order to improve the obtained hydrologic parameter patterns, an additional regularization term is proposed to penalize parameter variability within each physiographic class. Consider for each of the N_{c}=4 parameters fields ${\mathit{\theta}}_{\mathrm{c}\in [\mathrm{1}..{N}_{\mathrm{c}}]}\left(x\right)\in \mathit{\theta}\left(x\right)$, a given physiographic map defining a partition of the catchment such that ${\mathrm{\Omega}}_{\mathrm{c}}={\cup}_{k=\mathrm{1}}^{{n}_{\mathrm{p},\mathrm{c}}}{\mathrm{\Omega}}_{\mathrm{c},k}$ with n_{p,c} the number of classes in the spatial partition of parameter c. We introduce the following regularization term:
where d_{i,j} is the Euclidean distance between cells i and j. This function, inspired by the classic Tikhonov regularization (Tikhonov, 1963), enables a consistent regularization of the parameters spatial variability within the spatial partitioning Ω_{c} of each parameter. The weight α is chosen with the Lcurve method (Hansen and O'Leary, 1993).
Given the hydrological model defined in Eq. (1), an optimal estimate $\widehat{\mathit{\theta}}$ of the model parameter set is obtained from the condition:
The optimization is performed with the LBFGSB algorithm adapted to high dimension (Zhu et al., 1994). This algorithm requires the gradient of the cost function with respect to the sought parameters, ∇_{θ}J, which is obtained by solving the adjoint model. The adjoint model has been generated with the automatic differentiation engine TAPENADE (Hascoet and Pascual, 2013) applied to the SMASH source code, which includes the new modeling components and regularization scheme, and has been validated with a standard gradient test. The first guess value $\overline{\mathit{\theta}}$ needs to be defined as a starting point for the optimization process with the regularization term. $\overline{\mathit{\theta}}$ is defined as a spatially uniform global optimum determined with a simple global minimization algorithm from a uniform parameter set, ${\overline{\mathit{\theta}}}^{*}$ (JayAllemand et al., 2020). The steepest descent method summarized in (Edijatno, 1991) has been used in this study to determine this uniform first guess value. Note that the semidistributed calibration of a semidistributed conceptual model has been investigated in De Lavenne et al. (2019), using a sequential and low dimensional calibration approach involving a regularization strategy accounting for deviation to a priori parameter sets determined with catchment proximity. In the present work, a regularization term penalizing the spatial variability of parameters within physiographic catchment partitioning is introduced in a variational data assimilation algorithm adapted to high dimensional inverse problems; the cost function is global in space and time.
In order to investigate the effects of the proposed regularization term, the following calibration strategies are compared:

a spatially uniform calibration (denoted U), where the control vector considered constant in space (hence ${\mathit{\theta}}_{\mathrm{U}}\left(x\right)\equiv \overline{\mathit{\theta}}$ reduced to 4 unknowns) is sought with the simple steepest descent algorithm from e.g. (Edijatno, 1991); this uniform calibrated parameter set ${\widehat{\mathit{\theta}}}_{\mathrm{U}}$ is used as the first guess value in the three subsequent calibration processes;

a distributed calibration (denoted D), with a spatially distributed control θ_{D}(x), i.e. with free parameters in each cell within the gauged catchments used for calibration; the first guess values (from the uniform calibration) give parameter estimates for all grid cells outside of the gauged catchments;

a semidistributed calibration (denoted SD), with a semidistributed control θ_{SD}(x) spatially constrained by classified physiographic maps over the whole domain, hence with one spatially uniform free parameter for each class of the physiographic mask relative to the model parameter;

a distributedregularized calibration (denoted DR), with a fully distributed control θ_{DR}(x) since the physiographic regularization from Eq. (3) (based on the same physiographic masks as in the SD calibration) is used in this case as a weak regionalization constrain over the whole domain.
In fact, in the optimization process, the observation cost function is only sensitive to the parameter values of the cells within the gauged catchments used for calibration. However, the regionalization effect stems from the proposed classified physiographic maps either through the semidistributed control setup for the SD case or from the regularization term in the DR case.
The calibration strategies are evaluated over two study areas with 20 gauged basins in each (see Fig. 2): (i) a Northern area in the Burgundy region, (ii) a Southern area in the Ardeche region. These 40 gauged basins, with catchment areas ranging from 25 to 768 km^{2}, have been split up in 2 subsets for crossvalidation purposes: 28 basins for calibration and 12 basins for validation (5 nested and 7 neighboring catchments). The selection of catchments is based on the availability of long time series with high quality of observed flow and limited anthropogenic impacts. The choice of the localization of the calibration/validation and nested/disjoint catchments is arbitrary.
The SMASH model runs on a 1×1 km^{2} grid resolution at a 15 min time step forced by: (i) observed rainfall grids based on hourly ANTILOPE J+1 radargauge rainfall reanalysis (Laurantin, 2013; Champeaux et al., 2009), which have been disaggregated using the temporal distribution of 15 min PANTHERE radargauge rainfall estimates available in realtime (both gridded products from MétéoFrance); (ii) PET estimates based on the Oudin formula and the temperature data from SAFRAN reanalysis produced by MétéoFrance on a 8×8 km^{2}.
Several combinations of physiographic descriptors were tested to spatially constrain the parameter calibration using land use, local slope, bedrock type and drainage areas. Results presented in this paper consider: (i) for the c_{p} production parameter, a 1 km^{2} land use gridded mask with 3 classes derived from CORINE Land Cover (2012 version, https://land.copernicus.eu/paneuropean/corinelandcover, last access: 9 May 2023) (the 3 classes including artificial areas, open spaces with little or no vegetation and water bodies for class 1, agricultural areas for class 2, and forests for class 3); (ii) for the 3 other transfer and routing parameters (c_{tr}, c_{tl} and c_{r}), a 1 km^{2} local slope grid mask with 4 classes (using the threshold values of 0.025, 0.5 and 1.8° for classes 1 to 4) from the Copernicus database (http://land.copernicus.eu/insitu/eudemderivedproducts/slope, last access: 9 May 2023). The numbers of classes was chosen so that each catchment is covered by at least two classes and both descriptors have more or less the same number of classes (3 and 4). We are aware that this choice affects the size of the control of the (SD) case.
The number of free parameters depends on the study area (Northern and Southern) and the optimization scheme. For the distributed case (D), the number of free parameters is equal to 12 328 (Northern) and 14 400 (Southern), which corresponds to the number of cells inside the gauged catchments times the number of hydrological operators in the forward model. Cells outside the gauged area are not sensitive to the cost function and are therefore not included in the optimization process. For the distributedregularized case (DR), the number of free parameters is equal to 45 760 (Northern) and 40 000 (Southern), which corresponds to the total number of cells of the domain times the number of hydrological operators in the forward model. Thanks to the regularization function, all domain cells are sensitive to the cost function and are thus included in the optimization process. For the semidistributed (SD) case, the use of classified physiographic maps for the reduction of the control leads to 15 free parameters, corresponding to the total number of descriptors classes used for all operators. For the uniform (U) case, the control contains only 4 parameters. Table 1 gives a summary description of the configurations of the four calibration experiments.
The parameter bounds for the uniform (U) and semidistributed (SD) calibration are defined as follows: c_{p} and c_{tr} varying between 20 and 1000 mm, c_{tl} between 20 and 5000 mm, and c_{r} between 0.1 and 100 min. For the distributed (D) and distributedregularized (DR) calibrations, the first guess value for each parameter has been used to define the upper bound as ${\mathit{\theta}}_{max}=\mathit{\theta}\left({x}_{\mathrm{b}}\right)+\mathrm{3}\times \mathit{\theta}\left({x}_{\mathrm{b}}\right)$, the lower bound being equal to 1.
The weight of the regularization, α, was estimated using the Lcurve method (Hansen and O'Leary, 1993). The chosen α value varies in the interval of $[\mathrm{7.0},\mathrm{20}.]$ depending on the experiments. It is interesting to mention that, at the end of the optimization, the relative weights $\frac{\mathit{\alpha}{J}_{\mathrm{reg}}\left(\widehat{\mathit{\theta}}\right)}{{J}_{\mathrm{obs}}\left(\widehat{\mathit{\theta}}\right)}$ remain within the range of [9.0,12.0].
To focus the parameter calibration on high flow conditions, the optimization criterion (NSE) is calculated using only observed flow exceeding one fifth of the flood quantile relative to the 2year return period. The evaluation scores are all computed with the same thresholding strategy, which leads to lower efficiency score values but a better description of the similarity between observed and simulated discharges for high flow conditions.
The evaluation of the model calibrations consists in analysing the model predictive performance (referred as MPP) using data not involved in calibration (JayAllemand et al., 2020). Therefore, the full set of observations ${Q}^{*}k\left(t\right),k=\mathrm{1},\mathrm{\dots},\mathrm{Ng},t\in (\mathrm{0},T)$ is divided into two complementary subsets: a calibration subset and a validation subset. Since Q^{*} depends on k (defining the spatial distribution of sensors) and t, we distinguish the temporal validation and spatiotemporal validation, following the split sample test defined by Klemeš (1986). Two 6year subperiods are defined: P1 from 1 August 2007 to 31 July 2013, and P2 from 1 August 2013 to 31 July 2019. Each subperiod (P1 or P2) is considered as the calibration period while the other period is the validation period. A model warmup of one year (from August to July preceeding the calibration period) is performed before starting the simulations, one year being long enough to obtain stable model states for these catchments not strongly impacted by snow conditions.
If data from a station is used in calibration, the corresponding catchment is labelled as “calibration catchment”; otherwise it is a “validation catchment”. Both the calibration quality and the MPP in validation are measured using the NSE criterion and the Kling–Gupta efficiency (KGE) criterion (Gupta et al., 2009). The NSE and KGE metrics are computed for high flow conditions only, on the calibration and validation subperiods. The gain in the NSE score is computed for each individual basin as a skill score to evaluate the relative improvement (in terms of NSE) of a given calibration in reference to the benchmark calibration (e.g., the uniform (U) case).
4.1 Model predictive performances analysis
The distribution of the NSE and KGE values for all 28 calibration catchments (from both study areas) on the calibration and validation periods are plotted on Fig. 3, while results for the 12 validation catchments on the validation period (for spatiotemporal validation) are plotted on Fig. 4. Boxandwhisker plots represent the quartile values, the median value being indicated by the bar within the box and the whiskers extending from the smallest to the largest value.
Model performances at the calibration stations for the calibration subperiods (left plot of Fig. 3): the best model performances are obtained for the distributed and distributedregularized calibrations, for which the NSE (and KGE) median values (for high flow conditions) are equal to respectively 0.8 (0.83) and 0.77 (0.83). The (fully) distributed calibration (D and DR) using the variational algorithm significantly improves the NSE and KGE scores at the calibration stations by 0.1 point on average compared to uniform calibration U. The semidistributed calibration shows intermediate model performances due to the reduced control vector in space. The more flexible distributedregularized calibration strategy (i.e., with more degrees of freedom) leads to better regional model performances in calibration compared to the semidistributed method.
Model performances at the calibration stations for the validation subperiods (right plot of Fig. 3): the uniform parameters (U) strongly degrade the model predictive performances (MPP) from the calibration to the validation period, with a median NSE score reduced from 0.65 to 0.44. MPP are less degraded for the other experiments. Both fully distributed approaches (D and DR) outperform the semidistributed (SD) calibration: for example, the distributedregularized (DR) calibration improves the NSE score for 37 out of the 56 cases, with a median gain of 12 % (on all 56 cases). While these two approaches are overparameterized, the distributed (D) parameter set obtains better scores in temporal validation at the calibration stations: in comparison with the SD calibration, the NSE score is improved for 38 out of the 56 cases, with a median gain in NSE of 14 % (on all 56 cases).
Model predictive performances at the validation stations for the validation period (Fig. 4): all four experiments degrade MPP in spatiotemporal validation: for example, the uniform parameter set (U) obtains a median NSE score of 0.32 on all 24 cases. The distributed calibration (D) uses the uniform parameter set outside of the calibration catchments, thus obtaining a similar median NSE score (0.33 on all 24 cases), the improvement being limited to the 5 validation catchments fully or partially nested within a calibration catchment (10 cases for the 2 validation subperiods). The semidistributed (SD) and distributedregularized (DR) calibrations slightly improve the MPP for 15 and 14 cases (over the 24 cases) in comparison to the uniform calibration (U), with a median NSE gain of 11 % and 4 % respectively. In comparison with the D calibration, the DR calibration leads to a NSE improvement for 5 of the 10 nested cases. The spatial constraints applied for the DR and SD experiments allow to estimate a spatially variable parameter set outside the calibration catchments, which should be preferred to the uniform parameter set (U).
4.2 Analysis of the parameter spatialisation
Calibrated parameter maps for period P1 and for the Northern study area are presented in Fig. 5, with spatial variability either inside the calibration (gauged) catchments for the distributed calibration case (D) or over the whole domain for the calibration cases SD and DR (thus for any ungauged catchments). The considered calibration strategies, involving different control and/or regularization setups in an equifinality context led to contrasted inferred parameter fields that are analyzed here.
The D and DR calibration experiments led to similar parameter maps since, for these methods, the initial first guess values play a key role during the optimization. Indeed the median and average values computed on the overall domain remain close to the first guess values. The estimation of the first guess is therefore critical to constraint the problem.
To globally analyse the differences between inferred parameter sets θ_{SD} and θ_{DR} obtained for periods P1 and P2 and catchment sets North and South, the relative difference to the uniform first guess θ_{U} is introduced: $\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{SD}}^{{\mathrm{\Omega}}_{k},p,z}=\left({\widehat{\mathit{\theta}}}_{\mathrm{SD}}^{{\mathrm{\Omega}}_{k},p,z}{{\mathit{\theta}}^{*}}_{\mathrm{U}})\right/{{\mathit{\theta}}^{*}}_{\mathrm{U}}$ and $\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{DR}}^{{\mathrm{\Omega}}_{k},p,z}=\left({\widehat{\mathit{\theta}}}_{\mathrm{DR}}^{{\mathrm{\Omega}}_{k},p,z}{{\mathit{\theta}}^{*}}_{\mathrm{U}})\right/{{\mathit{\theta}}^{*}}_{\mathrm{U}}$; where ${\widehat{\mathit{\theta}}}_{\mathrm{SD}}^{{\mathrm{\Omega}}_{k},p,z}$ and ${\widehat{\mathit{\theta}}}_{\mathrm{DR}}^{{\mathrm{\Omega}}_{k},p,z}$ denote the inferred parameters for each spatial partitioning ${\mathrm{\Omega}}_{k}=\mathit{\{}{\mathrm{\Omega}}_{\mathrm{1}},\mathrm{\dots},{\mathrm{\Omega}}_{{N}_{k}}\mathit{\}}$, period p={P1, P2} and study zone z={North, South}, for the semidistributed (SD) and distributedregularized (DR) experiments. The corresponding global spatial averages over each zone, $\stackrel{\mathrm{\u203e}}{\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{SD}}}$ and $\stackrel{\mathrm{\u203e}}{\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{DR}}}$, and spatial variances, σ(Δθ_{SD}) and σ(Δθ_{DR}), are then computed (see Table 2). The comparison of the averaged relative differences $\stackrel{\mathrm{\u203e}}{\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{SD}}}>\stackrel{\mathrm{\u203e}}{\mathrm{\Delta}{\mathit{\theta}}_{\mathrm{DR}}}$ (being equal to 2.06 and 0.24) indicate that the inferred SD parameter set ${\widehat{\mathit{\theta}}}_{\mathrm{SD}}$ strongly deviates from the first guess while the DR ${\widehat{\mathit{\theta}}}_{\mathrm{DR}}$ parameters with physiographic reguarization stay much closer. The comparison of the spatial variances σΔθ_{SD}>σΔθ_{DR} (being equal to 74.43 and 0.21) indicates a much higher variablility for the semidistributed calibration. These results can be seen in terms of deviation from the first guess and spatial variability of the inferred parameters on Fig. 3. The parameter sets from the semidistributed calibration experiment (SD) are the most variable within the parameter bounds (i.e. higher σ values for the SD parameter maps). The resulting average and median values diverge from the first guess value. The variances of the parameters estimated on the overall domain are also higher in average.
When the parameter optimization targets multiple disjoint catchments, the uniform and semidistributed calibration problems are underparameterized. This could lead to unexpected optimal parameter sets when catchments exhibit different types of hydrological behavior. This issue occurred in the Southern study area, where the optimal first guess (${\mathit{\theta}}_{\mathrm{U}}=\stackrel{\mathrm{\u203e}}{\mathit{\theta}}$ being spatially uniform) for the slow transfer reservoir c_{tl} is equal to 2793 mm for P2 and to 281 mm for P1. For period P2, the semidistributed calibration (SD) led to an average parameter value $\stackrel{\mathrm{\u203e}}{{c}_{\mathrm{tl}}}$ equal to 283 mm; for the distributed calibration, the optimal parameter $\stackrel{\mathrm{\u203e}}{{c}_{\mathrm{tl}}}$ could be lower than 500 mm for one catchment and larger than 3000 mm for two others. The uniform and semidistributed optimization could be driven globally by the performances reached at only one or a few catchments, whereas data quality issues and/or model deficiencies could cause a performance loss for these catchments. Interestingly, the distributed calibration with physiographic regularization (DR) leads to spatially smoother inferred parameter fields.
This contribution presented a novel method to regionally calibrate the parameters of a distributed hydrological model using a variational algorithm. This methods relies on classified physiographic map(s) to impose spatial patterns to the parameters, either strongly (SD) via the definition of the control vector or weakly (D) via a regularization term added to the objective function. Both calibration strategies led to similar performances in spatiotemporal validation whereas the distributedregularized method improved the efficiency scores for the calibration catchments. Both strategies have their own strengths and weaknesses.
The (SD) strategy is the easiest approach to drastically reduce the size of the control vector. However, like the uniform calibration, it is very sensitive to the multisite optimization function and local model deficiencies and data errors could potentially impact the calibration over the whole area. Moreover, this strategy is very dependent on the choice of the spatial partitioning (i.e, the physiographic descriptors and their number of classes), which should be further explored. It could be interesting to test a combination of these strategies, using the output of the (SD) method as a first guess for the (DR) approach. Finally as the (SD) calibration works similarly to a special case of the MPR approach, we plan to extend this work via the more flexible MPR framework. The fully distributed calibration approach using the physiographic regularization strategy (DR case) showed promising results since: (1) it maximises the model performances at the control stations (MPP similar to the fully distributed approach without regularization); (2) it gives an estimate of the parameters values for the overall area, thus for any ungauged catchments. These regional parameter estimates help to improve the discharge predictions at the pseudoungauged validation catchments (which were not used for calibration). One drawback could be the huge size of the control vector. However, the variational optimization using the adjoint model is an efficient method to optimize the parameter vector and the equifinality does not exclusively stem from its size (JayAllemand et al., 2020). In our opinion, the distributedregularized approach is therefore more flexible and adequate for calibrating the parameters of a distributed hydrological model than the (SD) method. We believe that the regularization term can be further enhanced.
The spatiotemporal robustness of the calibrated parameters for (SD) and (DR) approach, is still to be improved: parameter maps are very different between the two calibration subperiods and the MPP at the validation stations are relatively poor compared to the MPP for the calibration stations. This lack of robustness has several causes. First the conceptual model includes parameters with limited physical meaning. Secondly, the optimization problem is overparameterized for the (DR) approach and the first guess values, which play a key role during the optimization, have to be improved using physical priors for example. Last, the lower calibration performance at validation catchments indicates a need to improve the model structure versatility to better adapt to different hydrologic behaviors.
Planned future work will focus on testing other regularizations with a physiographic distance metric and introducing hydrological units, as well as testing the forward model of upscaling and regionalization transfer functions of the MPR approach to upgrade the (SD) strategy. Moreover, a sensivity analysis during the first decisive iterations of the LBFGSB descent could provide valuable information about the calibration process and its driving components. Finally, a statistical and spatial analysis of the optimized parameters should be performed to search for relations between model parameters and physiographic descriptors and select the most adequate descriptors.
SMASH is planned to be released as an opensource software. A public online documentation has already been released at https://smash.recover.inrae.fr/index.html (Arnaud et al., 2023). Data sets can be provided on demand.
Authors performed the numerical simulations, analysis of the results and development of the SMASH model; they participated to the data and results analysis and contributed to the writing and reviewing of the article.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the special issue “IAHS2022 – Hydrological sciences in the Anthropocene: Variability and change across space, time, extremes, and interfaces”. It is a result of the XIth Scientific Assembly of the International Association of Hydrological Sciences (IAHS 2022), Montpellier, France, 29 May–3 June 2022.
We acknowledge all the reviewers for their reviews and comments, which helped to improve the article. We thank all the partners who provided data and computational ressources.
This paper was edited by Christophe Cudennec and reviewed by Alban de Lavenne and one anonymous referee.
Arnaud, P., Colleoni, F., Demargne, J., Ettalbi, M., Folton, N., Fouchier, C., Garambois, P., Gejadze, I., Godet, J., Haruna, A., Huynh Ngo Nghi, T., Javelle, P., JayAllemand, M., Organde, D., Paluszkiewicz, M., Pujol, K., Renard, B., Vigoureux, S., and Villenave, L.: The SMASH platform: Spatially distributed Modelling and ASsimilation for Hydrology, https://smash.recover.inrae.fr/index.html (last access: 12 May 2023), 2023. a
Beck, H. E., Pan, M., Lin, P., Seibert, J., van Dijk, A. I. J. M., and Wood, E. F.: Global Fully Distributed Parameter Regionalization Based on Observed Streamflow From 4,229 Headwater Catchments, J. Geophys. Res.Atmos., 125, e2019JD031485, https://doi.org/10.1029/2019JD031485, 2020. a
Champeaux, J.L., Dupuy, P., Laurantin, O., Soulan, I., Tabary, P., and Soubeyroux, J.M.: Les mesures de précipitations et l'estimation des lames d'eau à MétéoFrance: état de l'art et perspectives, LHB, 5, 28–34, https://doi.org/10.1051/lhb/2009052, 2009. a
De Lavenne, A., Andréassian, V., Thirel, G., Ramos, M.H., and Perrin, C.: . A regularization approach to improve the sequential calibration of a semidistributed hydrological model, Water Resour. Res., 55, 8821–8839, https://doi.org/10.1029/2018WR024266, 2019. a
Edijatno: Mise au point d'un modèle élémentaire pluiedébit au pas de temps journalier, PhD thesis, Université Louis Pasteur, ENGEES, Cemagref Antony, France, Strasbourg, 242 pp., https://webgr.inrae.fr/wpcontent/uploads/2012/07/1991EDIJATNOTHESE.pdf (last access: 9 May 2023), 1991. a, b
Gourley, J. J., Flamig, Z. L., Vergara, H., Kirstetter, P.E., Clark, R. A., Argyle, E., Arthur, A., Martinaitis, S., Terti, G., Erlingis, J. M., Hong, Y., and Howard, K. W.: The FLASH Project: Improving the Tools for Flash Flood Monitoring and Prediction across the United States, B. Am. Meteorol. Soc., 98, 361–372, https://doi.org/10.1175/BAMSD1500247.1, 2017. a
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91, 2009. a
Hansen, P. C. and O'Leary, D. P.: The use of the Lcurve in the regularization of discrete illposed problems, SIAM J. Sci. Comput., 14, 1487–1503, 1993. a, b
Hascoet, L. and Pascual, V.: The Tapenade Automatic Differentiation tool: principles, model, and specification, ACM T. Math. Softw., 39, p. 43, https://doi.org/10.1145/2450153.2450158, 2013. a
Javelle, P., Organde, D., Demargne, J., SaintMartin, C., de SaintAubin, C., Garandeau, L., and Janet, B.: Setting up a French national flash flood warning system for ungauged catchments based on the AIGA method, in: 3rd European Conference on Flood Risk Management FLOODrisk 2016, vol. 7, p. 11, https://doi.org/10.1051/e3sconf/20160718010, 2016. a
Javelle, P., SaintMartin, C., Garandeau, L., and Janet, B.: Flash flood warnings: Recent achievements in France with the national Vigicrues Flash system, UNDRR GAR, https://www.undrr.org/publication/flashfloodwarningsrecentachievementsfrancenationalvigicruesflashsystem (last access: 9 May 2023), 2019. a, b
JayAllemand, M.: Estimation variationnelle des paramètres d'un modèle hydrologique, PhD thesis, Universite d'AixMarseille, https://www.theses.fr/2020AIXM0400/document (last access: 9 May 2023), 2020. a, b, c
JayAllemand, M., Javelle, P., Gejadze, I., Arnaud, P., Malaterre, P.O., Fine, J.A., and Organde, D.: On the potential of variational calibration for a fully distributed hydrological model: application on a Mediterranean catchment, Hydrol. Earth Syst. Sci., 24, 5519–5538, https://doi.org/10.5194/hess2455192020, 2020. a, b, c, d, e, f, g, h
Klemeš, V.: Operational testing of hydrological simulation models, Hydrolog. Sci. J., 31, 13–24, 1986. a
Laurantin, O.: ANTILOPE: hourly rainfall analysis over France merging radar and rain gauges data, in: Proceedings of the 11th International Precipitation Conference, edited by: Leijnse, H. and Uijlenhoet, R., KNMI, EdeWageningen, the Netherlands, 30 June to 3 July 2013. a
Mizukami, N., Clark, M. P., Newman, A. J., Wood, A. W., Gutmann, E. D., Nijssen, B., Rakovec, O., and Samaniego, L.: Towards seamless largedomain parameter estimation for hydrologic models, Water Resour. Res., 53, 8020–8040, https://doi.org/10.1002/2017WR020401, 2017. a
Nash, J. and Sutcliffe, J.: River flow forecasting through conceptual models, Part I. A conceptual models discussion of principles, J. Hydrol., 10, 282–290, 1970. a
Odry, J. and Arnaud, P.: Comparison of Flood Frequency Analysis Methods for Ungauged Catchments in France, Geosciences, 7, 88, https://doi.org/10.3390/geosciences7030088, 2017. a
Parajka, J., Merz, R., and Blöschl, G.: A comparison of regionalisation methods for catchment model parameters, Hydrol. Earth Syst. Sci., 9, 157–171, https://doi.org/10.5194/hess91572005, 2005. a
Poncelet, C.: Du bassin au paramètre : jusqu'où peuton régionaliser un modèle hydrologique conceptuel?, Theses, Université Pierre et Marie Curie – Paris VI, https://tel.archivesouvertes.fr/tel01529196 (last access: 9 May 2023), 2016. a
Samaniego, L., Kumar, R., and Attinger, S.: Multiscale parameter regionalization of a gridbased hydrologic model at the mesoscale, Water Resour. Res., 46, W05523, https://doi.org/10.1029/2008WR007327, 2010. a, b
Sebben, M. L., Werner, A. D., Liggett, J. E., Partington, D., and Simmons, C. T.: On the testing of fully integrated surface–subsurface hydrological models, Hydrol. Process., 27, 1276–1285, https://doi.org/10.1002/hyp.9630, 2013. a
Seibert, J.: Regionalisation of parameters for a conceptual rainfallrunoff model, Agr. Forest Meteorol., 98–99, 279–293, https://doi.org/10.1016/S01681923(99)001057, 1999. a
Tikhonov, A. N.: On the regularization of illposed problems, in: Doklady Akademii Nauk, Russian Academy of Sciences, vol. 153, 49–52, 1963. a, b
Zhu, C., Byrd, R. H., Lu, P., and Nocedal, J.: LBFGSB: a limited memory FORTRAN code for solving bound constrained optimization problems, EECS Department, Northwestern University, Evanston, IL, Technical Report No. NAM11, https://doi.org/10.1145/279232.279236, 1994. a