Estimating parameter values of a socio-hydrological flood model

Socio-hydrological modelling studies published so far show that dynamic coupled human-flood models are a promising tool to represent the phenomena and feedbacks in human-flood systems. So far, these models are mostly generic and have not been developed and calibrated to represent specific case studies. We believe that applying and calibrating this type of model for real-world case studies can help us further develop our understanding of the phenomena that occur in these systems. In this paper we propose a method to estimate the parameter values of a socio-hydrological model and test it by applying it to an artificial case study. We postulate a model that describes the feedbacks between floods, awareness and preparedness. After simulating hypothetical time series with a given combination of parameters, we sample a few data points for our variables and try to estimate the parameters from these data points using Bayesian inference. The results show that, if we are able to collect data for a case study, we would, in theory, be able to estimate the parameter values of our socio-hydrological flood model.


Introduction
Socio-hydrology aims to study the long-term feedbacks between humans and hydrology and to explain the phenomena that occur as a result of these feedbacks (Sivapalan and Blöschl, 2015; Sivapalan et al., 2012). So far, socio-hydrological flood models are mostly based on concepts derived from personal experience and observation of how the system works (Di Baldassarre et al., 2013, 2015, 2017; Grames et al., 2016; Grelot and Barreteau, 2012; Viglione et al., 2014). Most of them are general models that have not been developed to reproduce and evaluate the dynamics of a specific case study, and they have not been compared to data. An exception is the work of Ciullo et al. (2017), who qualitatively compared the model results of Di Baldassarre et al. (2015) with the human-flood systems of the city of Rome and of Bangladesh, using data on population density, flood losses and levee heights. Chen et al. (2016) developed a socio-hydrological model to reflect the human-flood dynamics in the Kissimmee River Basin, Florida, and used flood series and data on wetland area to assess the model's performance qualitatively. While the socio-hydrological models and studies published so far show that these models can yield valuable insights into human-flood dynamics, they have rarely been applied to real-world systems, and none of them has been calibrated to represent a specific case study. It is therefore of interest to assess whether calibrating socio-hydrological models is feasible and to understand which data, and how much of them, are needed.
Here we develop a socio-hydrological model and explore whether, if data were available, we would be able to estimate its parameter values using Bayesian inference (Gelman et al., 2014).

Model structure
History has shown that repeated flood events may result in lower damages when floods recur, because people become aware of the flood risk and adapt. We want our model to capture these interactions between floods, awareness and preparedness. We model them with the following variables: Floods (W), Losses (L), Relative Losses (R), Settlement Density (D), Awareness (A) and Preparedness (P). The behavior of these variables over time is described by the system of differential equations in Eq. (1).
High water levels (W) (here called floods) and the protection level (H) are external variables. For illustrative purposes, we use time series from the gauge at Dresden, Germany. If the water level (W) is higher than the protection level (H), this results in a relative loss (R), as given in Eq. (1b). R increases exponentially with the flood size, up to a maximum of one for the maximum flood.
Together with the settlement density (D), this results in an actual loss L = R·D, according to Eq. (1a). The actual loss is higher if the relative loss is higher or if the settlement density (and thus the exposure) is higher.
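Eq. (1a) and (1b) are not reproduced in this text, so the following Python sketch uses an assumed functional form that only mimics the verbal description: no loss when W ≤ H, exponential growth of R with the exceedance W − H, R = 1 for the maximum flood (W = 1) without preparedness, a reduction of R by preparedness whose strength is set by α_R, and L = R·D. The normalisation and the factor (1 − α_R·P) are our own assumptions, not the authors' Eq. (1b).

```python
import math


def relative_loss(W: float, H: float, P: float, alpha_R: float, beta_R: float = 1.0) -> float:
    """Hypothetical relative loss R in [0, 1]; an assumed form, not the paper's Eq. (1b).

    W (water level), H (protection level) and P (preparedness) are scaled to [0, 1],
    with H < 1; alpha_R and P are assumed to lie in [0, 1].
    """
    if W <= H:
        return 0.0  # protection level not exceeded: no damage
    # Exponential growth in the exceedance W - H, normalised so that R = 1 at W = 1 (P = 0).
    growth = math.expm1(beta_R * (W - H)) / math.expm1(beta_R * (1.0 - H))
    # Assumed: preparedness reduces the relative loss linearly, with strength alpha_R.
    return (1.0 - alpha_R * P) * growth


def loss(R: float, D: float) -> float:
    """Actual loss L = R * D: relative loss times settlement density (exposure)."""
    return R * D


# The same flood above the protection level, without and with preparedness
R0 = relative_loss(W=0.8, H=0.5, P=0.0, alpha_R=0.6)
R1 = relative_loss(W=0.8, H=0.5, P=0.5, alpha_R=0.6)
print(R0, R1, loss(R0, D=0.4))
```

Other monotone forms, for example an exponential reduction exp(−α_R·P) instead of the linear factor, would fit the description equally well; the form actually used is the one defined by Eq. (1b).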
According to the literature, awareness (Eq. 1d) increases when households experience damage (Bradford et al., 2012; Bubeck et al., 2013; Osberghaus, 2015; Owusu et al., 2015; Poussin et al., 2014; Wachinger et al., 2013) and decreases exponentially over time with rate µ_A as households forget (ICPR, 2002; Kreibich et al., 2011). We further assume that the increase in awareness depends on the current level of awareness: if awareness is already high, it increases less than when it is close to zero.
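As a minimal sketch of these awareness dynamics, the Python function below advances awareness by one year. The functional form is our assumption, not Eq. (1d): a loss L produces a jump proportional to L·(1 − A), so the increase shrinks as A approaches one, and awareness then decays exponentially with rate µ_A, which is the exact solution of dA/dt = −µ_A·A over one year.

```python
import math


def awareness_step(A: float, L: float, mu_A: float) -> float:
    """Advance awareness A by one year (hypothetical form, not the paper's Eq. 1d).

    A and the loss L are scaled to [0, 1]; mu_A is the forgetting rate per year.
    """
    # Jump due to experienced damage; the factor (1 - A) damps it when A is already high.
    A = A + L * (1.0 - A)
    # Exponential forgetting over one year: solution of dA/dt = -mu_A * A.
    return A * math.exp(-mu_A)


# Awareness after a flood year (L = 0.4) followed by ten flood-free years
A = 0.1
A = awareness_step(A, L=0.4, mu_A=0.1)
for _ in range(10):
    A = awareness_step(A, L=0.0, mu_A=0.1)
print(round(A, 3))
```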
An increase in awareness may result in a greater uptake of precautionary measures and thus in higher preparedness. How large this increase in preparedness is depends on other factors as well, such as coping appraisal (Bubeck et al., 2012), maladaptive coping responses (Bubeck et al., 2013) or worry (Miceli et al., 2008; Raaijmakers et al., 2008). The fact that not all households that are aware of the risk take precautionary measures is represented by the parameter α_P, which determines how much of an increase in awareness results in an uptake of precautionary measures and thus in an increase in preparedness. As described by Eq. (1e), preparedness increases after a flood event by an amount proportional to the change in awareness and depending on how high the preparedness was before the event (if households have already implemented many measures, there is less room for an increase in preparedness than if they have implemented none). Preparedness only increases when damage occurs, i.e. when the relative loss is greater than zero. We approximate this step function with a hyperbolic tangent (Eq. 1f). Like awareness, preparedness decreases exponentially over time, with rate µ_P, because households forget and measures deteriorate.
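The preparedness update can be sketched in the same spirit. The form below is an assumption, not Eq. (1e) and (1f): the increase is α_P times the increase in awareness, damped by (1 − P), and gated by tanh(k·R), a smooth stand-in for the step function that is exactly zero at R = 0 and close to one once R clearly exceeds 1/k. The steepness k is a hypothetical parameter. Preparedness then decays exponentially with rate µ_P.

```python
import math


def flood_gate(R: float, k: float = 100.0) -> float:
    """Smooth approximation of a step function: 0 at R = 0, close to 1 once R >> 1/k."""
    return math.tanh(k * R)


def preparedness_step(P: float, delta_A: float, R: float,
                      alpha_P: float, mu_P: float, k: float = 100.0) -> float:
    """Advance preparedness P by one year (hypothetical form, not the paper's Eq. 1e-1f).

    delta_A is the increase in awareness caused by the flood, R the relative loss and
    alpha_P the share of that increase that turns into precautionary measures.
    """
    # Uptake of measures: proportional to the awareness increase, limited by (1 - P),
    # and only active when damage occurs (R > 0).
    P = P + alpha_P * max(delta_A, 0.0) * (1.0 - P) * flood_gate(R, k)
    # Forgetting and deterioration of measures over one year.
    return P * math.exp(-mu_P)


print(preparedness_step(0.3, delta_A=0.2, R=0.0, alpha_P=0.5, mu_P=0.05))  # no damage: only decay
print(preparedness_step(0.3, delta_A=0.2, R=0.3, alpha_P=0.5, mu_P=0.05))  # damage: uptake, then decay
```

A smooth gate also keeps the model differentiable in R, which gradient-based samplers such as the Hamiltonian Monte Carlo used by Stan rely on.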
According to the literature, implementing precautionary measures can greatly reduce flood losses (Kreibich et al., 2015; Poussin et al., 2015). Hence, the higher the preparedness, the lower the relative losses; how much lower depends on the parameter α_R (Eq. 1b). As explained above, the losses depend on the relative losses and on the settlement density. The settlement density grows with a general growth rate U, but higher awareness reduces this growth rate, so that the settlement density grows more slowly or may even decline, as described in Eq. (1c).
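For the settlement density, a minimal sketch assumes that awareness reduces the general growth rate U linearly, through a hypothetical weight alpha_D that does not appear in the text, and that growth is logistic so that D stays within [0, 1]. When α_D·A exceeds U, the effective rate turns negative and the density declines. Neither the linear reduction nor the logistic term is taken from Eq. (1c).

```python
def density_step(D: float, A: float, U: float, alpha_D: float, dt: float = 1.0) -> float:
    """One forward-Euler step of a hypothetical settlement-density equation (not Eq. 1c).

    dD/dt = (U - alpha_D * A) * D * (1 - D), with D and A scaled to [0, 1].
    alpha_D is an illustrative parameter, not one of the paper's.
    """
    r = U - alpha_D * A  # effective growth rate, reduced by awareness
    return D + dt * r * D * (1.0 - D)


print(density_step(0.5, A=0.0, U=0.04, alpha_D=0.1))  # no awareness: growth at rate U
print(density_step(0.5, A=0.2, U=0.04, alpha_D=0.1))  # moderate awareness: slower growth
print(density_step(0.5, A=0.8, U=0.04, alpha_D=0.1))  # high awareness: decline
```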

Bayesian inference
We want to estimate the values of the model parameters from data. All variables are scaled between zero and one. We assume that β_R equals one and estimate the values of the six parameters that are not fixed to one; these parameters are described in Table 1. In addition, we need to estimate the initial values of settlement density, awareness and preparedness, because the evolution of these variables is described by differential equations. Since socio-hydrology studies both the human and the hydrological system, we have to deal with various types of data with different uncertainties, and Bayesian inference allows us to include all of them. Bayes' theorem (Eq. 2) states that the posterior distribution of the parameters depends on the likelihood of the data and on our prior estimate of the parameter distributions:
$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{\int p(y \mid \theta)\,p(\theta)\,\mathrm{d}\theta} \qquad (2)$$
where θ denotes the parameters and initial values to be estimated, y the data, p(y | θ) the likelihood and p(θ) the prior. In most cases the integral in the denominator cannot be computed, so we approximate the posterior with Markov chain Monte Carlo (MCMC) methods (Gelman et al., 2014). Here we use the software Stan (Carpenter et al., 2017) to perform the estimation. Stan uses Hamiltonian Monte Carlo sampling, which exploits the gradient of the log probability to speed up convergence and parameter exploration (Stan Development Team, 2017).
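The intractable denominator of Bayes' theorem is what makes sampling necessary. The Python sketch below illustrates the idea with a random-walk Metropolis sampler, a simpler MCMC method than the Hamiltonian Monte Carlo used by Stan, on a deliberately tiny toy problem: recovering a single decay rate mu from five noisy observations of exponentially decaying awareness, with a flat prior on [0, 1]. Because the sampler only uses ratios of posterior densities, the normalising integral cancels. The toy model, data, noise level and tuning values are all made up for illustration.

```python
import math
import random


def simulate(mu, times, A0=0.8):
    """Toy forward model: awareness decaying exponentially with rate mu (no floods)."""
    return [A0 * math.exp(-mu * t) for t in times]


def log_posterior(mu, times, obs, sigma):
    """Unnormalised log posterior: flat prior on [0, 1] plus Gaussian log-likelihood."""
    if not 0.0 <= mu <= 1.0:
        return -math.inf  # outside the support of the flat prior
    pred = simulate(mu, times)
    return -sum((y - p) ** 2 for y, p in zip(obs, pred)) / (2.0 * sigma ** 2)


def metropolis(times, obs, sigma, n_iter=20000, step=0.02, mu0=0.5, seed=1):
    """Random-walk Metropolis sampler for mu.

    Only the ratio of posterior densities enters the acceptance test, so the
    normalising integral in the denominator of Bayes' theorem is never needed.
    """
    rng = random.Random(seed)
    mu, lp = mu0, log_posterior(mu0, times, obs, sigma)
    samples = []
    for _ in range(n_iter):
        prop = mu + rng.gauss(0.0, step)
        lp_prop = log_posterior(prop, times, obs, sigma)
        # Accept with probability min(1, p(prop | y) / p(mu | y)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        samples.append(mu)
    return samples


# Synthetic experiment: impose a "true" decay rate and sample sparse, noisy observations.
noise = random.Random(0)
times = [2, 5, 9, 14, 20]
sigma = 0.02
obs = [a + noise.gauss(0.0, sigma) for a in simulate(0.15, times)]

samples = metropolis(times, obs, sigma)[2000:]  # discard burn-in
post_mean = sum(samples) / len(samples)
print(f"posterior mean of mu: {post_mean:.3f} (true value 0.15)")
```

Random-walk Metropolis needs many evaluations and scales poorly with the number of unknowns; using gradient information, as Hamiltonian Monte Carlo does, is one reason tools such as Stan explore multi-parameter posteriors more efficiently.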

Results
We assume that the parameters have the values given in Table 2 and simulate time series of all state variables (blue lines in Fig. 1). The question we ask is: if we could observe the variables, with uncertainty, at only a limited number of time points, would we be able to infer their temporal evolution or, equivalently, to estimate the parameters of the model? To test whether this is possible, we sample some data points for each of the variables: Settlement Density (D), Loss (L = R·D), Awareness (A) and Preparedness (P). These data points and their uncertainties are plotted in black in Fig. 1. Using these data points and uninformative priors for our parameters, we perform the Bayesian inference using MCMC. Figure 1 shows the mean (solid red line) and 90 % credible bounds (dashed red lines) of the time series as estimated with the inference.
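The mean and 90 % credible bounds in Fig. 1 summarise many simulated trajectories. One common way to obtain such bands, sketched below in Python under our own assumptions, is to push each posterior sample through the model and take, at every time step, the mean and the 5th and 95th percentiles of the simulated values. The toy decay model and the normally distributed stand-in samples are hypothetical; in the study, the posterior samples are produced by Stan.

```python
import math
import random
import statistics


def trajectory(mu, times, A0=0.8):
    """Toy model output: awareness decaying with rate mu (illustrative only)."""
    return [A0 * math.exp(-mu * t) for t in times]


def credible_band(param_samples, times):
    """Posterior mean and pointwise 90 % credible bounds of the simulated time series."""
    runs = [trajectory(mu, times) for mu in param_samples]
    mean, lower, upper = [], [], []
    for j in range(len(times)):
        values = [run[j] for run in runs]
        # 19 cut points at 5 %, 10 %, ..., 95 %; the first and last enclose 90 % of the values.
        cuts = statistics.quantiles(values, n=20, method="inclusive")
        mean.append(statistics.fmean(values))
        lower.append(cuts[0])
        upper.append(cuts[-1])
    return mean, lower, upper


# Hypothetical stand-in for posterior samples of mu.
rng = random.Random(3)
samples = [rng.gauss(0.15, 0.01) for _ in range(2000)]
times = list(range(0, 31, 5))
mean, lower, upper = credible_band(samples, times)
for t, m, lo, up in zip(times, mean, lower, upper):
    print(f"t={t:2d}  mean={m:.3f}  90% bounds=[{lo:.3f}, {up:.3f}]")
```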
The top graph shows normalized discharge and protection level time series at the annual time scale. If the discharge is higher than the protection level, a loss results, as shown in the third graph. The size of this loss depends on the relative loss and the settlement density. If losses occur, awareness increases (second graph from the bottom), followed by an increase in preparedness (bottom graph). Both awareness and preparedness decrease gradually over time when no flood occurs. Awareness slightly influences the growth of the settlement density: after every flood, the settlement density grows more slowly in the following years.
Proc. IAHS, 379, 193-198, 2018, proc-iahs.net/379/193/2018/
The credible bounds of the simulation are quite narrow, which means that the posterior distributions of the parameters are narrow as well, i.e. the data constrain the parameter values well. Figure 2 shows the real parameter values (used to generate our data points) in blue, the prior distributions used for the Bayesian inference in black and the posterior distributions estimated with the inference in red. The posterior mean estimates are very close to the real parameter values (see also Table 2). The estimates of the initial values are a bit further off from the real values, which is not surprising: we did not sample any data points at the start of the time series, so we have no direct observations of the variables there. We used flat, uninformative prior distributions and are still able to determine the posterior distributions well; the posteriors are hardly influenced by the priors and depend mainly on the data.
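That the posteriors depend mainly on the data follows directly from Bayes' theorem when the prior is flat. Writing θ for the unknowns (parameters and initial values), y for the data and Θ for the bounded range over which the flat prior is defined (our notation):

```latex
p(\theta) = \frac{1}{|\Theta|} \ \text{for } \theta \in \Theta
\quad\Longrightarrow\quad
p(\theta \mid y) = \frac{p(y \mid \theta)}{\int_{\Theta} p(y \mid \theta)\,\mathrm{d}\theta} \propto p(y \mid \theta),
\qquad \theta \in \Theta .
```

Within Θ the posterior is therefore the normalised likelihood; the prior only enters through the bounds of Θ, which matter if the likelihood has substantial mass near them.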

Conclusions
We have conducted an artificial experiment in which a socio-hydrological model is assumed whose "true parameters" are known to us, since we impose them, and observations are sampled from variables generated with the model. The aim of this paper is to assess whether the model can be calibrated to these observed data. This is not trivial because the model is highly nonlinear, and it is not clear what amount of data is needed for calibration and, more importantly, whether the amount we can expect to find in a real-world case study is enough. We demonstrate that, if the assumptions underlying the model are valid, Bayesian inference allows us to estimate the parameter values quite accurately from relatively few data, an amount that could be available in real case studies. Other estimation techniques, such as those used in rainfall-runoff modelling, could be used instead, but Bayesian inference is a flexible tool that can incorporate different types of information. This makes it well suited to socio-hydrological models, where data are highly uncertain and prior information may be available to constrain model parameters.
The next step will be to gather data for a specific real-world case study and to develop and apply a model to it. This will be more challenging than the case presented here, since the model will not represent the real world fully and correctly. Another challenge will be to map the information gathered on variables such as awareness and preparedness to the zero-to-one scale adopted here. However, our results are promising: if we can rely on the assumptions of the model, so that the uncertainty is mainly due to not knowing its parameters, we can calibrate a socio-hydrological flood model to the few data available.
Data availability. The analyses in the paper are based on generated data, with the exception of the time series of water levels of the river Elbe at Dresden, which is the result of the research project "Integration von historischen und hydraulisch/hydrologischen Analysen zur Verbesserung der regionalen Gefährdungsabschätzung und zur Erhöhung des Hochwasserbewusstseins 2005-2007" (BMBF-Projekt, 2007) (Uwe Grünewald and Sabine Schümberg, Hydrology and Water Resource Management, Brandenburg University of Technology BTU Cottbus, Germany).
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Innovative water resources management – understanding and balancing interactions between humankind and nature". It is a result of the 8th International Water Resources Management Conference of ICWRS, Beijing, China, 13–15 June 2018.

Figure 1. Generated dynamics of system variables, sampled data points and estimated time series. The generated dynamics of the system variables are represented in blue. Sampled data points are shown in black, with their 90 % uncertainty bounds. Estimated time series are plotted in red, with the mean as a solid line and the 90 % credible bounds as dotted lines.

Figure 2. Prior (black) and posterior (red) distributions of the parameters. The "real" parameters to be estimated are represented by the blue vertical lines (see Table 2 for their values).

Table 1. Model parameter description and units (n_h = number of households, n_m = number of precautionary measures).

Table 2. Parameter values: real values used to generate the data points, and mean and standard deviation of the posterior distributions estimated with Bayesian inference.