Multi-model ensemble hydrological simulation using a BP Neural Network for the upper Yalongjiang River Basin, China

Hydrological models are important and effective tools for simulating complex hydrological processes. Different models have different strengths in capturing the various aspects of these processes, so relying on a single model usually leads to simulation uncertainties. Ensemble approaches based on multi-model hydrological simulations can improve performance over single models. In this study, the upper Yalongjiang River Basin was selected as a case study. Three commonly used hydrological models (SWAT, VIC, and BTOPMC) were run independently with the same inputs and initial values. A back-propagation (BP) neural network was then employed to combine the results of the three models. The results show that the BP ensemble simulation is more accurate than any of the single models, with Nash-Sutcliffe efficiencies of 0.95 and 0.90 in the calibration and validation periods, respectively.


Introduction
Hydrological processes are complex and are affected by many factors, including climate, weather, topography, and subsurface conditions. Hydrological models are important and effective research tools for describing complex water cycle processes. Because each model has a different structure, models differ in their ability to simulate a given hydrological process, and each model has its own characteristics and types of basins in which it can be applied most effectively. Relying on a single model often leads to predictions that capture some phenomena at the expense of others (Duan et al., 2007). Multi-model ensemble hydrological simulation has proven to be an effective method for improving simulation accuracy, and many researchers have applied multi-model methods to improve the simulation and prediction accuracy of hydrological models (Ajami et al., 2007; Devineni et al., 2008; Najafi and Moradkhani, 2016; Razavi and Coulibaly, 2016; Zhu et al., 2016).
Artificial neural networks (ANNs), which abstract and simulate some of the basic characteristics of the human brain or of natural neural networks, have received great attention because of their advantages in solving highly nonlinear problems. The feed-forward, back-propagation neural network (BPNN), currently the most popular network architecture, has been applied in a wide variety of fields (Hu et al., 2005; Yaseen et al., 2015). Wei et al. (2013) established a predictive model for monthly river flow estimation and prediction based on a hybrid wavelet-neural network approach. Wang et al. (2017) proposed a new back-propagation neural network algorithm and applied it to the semi-distributed Xinanjiang (XAJ) model; the improved hydrological model was capable of updating the flow forecasting error without losing lead time.
This study explores the use of a BPNN for hydrological streamflow prediction and investigates how the BP scheme can improve the accuracy of streamflow predictions. The paper is organized as follows. Section 2 presents the study area and data collection. Section 3 introduces the methodology, including the hydrological models and the BP neural network. Section 4 describes the streamflow simulation of each ensemble member and the validation results of the multi-model predictions using the BP scheme. Section 5 provides a summary and conclusions.
2 Study area and data collection

Study area
The Yalongjiang River (YLJR), located in Sichuan Province in southwestern China on the eastern Tibetan Plateau, is the largest tributary of the Jinshajiang River in the upper Yangtze River. The main stream of the YLJR is 1571 km long, and the YLJR Basin covers about 128 440 km², accounting for 13 % of the total area of the upper reaches of the Yangtze River. The average annual runoff of the YLJR Basin is about 58 billion m³. The upper reaches of the YLJR Basin, above the Yajiang gauging station, were selected as the study area.
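As a rough consistency check on these basin figures (our own arithmetic, not a result of the study, assuming a 365-day year), the annual runoff volume can be expressed as an equivalent runoff depth over the whole YLJR Basin and as a mean discharge:

$$
h = \frac{58 \times 10^{9}\ \mathrm{m^{3}}}{128\,440 \times 10^{6}\ \mathrm{m^{2}}} \approx 0.452\ \mathrm{m} \approx 452\ \mathrm{mm},
\qquad
\bar{Q} = \frac{58 \times 10^{9}\ \mathrm{m^{3}}}{365 \times 86\,400\ \mathrm{s}} \approx 1.84 \times 10^{3}\ \mathrm{m^{3}\,s^{-1}}
$$

Both values are averages for the entire YLJR Basin, not for the upper-basin study area above Yajiang.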
The terrain elevation of the upper YLJR Basin varies greatly from north to south, ranging from 2600 to 6111 m. Under the influence of the westerly atmospheric circulation and the monsoons, the climate also changes markedly from north to south, with a dry continental climate on the northern plateau and a subtropical climate in the central and southern parts of the basin. Winters are long and cold, and summers are cool and wet, with strong radiation throughout the year. Over the last 50 years, average annual precipitation was about 500-2470 mm, of which 73 % fell from June to September. The inter-annual variation and regional distribution of streamflow in the basin follow those of precipitation.

Data description
The Digital Elevation Model (DEM) dataset, obtained from the Computer Network Information Centre, Chinese Academy of Sciences (http://www.gscloud.cn, last access: 17 April 2018), was used to derive the mean elevation and slope of the region at a resolution of 90 m. The land cover datasets were derived from MODIS (MOD12Q1-051) data (https://lpdaac.usgs.gov, last access: 17 April 2018) and include grassland, woodland, cropland, urban and built-up areas, water, and unused land. In 2010, approximately 89 % of the study area was covered by grassland and approximately 9 % by woodland; the remainder was covered by other land cover types. The soil map of the basin was derived from the Chinese national 1 : 1 000 000-scale soil map, and the physical properties of the soils were obtained from the China Soil Scientific Database (http://www.soil.csdb.cn/, last access: 17 April 2018). The other soil properties used as model parameters were computed using empirical equations (Saxton, 2006). The main soil type is plateau meadow soil.
There are nine national weather stations in the basin (Table 1). Daily meteorological records from these stations for 1960 to 2013 (http://data.cma.cn/, last access: 17 April 2018) were used to assess the performance of a range of probability distribution models. Daily streamflow data for the basin (Yajiang station) for 2007 to 2011 were obtained from the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (CAS), and were used to calibrate the model parameters. The topography, river networks, and streamflow gauging station of the basin are shown in Fig. 1.

Hydrological models
For this study, we employed three widely used distributed hydrological models (SWAT, VIC, and BTOPMC) as the members of the multi-model ensemble.
SWAT (Soil and Water Assessment Tool) is a popular, physically based, distributed hydrological model developed by the United States Department of Agriculture (Arnold et al., 1998). In the present study, 12 sensitive parameters were identified for model calibration using the generalized likelihood uncertainty estimation (GLUE) method (Beven and Binley, 1992). The parameters are described and listed in Table 2.
VIC (Variable Infiltration Capacity) (Liang et al., 1994) is a large-scale, semi-distributed land surface hydrological model that solves full water and energy balances; it was originally developed at the University of Washington. It is a grid-based soil-vegetation-atmosphere transfer scheme that explicitly represents the subgrid-scale spatial variability of infiltration, precipitation, and vegetation and its effects on water fluxes through the landscape. The newest version of the VIC model uses three soil layers, which allows it to explicitly depict the dynamics of surface water and groundwater interactions and to calculate the groundwater table (Liang et al., 2003). The model is driven by daily precipitation, daily maximum and minimum temperature, vegetation type, and soil texture. The generated runoff is routed laterally to the basin outlet along the simulated topology and stream network. The parameters and their initial prior ranges are briefly described in Table 3.
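The GLUE procedure used above to identify the SWAT calibration parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's setup: a hypothetical two-parameter linear reservoir (`linear_reservoir`, with parameters `k` and `c`) stands in for SWAT and its 12 parameters, the priors are uniform, the Nash-Sutcliffe efficiency serves as the informal likelihood, and the behavioural threshold of 0.7 is an arbitrary choice.

```python
import random


def linear_reservoir(precip, k, c, s0=0.0):
    """Hypothetical toy model standing in for SWAT: a fraction c of each
    day's rainfall enters a store that releases a fraction k per step."""
    storage, flows = s0, []
    for p in precip:
        storage += c * p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def nse(obs, sim):
    """Nash-Sutcliffe efficiency, used here as an informal GLUE likelihood."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var


def glue(precip, obs, priors, n_samples=2000, threshold=0.7, seed=42):
    """Monte Carlo sampling from uniform priors; keep the 'behavioural'
    parameter sets whose NSE exceeds the threshold and weight them."""
    rng = random.Random(seed)
    behavioural = []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in priors.items()}
        score = nse(obs, linear_reservoir(precip, **params))
        if score > threshold:
            behavioural.append((params, score))
    if not behavioural:
        return [], []
    # Rescale likelihoods so that the weights sum to 1 over behavioural sets.
    total = sum(score - threshold for _, score in behavioural)
    weights = [(score - threshold) / total for _, score in behavioural]
    return behavioural, weights


# Synthetic "observations" generated with k = 0.3 and c = 0.6.
rng = random.Random(0)
precip = [rng.choice([0.0, 0.0, 0.0, 5.0, 20.0]) for _ in range(365)]
obs = linear_reservoir(precip, k=0.3, c=0.6)
sets, weights = glue(precip, obs, {"k": (0.05, 0.9), "c": (0.1, 1.0)})
best_params, best_score = max(sets, key=lambda item: item[1])
print(len(sets), best_params, round(best_score, 3))
```

In a real GLUE application the weighted behavioural sets would also be used to build prediction uncertainty bounds, not just to pick a single best set.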
The BTOPMC (blockwise use of TOPMODEL with Muskingum-Cunge routing) model is a physically based, distributed hydrological model developed at the University of Yamanashi (Japan). It extends the concepts of the original TOPMODEL to distributed hydrological simulation in large river basins (Takeuchi et al., 2008). In BTOPMC, hydrological processes are computed in grid cells to represent spatial heterogeneity. The grid cells are grouped into sub-catchments (sub-basins), and each sub-basin is treated as a block, or unit, to limit the total number of parameters. The vertical column is divided into four zones: vegetation, root, unsaturated, and saturated. The BTOPMC model has six basic parameters to be identified. A detailed description of these parameters is given in
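The channel routing named in BTOPMC can be illustrated with the classic Muskingum recursion. In the Muskingum-Cunge variant, the storage constant K and weighting factor X are derived from channel properties rather than calibrated; this sketch simply fixes hypothetical values (K = 2 days, X = 0.2, a 1-day step) and assumes the reach starts in steady state, so it shows the routing step only, not BTOPMC's implementation.

```python
def muskingum_coefficients(k, x, dt):
    """Classic Muskingum routing coefficients (K and dt in the same units).
    They always sum to 1; keeping dt >= 2*K*X avoids a negative c0."""
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom
    return c0, c1, c2


def route(inflow, k, x, dt):
    """Route an inflow hydrograph through one reach: O2 = c0*I2 + c1*I1 + c2*O1."""
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0]]  # assume the reach starts in steady state
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow


# Hypothetical hydrograph (m3 s-1) on a 1-day step; not data from this study.
inflow = [10.0, 10.0, 50.0, 100.0, 60.0, 30.0, 15.0, 10.0, 10.0, 10.0]
outflow = route(inflow, k=2.0, x=0.2, dt=1.0)
print([round(q, 1) for q in outflow])
```

The routed peak arrives later and is lower than the inflow peak, which is the attenuation and delay that channel routing adds on top of the grid-cell runoff generation.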

BP neural network
The BPNN is a multilayer feed-forward mapping model trained by the error back-propagation algorithm.
The topology of the BPNN model consists of input, hidden, and output layers. The BP algorithm has two phases: forward propagation and error back-propagation. In forward propagation, the input information is passed from the input layer through the hidden layer to the output layer, and the state of the neurons in each layer affects only the state of the neurons in the next layer. If the output layer does not produce the expected output, the algorithm switches to error back-propagation: the error signal is returned along the original connection paths, and the weights of the nodes in each layer are modified to minimize the error. The topology of the BPNN model is shown in Fig. 2.
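The two phases described above can be sketched as a small three-layer network. This is a minimal, hypothetical illustration rather than the configuration used in the study: it assumes a sigmoid hidden layer, a single linear output node, per-sample gradient descent on squared error, five hidden nodes, and a learning rate of 0.1, and it is trained on synthetic scaled data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Three-layer BPNN: sigmoid hidden layer, linear output node,
    trained sample by sample with gradient descent on squared error."""

    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # w1[j][i]: input i -> hidden node j; b1[j]: hidden-node bias
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # w2[j]: hidden node j -> output node; b2: output bias
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Forward propagation: input layer -> hidden layer -> output layer.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target):
        h, y = self.forward(x)
        err = y - target  # error signal at the output layer
        # Back-propagate the error to the hidden layer (before any update).
        delta_h = [err * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        # Adjust hidden -> output weights.
        for j, hj in enumerate(h):
            self.w2[j] -= self.lr * err * hj
        self.b2 -= self.lr * err
        # Adjust input -> hidden weights.
        for j, d in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * d * xi
            self.b1[j] -= self.lr * d
        return 0.5 * err ** 2


# Hypothetical scaled data: the target is a fixed blend of three inputs.
rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.random() for _ in range(3)]
    data.append((x, 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[2]))

net = BPNetwork(n_in=3, n_hidden=5, lr=0.1)
first_loss = sum(net.train_step(x, t) for x, t in data) / len(data)
for _ in range(300):
    last_loss = sum(net.train_step(x, t) for x, t in data) / len(data)
print(first_loss, last_loss)
```

The hidden-layer error terms are computed before the output weights change, so each update uses the gradient of the same forward pass.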
In the present study, the inputs of the BP network are the simulation results of three hydrological models, and the output is the streamflow at the Yajiang station.
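A common way to prepare such inputs, assumed here because the study does not state its preprocessing, is to min-max scale each simulated series and the observed series to [0, 1] before training and to map the network output back to streamflow units afterwards. The function names and the four days of flow values below are hypothetical.

```python
def fit_range(series):
    """Min and max of a series (ideally fitted on the calibration period only)."""
    return min(series), max(series)


def to_unit(series, lo, hi):
    """Min-max scale a series to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in series]


def from_unit(series, lo, hi):
    """Map scaled network outputs back to streamflow units."""
    return [v * (hi - lo) + lo for v in series]


def build_samples(swat, vic, btopmc, observed):
    """Pair each day's three simulated flows (inputs) with the observed
    Yajiang flow (target); every series is scaled with its own range."""
    members = (swat, vic, btopmc)
    scaled = [to_unit(s, *fit_range(s)) for s in members]
    inputs = [list(day) for day in zip(*scaled)]
    q_range = fit_range(observed)
    return inputs, to_unit(observed, *q_range), q_range


# Four hypothetical days of daily flow (m3 s-1); not data from this study.
swat = [120.0, 340.0, 980.0, 410.0]
vic = [100.0, 360.0, 900.0, 450.0]
btopmc = [130.0, 300.0, 1010.0, 420.0]
observed = [115.0, 330.0, 950.0, 430.0]

X, y, (q_lo, q_hi) = build_samples(swat, vic, btopmc, observed)
print(X[0], y[2])  # [0.0, 0.0, 0.0] 1.0
```

Fitting the ranges on the calibration period alone keeps validation data from leaking into the preprocessing.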

Model performance indicators
Before the models can be used to evaluate the method proposed in this study, their robustness in the basin must be examined through benchmark calibrations. The Nash-Sutcliffe efficiency coefficient (E_NS) was selected to evaluate the simulation performance of the three individual models as well as of the multi-model ensemble simulation:

$$
E_{NS} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i} - Q_{\mathrm{sim},i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{\mathrm{obs},i} - Q_{\mathrm{obs,avg}}\right)^{2}}
$$

where Q_obs,i (m³ s⁻¹) and Q_sim,i (m³ s⁻¹) represent the observed and simulated streamflow, respectively, at time step i, and Q_obs,avg (m³ s⁻¹) is the average of the streamflow observations. The integer n is the number of samples. E_NS measures the agreement between modeled and observed values and ranges from −∞ to 1.0: E_NS = 1.0 indicates perfect agreement between modeled and observed streamflow for a given basin, and E_NS ≤ 0 indicates that the model performs no better than the mean of the observations.
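The E_NS definition translates directly into a short function. This is an illustrative sketch (the function name and test values are ours, not the study's), with guards for mismatched lengths and for a constant observed series, where the denominator would be zero.

```python
def nash_sutcliffe(observed, simulated):
    """E_NS = 1 - sum((Q_obs,i - Q_sim,i)^2) / sum((Q_obs,i - Q_obs,avg)^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    q_avg = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - q_avg) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("E_NS is undefined for a constant observed series")
    return 1.0 - sse / spread


obs = [10.0, 20.0, 30.0, 40.0]
print(nash_sutcliffe(obs, obs))                       # 1.0 (perfect fit)
print(nash_sutcliffe(obs, [25.0] * 4))                # 0.0 (no better than the mean)
print(nash_sutcliffe(obs, [12.0, 18.0, 33.0, 37.0]))  # about 0.948
```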

Streamflow simulation of each ensemble member
Using the available streamflow records, the benchmark calibration for the basin was conducted with continuous daily streamflow observations from 1 January 2007 to 30 April 2010 and validated with data from 1 May 2010 to 31 December 2011. The results of the benchmark calibrations of the three ensemble members are listed in Table 5, and the streamflow simulations are shown in Figs. 3-5.
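The calibration and validation windows above can be applied to a daily series as sketched below. The placeholder flow values and function names are hypothetical; only the four dates come from the text. The two windows together cover all 1826 days from 2007 to 2011.

```python
from datetime import date, timedelta

CAL_START, CAL_END = date(2007, 1, 1), date(2010, 4, 30)
VAL_START, VAL_END = date(2010, 5, 1), date(2011, 12, 31)


def daily_dates(start, end):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def split_periods(dates, values):
    """Split a daily series into calibration and validation subsets."""
    cal = [v for d, v in zip(dates, values) if CAL_START <= d <= CAL_END]
    val = [v for d, v in zip(dates, values) if VAL_START <= d <= VAL_END]
    return cal, val


dates = daily_dates(CAL_START, VAL_END)
flows = [0.0] * len(dates)  # placeholder for the observed Yajiang series
cal, val = split_periods(dates, flows)
print(len(dates), len(cal), len(val))  # 1826 1216 610
```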
During the calibration period, the E_NS of each of the three ensemble members was above 0.75. In the validation period, the E_NS of the BTOPMC model reached 0.75, slightly better than those of the SWAT and VIC models. The results of these calibrations were taken as the benchmarks for the basin. Figures 3, 4, and 5 compare the observed and simulated daily streamflow series from the three ensemble members in the calibration and validation periods. These figures show that the daily streamflow simulations of all three models are acceptable overall. However, low flows are reproduced less satisfactorily than normal flows, and much of the low flow is noticeably underestimated. It was also difficult for the three models to reproduce the magnitude and timing of peak flows accurately. One reason peak flows are difficult to simulate is that a daily-step model effectively spreads short-term rainstorms (lasting a few hours or even a few minutes) over a 1-day period. This temporal smoothing lowers the rainfall intensity seen by the model and tends to dampen the simulated high flows.
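A hypothetical storm makes the smoothing argument concrete: if 30 mm of rain falls within two hours of an otherwise dry day, a model that only receives the daily total sees an average intensity of 1.25 mm/h instead of a peak of 18 mm/h. The rainfall values below are invented for illustration.

```python
def smoothing_ratio(hourly_rain_mm):
    """Ratio of the peak hourly intensity to the daily-mean intensity seen by
    a model that only receives the 24 h total."""
    daily_total = sum(hourly_rain_mm)
    daily_mean_rate = daily_total / len(hourly_rain_mm)  # mm/h over the day
    return max(hourly_rain_mm) / daily_mean_rate


# Hypothetical storm: 30 mm falls in two hours of an otherwise dry day.
hourly = [0.0] * 24
hourly[14], hourly[15] = 18.0, 12.0
print(sum(hourly), max(hourly), sum(hourly) / 24)  # 30.0 18.0 1.25
print(smoothing_ratio(hourly))                      # 14.4
```

A factor of about 14 between the true peak intensity and the daily mean is the kind of information a daily time step discards, which is consistent with the underestimated peaks described above.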
Taken as a whole, the simulated streamflow of the three models agrees well with the measured data, indicating that all three models are applicable to the basin.

Streamflow simulation using the BPNN multi-model ensemble
The performance of the multi-model ensemble simulation in the calibration and validation periods is shown in Fig. 6. The BPNN multi-model ensemble substantially improves on each single member and reproduces the observed daily streamflow series and hydrographs well, with E_NS values of 0.95 and 0.90 in the calibration and validation periods, respectively. The ensemble fits the flow process clearly better than any single member, especially during the relatively low-flow period (November to April), and it also simulates peak flows much more accurately than a single model.

Conclusions
This study explored the feasibility of applying a multi-model ensemble simulation to the upper YLJR Basin. The SWAT, VIC, and BTOPMC models produced acceptable simulations overall but showed errors for low flows and peak flows. The BPNN multi-model ensemble simulation outperformed each single model, with E_NS values of 0.95 and 0.90 in the calibration and validation periods, respectively.
Data availability. The raw data required to reproduce these findings can be downloaded from the URLs provided in Sect. 2.2. The processed data required to reproduce these findings cannot be shared at this time because they also form part of an ongoing study.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Innovative water resources management - understanding and balancing interactions between humankind and nature". It is a result of the 8th International Water Resources Management Conference of ICWRS, Beijing, China, 13-15 June 2018.

Figure 1. Topography, river networks, and streamflow gauging station of the basin.
SWAT simulates water, sediment, and pollutants at the basin scale. The hydrological processes considered in the model include precipitation, interception, infiltration, evapotranspiration, snowmelt, surface runoff, percolation, baseflow, and flow in river channels. The model divides a basin into multiple sub-basins, which are further subdivided into hydrological response units (HRUs) with homogeneous land use and soil characteristics. Simulating the hydrological processes with SWAT involves three steps: (1) preparing the meteorological inputs and the GIS data on soil type and land cover, (2) constructing the model, and (3) calibrating and validating the model.

Figure 2. The topology of the BPNN model.

Figure 3. SWAT simulated streamflow for the basin in both the calibration and validation periods.

Figure 4. VIC simulated streamflow for the basin in both the calibration and validation periods.

Figure 5. BTOPMC simulated streamflow for the basin in both the calibration and validation periods.

Figure 6. BP ensemble simulated streamflow for the basin in both the calibration and validation periods.

Table 1. Weather stations in the basin.

Table 3. Parameter values for the VIC model.

Table 5. Comparison between the ensemble simulation and the optimal simulated series of the ensemble members.