Predicting land deformation by integrating InSAR data and cone penetration testing through machine learning techniques

Abstract. Built environments developed on compressible soils are susceptible to land deformation. The spatio-temporal monitoring and analysis of these deformations are necessary for the sustainable development of cities. Such land deformations can be assessed with techniques such as Interferometric Synthetic Aperture Radar (InSAR), or with predictions based on soil mechanics using in situ characterization, such as Cone Penetration Testing (CPT). Despite the combined advantages of these two methods, the relationship between them has not yet been investigated. Therefore, the major objective of this study is to reconcile InSAR measurements and CPT measurements using machine learning techniques in an attempt to better predict land deformation.


Introduction
Built environments developed on unconsolidated and/or organic sediments are susceptible to land deformation due to the weight of buildings and roads and to fluctuations of the groundwater level (Kempfert and Gebreselassie, 2006; Peduto et al., 2016). Hence, the spatial and temporal monitoring and analysis of ground deformation is necessary for the sustainable development of cities.
More specifically, in the case of roads, deformation reduces the serviceability and performance of the infrastructure and induces high maintenance and repair costs (Peduto et al., 2016; Du et al., 2018). Unevenly deformed roads are dangerous, damaging and inconvenient for both vehicles and passengers (Wijeyesekera et al., 2016). Furthermore, partial closures of the transportation network during maintenance periods have adverse socioeconomic impacts. For these reasons, predicting and continuously monitoring ground deformation along infrastructure networks is of significant importance for improving network resilience (Peduto et al., 2016; North et al., 2017).
For monitoring the rate of land deformation, advanced Interferometric Synthetic Aperture Radar techniques such as Differential InSAR (DInSAR) can be used. SAR data currently have sufficient temporal resolution, and by applying DInSAR techniques, land deformation can be monitored on the order of millimeters (SkyGeo, 2018). However, the final deformation results always contain gaps due to occlusion and coherence loss in the SAR imagery.
The potential of Cone Penetration Testing (CPT) for estimating land deformation has been extensively studied in the geotechnical engineering community (Koster et al., 2018a, b; Verruijt and Van Baars, 2007; Kempfert and Gebreselassie, 2006). CPT measurements provide quantitative information about the characteristics of the soil layers, including their compressibility. However, CPT-based methods for estimating land deformation suffer primarily from empiricism and from spatial and temporal discontinuity.
Although both methods are applied to estimate ground deformation, the direct relationship between CPT measurements and the rate of deformation acquired from DInSAR has not yet been investigated. Therefore, the major objective of this study is to reconcile DInSAR measurements and CPT measurements using Machine Learning (ML) techniques to better predict land deformation. In Sect. 2, the proposed methodology for solving this problem is explained. In Sect. 3, the proposed methodology is applied to an example case study of a road in the Netherlands. Finally, in Sect. 4, the conclusions are stated, followed by the acknowledgements.

Methodology
The overall methodology consists of four main steps (see Fig. 1; Sajadian, 2019). The first two steps are mainly concerned with gathering and pre-processing the datasets, from which the relevant parameters for the next steps are extracted. In the third step, the correlations and similarities are investigated. In the fourth step, we use Machine Learning to define the relationship between soil properties, loading/unloading history and the linear rate of deformation.

Steps 1 and 2: Data Gathering and Extracting Parameters
The relevant parameters from the CPT measurements, such as depth, cone resistance q_c, sleeve friction f_s and friction ratio R_f, as well as the soil types, are extracted. These parameters are considered as soil properties in this research. The Z coordinate of the CPT indicates the elevation of the terrain before construction of the road. The current elevation of the road is derived from the Digital Elevation Model (DEM), which is extracted from the LiDAR point cloud of the highway. Assuming a uniform thickness of 90 cm for the surface, base, subbase and sub-grade of the road (based on the road construction standards), the difference between the current (DEM) and old (CPT) elevation indicates the amount of stress removed or added due to excavation or backfilling. The SAR images are processed by combining a sequence of radar images (TerraSAR-X, with a spatial resolution of 3.00 m × 2.80 m and a revisit period of 11 days) from 2016 to 2019 to measure the ground deformation using DInSAR techniques. The main product is a set of time series representing the amount of deformation with respect to the first acquisition. Each time series is decomposed into a linear trend over the 3-year period and a seasonal pattern using a least squares linear regression model. For each CPT measurement, the nearest InSAR measurement (within a distance of less than 5 m) is extracted as the deformation time series corresponding to that CPT point.
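The decomposition of a deformation time series into a linear trend and a seasonal pattern can be sketched as follows. This is a minimal illustration only: the text does not specify the form of the seasonal term, so a single annual sine/cosine pair is assumed here, and the function name `decompose_series` and the synthetic series are hypothetical.

```python
import numpy as np


def decompose_series(t_years, defo_mm):
    """Least-squares fit of offset + rate * t + one annual harmonic.

    Assumption: the seasonal pattern is modelled as a single annual
    sine/cosine pair; the study may have used a different seasonal model.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(defo_mm, dtype=float)
    # Design matrix columns: constant, linear trend, annual cosine, annual sine.
    A = np.column_stack([
        np.ones_like(t),
        t,
        np.cos(2.0 * np.pi * t),
        np.sin(2.0 * np.pi * t),
    ])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    offset, rate, c, s = coeffs
    return {
        "offset_mm": float(offset),
        "rate_mm_per_yr": float(rate),
        "seasonal_amplitude_mm": float(np.hypot(c, s)),
    }


# Synthetic, noise-free example: 11-day sampling over three years,
# 2 mm/yr trend plus a 1.5 mm annual signal (illustrative values only).
t = np.arange(0.0, 3.0, 11.0 / 365.25)
y = 2.0 * t + 1.5 * np.sin(2.0 * np.pi * t)
result = decompose_series(t, y)
```

The fitted `rate_mm_per_yr` plays the role of the linear rate of deformation used as the target in the later steps.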

Step 3: Correlations and Similarities between Soil Properties, Loading/Unloading and Deformation
In this step, the similarities and correlations between soil properties, loading/unloading history and the resulting deformation are studied. Serra and Arcos (2014) present a number of similarity measures for the clustering and classification of time series. The q_c and f_s profiles are series of measurements along depth and can be treated as time series. Hence, we can use the similarity measures discussed in Serra and Arcos (2014) to measure the similarity of the q_c and f_s profiles along the road. In this research, the hypothesis is that if two CPT measurements are similar in terms of both their q_c and f_s profiles and their loading history is the same, their deformation behavior should be the same. Here, we used the simplest similarity measure, i.e. the Euclidean distance between the time series (Serra and Arcos, 2014), which is computationally efficient and suitable for comparing samples located at exactly the same depths. By our definition, two CPTs are considered similar if the sum of the normalized distances of their q_c and f_s profiles is less than 0.2 and the difference between their loading/unloading stresses is less than 10 kPa (these thresholds are based on expert knowledge and trial and error). If the aforementioned hypothesis is correct, the deformation rate of a reference point in the dataset should be approximately equal to the mean of the linear deformation rates of its similar points (those with similar CPT profiles). The coefficient of determination between the deformation rate of the reference point and that of the similar points is regarded as a measure of the degree to which the deformation rate can be taken as a function of soil properties (CPT measurements) and loading/unloading stress.
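The similarity rule described above can be sketched as follows. The thresholds (0.2 and 10 kPa) come from the text, but the normalization of the Euclidean distance is not specified there; this sketch assumes the distance is divided by the square root of the profile length and by a user-supplied scale. The names `normalized_distance`, `are_similar`, `qc_scale` and `fs_scale`, and all sample values, are hypothetical.

```python
import numpy as np


def normalized_distance(a, b, scale):
    """Euclidean distance between two equal-depth profiles, normalized.

    Assumption: normalization by sqrt(n) and by `scale` (e.g. the value
    range of the quantity in the dataset); the study's exact normalization
    is not given in the text.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("profiles must be sampled at the same depths")
    return float(np.linalg.norm(a - b) / np.sqrt(a.size) / scale)


def are_similar(cpt_a, cpt_b, qc_scale, fs_scale,
                max_distance=0.2, max_stress_diff_kpa=10.0):
    """Apply the two thresholds from the text to a pair of CPT records."""
    distance = (
        normalized_distance(cpt_a["qc"], cpt_b["qc"], qc_scale)
        + normalized_distance(cpt_a["fs"], cpt_b["fs"], fs_scale)
    )
    stress_diff = abs(cpt_a["stress_kpa"] - cpt_b["stress_kpa"])
    return bool(distance < max_distance and stress_diff < max_stress_diff_kpa)


# Illustrative records (made-up values, not data from the study).
ref = {"qc": [1.0, 2.0, 3.0, 4.0], "fs": [0.01, 0.02, 0.03, 0.02], "stress_kpa": 20.0}
near = {"qc": [1.2, 2.1, 2.9, 4.2], "fs": [0.01, 0.02, 0.03, 0.02], "stress_kpa": 25.0}
far = {"qc": [6.0, 7.0, 8.0, 9.0], "fs": [0.01, 0.02, 0.03, 0.02], "stress_kpa": 20.0}
loaded = {"qc": [1.0, 2.0, 3.0, 4.0], "fs": [0.01, 0.02, 0.03, 0.02], "stress_kpa": 35.0}

similar_near = are_similar(ref, near, qc_scale=10.0, fs_scale=0.1)
similar_far = are_similar(ref, far, qc_scale=10.0, fs_scale=0.1)
similar_loaded = are_similar(ref, loaded, qc_scale=10.0, fs_scale=0.1)
```

Under these assumptions, only the `near` record passes both thresholds: `far` differs too much in its q_c profile and `loaded` differs by more than 10 kPa in stress.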

Step 4: Feature Extraction and Modeling Using Machine Learning
In this step, we first extract quantitative descriptors from the CPT profiles. Coerts (1996) lists the possible quantitative features for CPT segments, together with their interpretation and shortcomings, and ultimately introduces a set of the most suitable and interpretable descriptors, which we use in our case study: the interquartile range (IQR), the indicator of simple trend (T), the indicator of convexity or concavity (C), the normalized number of fluctuations around the median (R) and the sharpness of the upper boundary (B). The features are extracted from the CPT profiles down to a depth of 15 m below the ground surface. This depth is a good trade-off between reaching the maximum possible depth and not losing too many CPT measurements that end at shallower depths; in addition, peat and clay layers are mostly present above this depth. The loading/unloading stress is an additional feature. The goal is to establish the relationship between these features and the linear rate of deformation, which is the quantity to be predicted. Sajadian (2019) presents and compares a qualitative as well as a quantitative Machine Learning (ML) approach. Many ML algorithms could be used (Breiman, 2001). However, there are no previous studies for our case study, so the proper set of features is unknown. The ML algorithm should therefore provide information about the significance of each feature, and the resulting model should be interpretable. Taking this into account, we selected two tree-based algorithms that satisfy these conditions: Gradient-Boosting and Random Forest (Hastie et al., 2005).
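To make the idea of segment descriptors concrete, the sketch below computes three of them per depth segment of a profile. The exact definitions are those of Coerts (1996) and are not reproduced in this text, so the formulas here are simplified stand-ins and should be read as assumptions: IQR as the 75th minus the 25th percentile, T as the slope of a least-squares line against depth, and R as the number of median crossings divided by the number of sample intervals. The descriptors C and B are omitted. The 5 m segment length follows the case study; the function name `segment_descriptors` and the synthetic profile are hypothetical.

```python
import numpy as np


def segment_descriptors(depth_m, values, seg_len_m=5.0, max_depth_m=15.0):
    """Simplified IQR, T and R descriptors per depth segment (assumed forms)."""
    depth = np.asarray(depth_m, dtype=float)
    vals = np.asarray(values, dtype=float)
    features = {}
    for top in np.arange(0.0, max_depth_m, seg_len_m):
        mask = (depth >= top) & (depth < top + seg_len_m)
        d, v = depth[mask], vals[mask]
        if v.size < 3:
            continue  # too few samples to describe this segment
        q25, q75 = np.percentile(v, [25, 75])
        slope = np.polyfit(d, v, 1)[0]  # least-squares trend against depth
        signs = np.sign(v - np.median(v))
        signs = signs[signs != 0]  # ignore samples exactly on the median
        crossings = np.count_nonzero(np.diff(signs) != 0)
        key = f"{top:g}-{top + seg_len_m:g}m"
        features[key] = {
            "IQR": float(q75 - q25),
            "T": float(slope),
            "R": float(crossings / (v.size - 1)),
        }
    return features


# Synthetic q_c-like profile: linear increase with depth, 0.5 m sampling.
depth = np.arange(0.0, 15.0, 0.5)
profile = 2.0 * depth
features = segment_descriptors(depth, profile)
```

Each CPT then yields one small feature vector per segment and per profile (q_c, f_s), to which the loading/unloading stress is appended.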

Case Study and Results
The data required for modeling land deformation depend significantly on the case study. In this research, the newly constructed part of the A4 highway connecting Delft to Schiedam (the Netherlands) is studied (https://www.wegenwiki.nl/A4_(Nederland), last access: October 2019). For the sake of brevity, we only present the results of the last part of step 4 of the methodology. For the complete study, readers are referred to Sajadian (2019).
The soil properties and the loading/unloading history influence the rate of deformation in different directions. The complicated interactions between the driving mechanisms suggest that the relationship is not linear and therefore cannot be captured by simple correlations. This led us to use ML to model the relationship between these datasets. The location under study is about 5 km long and has 368 CPTs for which deformation measurement points are available. Figure 2 shows the quantitative features extracted, at different depths, for one CPT. These features are the descriptors of segments of 5 m each. In this research, rather than being interested in the importance of each individual descriptor for estimating the target value, we are more interested in investigating which of the profiles and which measurement depths are more significant for estimating the linear rate of deformation.
For the A4 highway case study, the two ML algorithms, Gradient-Boosting and Random Forest, are tested on the dataset using the features mentioned above. Eighty percent of the measurements are used to train each algorithm and the remaining 20 % are used to validate the results. Table 1 summarizes the performance metrics for each method.
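The 80/20 training and validation procedure can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the configuration used in the study: the data are synthetic, the hyperparameters are library defaults with a fixed seed, and RMSE and R² are assumed as metrics because the contents of Table 1 are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table: 368 points (as in the case study),
# 6 made-up features, and a non-linear target with noise.
rng = np.random.default_rng(0)
n_points, n_features = 368, 6
X = rng.normal(size=(n_points, n_features))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n_points)

# 80 % of the points for training, 20 % held out for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
        # Impurity-based importances, one value per feature, summing to 1.
        "importances": model.feature_importances_,
    }
```

The `importances` arrays are what makes tree-based models attractive here: summing them over all descriptors derived from the same profile or depth segment gives a rough indication of which measurements matter most.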
As shown in Table 1, the outcomes of the two ML algorithms are very similar, and the histograms show very similar distributions. The estimated deformation rates are between −1 and 4 mm yr⁻¹, whereas the measured values are between −7 and 8 mm yr⁻¹ (see Fig. 3). This means that both algorithms fail to detect subsiding patterns and extreme heaving patterns. The errors (mostly between −3 and 3 mm yr⁻¹) could be explained by the fact that the dataset is imbalanced (15 % subsiding and 85 % heaving), that the total number of data points is insufficient and/or that the selected features are not representative of the observed deformation.
Nevertheless, both ML algorithms give consistent results in terms of generalization performance and feature importance, with negligible differences. Regarding the significance of each feature for predicting the target deformation rate, both algorithms show that the features extracted from the CPT q_c profiles are the most dominant ones, followed by the loading/unloading stresses.

Conclusions
In this research, the main focus was on studying and modeling the deformation of roads due to loading/unloading, based on soil characterization and using ML algorithms. The desired output of the research was an ML model, trained on standard data, that enables the prediction of surface movements of roads susceptible to soil deformation. The case study was the newly constructed part of the A4 highway (Delft-Schiedam) in the Netherlands.
For this case study, it was concluded that: (1) the available sources of soil data do not provide all the necessary information; for example, information on the presence of certain expansive minerals and on groundwater conditions in the soil is missing. Moreover, only the latest (simplified) loading/unloading step was estimated, while information on the previous stress conditions (which were discontinuous in time) was missing; (2) the InSAR measurements only provided information about the first three years after the construction of the road, so the information on the amount of deformation was limited to this time span; and (3) although this case study showed diverse deformation behavior (both heave and subsidence), which made it interesting for investigating the more influential parameters of road deformation, this diversity was also a limitation: it was due to the special construction history of the road and therefore cannot easily be generalized to other roads.
Proc. IAHS, 382, 525-529, 2020 (proc-iahs.net/382/525/2020/)
In this study, the trained ML algorithm, rather than presenting a general relationship between InSAR data and CPT data, helped to investigate the effectiveness of the gathered data in explaining the studied phenomena. The results of this research could be improved by adding more data points and features as well as more accurate boundary conditions. This could help to explain the diverse deformation pattern of the A4 highway.
Data availability. The data that support the findings of this study are available on request from the corresponding author.
Author contributions. MS designed the computational framework and analysed the data. AT, FST and ML contributed to the design, implementation and supervision of the research. All authors contributed to the writing of the manuscript.
Competing interests. The authors declare that they have no conflict of interest.

Special issue statement. This article is part of the special issue "TISOLS: the Tenth International Symposium On Land Subsidence - living with subsidence". It is a result of the Tenth International Symposium on Land Subsidence, Delft, the Netherlands, 17-21 May 2021.