Landslide Inventory and Landslide Susceptibility Mapping for China Pakistan Economic Corridor (CPEC)’s main route (Karakorum Highway)

-- The landslide inventory helps to develop landslide susceptibility maps which further assist to minimize economic and human losses as well as towards hazard management. This work investigates landslide hazard mapping in the rugged mountain terrain vis-a-vis highly economically significant route between China and Pakistan i.e. Karakuram Highway (KKH). KKH is passing through the Karakorum mountainous region where landslides events occur frequently and prone to serious risk to local travelers, tourists, and to trading caravans. In this work, landslide inventory was developed (302 landslides) along KKH by visual interpretation of Sentinel and google images. Field survey was also carried to validate landslide datasets. Traditional knowledge-based model i.e. Analytic Hierarchy Process (AHP) and data-based models that include Frequency Ratio (FR) and weight of evidence were applied and compared to develop landslide susceptibility maps (LSM). The landslide dataset was divided into modelling/training (70%) and testing/validation (30%) datasets. LSMs are validated by Area Under Curve (AUC) criterion. The results show that weight of evidence, AHP and FR have success rate curves of 61%, 72% and 84%, respectively. In addition, most highly accurate models are validated for their prediction power using testing landslide datasets. The results for prediction capacity for weight of evidence, AHP and FR are 72%, 58%, and 64%, respectively. Further, landslide susceptibility index (LSI) maps are classified into susceptibility zones. The validation and prediction results show that FR model is the most reliable and accurate model for our study area. Our results will be helpful to minimize landslide hazard losses along KKH, ultimately assisting in successful implementation of CPEC idea between China and Pakistan.


I. INTRODUCTION
Landslides are destructive natural hazards that frequently lead to loss of human life and economic resources, as well as triggering severe damage to natural resources. Sometimes, they affect overall economic system of many nations around the globe [1] . It is projected that, internationally, around one thousand persons loss their lives and financial damages of about 4.1 billion US$ occurs because of landsliding events each year [2]. Landslides demonstrate themselves in many different forms including debris flows, rock falls, rockslides, rock avalanches, soil slips, and mud-flows. Landslides are considered as third in the list of natural hazards as they pose high risk and other adverse economic effects worldwide. The recovery from these disasters is sometime even higher than resources of the country. To avoid such situations, we can use remote sensing and GIS techniques to develop landslides inventory and LSMs of disaster-prone areas and plan resources and manage disaster in an efficient way. Further, disaster risk reduction plans and policies can be formulated producing updated and accurate landslide susceptibility maps. These susceptibility maps will lead to minimize risk to vulnerable people and avoid extensive economic loss [3].
To develop Landslide Susceptibility mapping, the first step is to make a landslides inventory where Landslide Susceptibility Indexing (LSI) represents the probability of a landslide occurring in an area based on local influencing factors and terrain situations. The mapping of any area is possible because of accessibility and availability of different type of remote sensing, geographical datasets (which includes different type of thematic layers and causative factors, e.g., topography, etc.). Thus, remote sensing and GIS can play an important role in the development of a landslide inventory map and thematic maps related to landslide occurrence. Landslides inventory are developed using different approaches that include visual interpretation of satellite data, aerial photogrammetry, and supervised and un-supervised classification. Several previous research studies have used high resolution satellite imagery to generate landslides inventory data [4,5] using different approaches including machine learning algorithms [6]. Usage of high-resolution satellite imagery to develop landslide inventory approach remains successful. Further, to combine different models to identify potential landslide areas through [19] developing LSM contributed to manage risk in a sustainable way [7,8]. More recently Artificial Intelligence techniques have been used to identify potential landslides [9][10][11][12][13]. Similarly, combining different approaches that include interferometric synthetic aperture radar (InSAR) images for monitoring and assessment of landslides have led to high accuracy in identification of potential landslides [14][15][16][17][18].
Based on inventory data with the support of causative factors using different models that include expert-based and data driven-based models have resulted in successful development of LSM which are useful for end-users [19][20][21][22]. Several previous studies applied expert knowledge-based models to develop LSM including the Analytical Hierarchy Process (AHP), however, the resulting maps contains a degree of uncertainty. Therefore, several approaches have been used to improve AHP's accuracy [23,24]. On the other hand, data driven models show consistency in better results in the LSM [25,26]. Frequency Ratio (FR) model for LSM is based on causal factors which is used by several researchers to develop LSMs [19]. Similar, statistical models include fuzzy logic [27], index of entropy [28] and logistic regression model [29] are in use to generate LSMs.
The Karakoram, Himalaya, and Hindukush mountain ranges, that meet in Gilgit-Baltistan province of Pakistan are frequent to landslides due to the rough terrain, seismic activity, extreme weather conditions and infrastructure constructions on unsteady slopes. The Karakorum Highway (KKH) (1300 km long) which is one of the highest paved roads in the world connects Pakistani province of Gilgit-Baltistan with China's Xinjiang Uyghur Autonomous Region passes through Karakoram range. The KKH has high economic significance because all economic activity related to the import and export of Gilgit-Baltistan province is dependent on this highway. In addition, there is a large portion of import and export of goods to/from China is done through this highway. The economic importance of this highway has risen in recent years because of China-Pakistan Economic Corridor (CPEC) project (CPEC is a collection of infrastructure projects, the value of CPEC projects is worth $62 billion as of 2017). The highway is frequently blocked due to landslides, for example, recently in 2010, 19 Km of highway was buried, 20 people died and around 350 households were destroyed, and this landslide blocked the Hunza river forming a large lake namely Attaabad Lake. In this context, considering high significance of this highway, we selected a portion of this highway (300 km) and develop a landslide inventory. In this work, a landslide susceptibility maps are developed using frequency ratio, weight of evidence and AHP modelling techniques. These landslide prediction models are compared for their prediction accuracy using landslide inventory. This work will be helpful to avoid and minimize economic and human losses along this very important travel route.
The rest of the paper is organized starting with study area, followed by methods and materials, and results. Then a discussion section to discuss the comparison of methods and their results on KKH area. At the end a conclusion section for describing concluding remarks of this research work.

A. STUDY AREA
This study focuses on a portion of KKH as the total length of KKH is 1300 km, located in Gilgit-Baltistan province of Pakistan and Xinjiang province of China ( Figure 1). The study is conducted in districts Gilgit, Hunza, and Nagar, Gilgit-Baltistan province in the north of Pakistan. The study area consists of series of villages through which KKH is passing, starting from Juglot which lies geographically between latitude from 35ᵒ76′977″N and longitudes from 74ᵒ57′402″E and runs through Jutal, Rahimbad, Aliabad, and so on, ends at Khunjarab top, China-Pakistan border pass. The area is located along the left and right banks of the Hunza, Gilgit and Indus River. This study area covers 332 km long covering 10 km radius buffer of area along KKH. Therefore, the study focused area is about 3320 km2. The area is prone to natural hazards as snow avalanches, landslides, and earthquakes are frequently hitting the area. The most common types of landslides in the study area are rock fall and debris falls triggered by rain falls and tectonic activities ( Figure 2) [30]. Most of the rocks' types are Paleozoic, Proterozoic, and Mesozoic. In a year, the average rainfall in Gilgit is 154 mm. Water irrigation for land cultivation is obtained from the streams and rivers, abundant with melting snow and glaciers water from high mountains. The summer season is longer, dry, and hot.

[20]
Strong sunshine rarely raises temperatures to 40 °C (104 °F) whereas in winters the average temperature remains less than 10 °C. Due to this extreme weather conditions, landslides and avalanches are common in the area. The geological structures and soils are weak in the region that also play crucial role. Additionally, the mountains have steep slopes that are susceptible to landslides [34].

A. Datasets
The landslides inventory is developed by visual interpretation of Sentinel 2 images, which were counter verified in Google earth maps and field data and boundaries were adjusted accordingly. The main scarp of every recorded landslide during the field work was illustrate in topographic maps at a proper scale and then was digitized as polygon layer [22]. In the study area, 302 landslides were mapped for the inventory. For each landslide in the inventory, it contains information such as location, size, and direction of the landslide, the bedrock, and surface material. The inventory was split into training of development of Landslides Susceptibility Mapping (80%) and validation (20%) sets. The details of datasets used are given in Table 1.

B. Methodology
The workflow of methodology used for this study is given in Figure 3. Each step of methodology is explained in the following paragraphs.

C. Landslide Controlling Factors
The eight factors which causes the occurrence of landslides in our study area are categorized into topological, hydrological, geological, and anthropologic factors ( Figure 4). Topological factors include slope and aspect. The slope angle is considered the main parameter of the slope stability. Aspect related parameters such as exposure to sunlight, winds, rainfall, soil moisture and breaks may control the incidence of landslides. Terrain slope, and aspect are computed from the SRTM DEM of 30 m resolution. To assess the impact of slope angle on landslide dispensation, the slope angle map was categorized into five classes following the source [35], the terrain aspect was classified into 9 classes.
Geological factors considered for this study include geology and proximity to faults. Geology plays a vital role in landslide susceptibility studies since diverse geological units have diverse susceptibilities to activate geomorphological processes. Fault lines and eight geological formations (classes) were digitized using the geological map of Pakistan shown in Table  2. This region is encompassed with eight geological divisions, such as Late Paleozoic Rocks, Igneous and metamorphic rocks, Precambrian metamorphic and sedimentary rocks, Alluvium, Paleozoic Rock, Early Mesozoic & Late Paleozoic rocks, Permian Rocks, and Unmapped.
Precipitation and proximity to streams are hydrological factors considered for this study. Precipitation and drainage play very important role in respect of landslides as most of the time rain and erosion of water trigger landslides in this study area.
The anthropological factors considered in this study area are land use and distance to roads. A land cover map was created using Sentinel 2 image based on supervised classification. To assess the influence of the land cover on landslides movement, the land cover of our area of interest was categorized in eight different classes (shown in Table 2). The accuracy of the land

[21]
cover map was 87% obtained through confusion matrix of a LULC classification as it was validated through field data. Land use is one of the main factors influencing for the incidence of landslides, on the one hand, the barren slopes are more expose to landslides. On the other hand, vegetative areas contribute to reduce occurrence of landslides. One of the influencing factors for the stability of slopes is road development and construction activity. This factor map was developed as per the hypothesis that landslides may be more common along roads, due to unsuitable cut slopes and drainage from the road. The road network was developed through digitalization from the Sentinel 2 image [35].

D. Analytical Hierarchy Process (AHP)
AHP is a tool used for site assessment, planning, and vulnerability analysis [36]. It needs the participation of experts based on importance of each factor consider for the multicriteria and multi objectives. Each factor is weighted and combined to form an integrated weight. The factors' weights are generated based on pairwise comparison matrix of AHP based on expert knowledge. The principle of transitivity is considered important in AHP for any specified three factors (such as m1, m2 and m3), and is explained as follows; if m1 > m2 and m2 > m3, then m1 > m3. This principle of transitivity makes a foundation for conditioning factors weighing in AHP. It is important to calculate the consistency of expert comparisons in matrices in each step. The consistency ratio (CR) is determined by Eq. (1).

[23]
CR = (_max -n)/(RI(n -1)) RI is the random index of a pairwise comparison matrix for n = 2, 3, 4, 5, 6, 7, 8, and 9. If the CR is <0.10 it considers an suitable level of consistency, whereas a CR > 0.10 points to a degree of inconsistency [37]. In this study total 15 experts were served with questionnaires to collect expert knowledge for layer weighting. The evaluated weights of the eight layers were computed using the AHP model based on the pairwise comparison matrices.

E. Weight of evidence Modelling
Weight of evidence statistical modelling is described in Equation 2 and 3; a detailed description can be found in [38].
Where npix1 is the number of pixels for the potential landslide predictive factor, npix2 is the number of pixels for the absence of potential landslide predictive factor, npix3 is the number of pixels for the potential landslide predictive factor and absence of landslides, npix4 is the number of pixels for the absence of both potential landslide predictive factor and landslides. Final weight (Wc) was calculated by Eq. 4. Wc = (W+) -(W-) (4) The weight contrast (Wc) is the difference between W+ and W-and reflects the overall spatial relationship of the causative factors and landslides.

F. Frequency ratio modelling
The impact of a contributing factor on the spatial distribution of landslides can be evaluated by FR statistical modelling. FR is the ratio of the area where landslides events happened in the area of interest. It is also the ratio of the likelihoods of a landslide incidence to a non-incidence for a given attribute [39]. The FR is computed by using Eq. 5.

FR =
where, Di is the area of landslides in the specified class; Ai is the area of class; ∑ Di is the sum of landslides in the complete study area; and ∑ Ai is the sum of area of all classes of the whole study area.

G. Landslide susceptibility index maps
LSI maps are developed by overlaying the contributing factors using Eq. 6 [29] after computing a given weight (Wc) to each factor. LSI = ∑ W c (6) Wc is total calculated weight of each factor. The LSI map is developed from the FR value based on the following equation: LSI = ∑ FR (7) To determine the predictive power of several evidential parameters, four models with different combinations (given in Table. 3) of evidential factors are obtained for LSI maps using Eq. 7. The LSI maps are validated by area under curve (AUC). Arc-SDM is a Spatial Data Modeler which calculate AUC. We give input as a true positive value which is field data and resulted models classify into hundred classes. Through that the Arc-SDM tool calculates the performance of models and converts these models into graphs which clearly shows the accuracy of models. The AUC is calculated as follows:

I. Classification of landslide LSI map
The resulting LSI map was categorized into five susceptibility zones, extending from "very low" to "very high", based on the prediction rate curve [36]. This categorization scheme is according to the natural break law of the success rate curve [40]. At the end, the accuracy of LSI map was evaluated by corresponding the percentage area covered by each susceptibility class with the percentage of area covered by a landslide incidence in each class.

III. RESULTS
The three models discussed in section 3 are used to generate susceptibility maps to find the areas which have high potential of landslides. The output susceptible maps for different combination of casual factors, FR model, and AHP model are given in Figure 5. Further, output values of AHP and FR model are given in Table 4 and Table 5, respectively. FR values (Table  4) and factor weights from AHP (Table 5) show susceptibility ( Figure 5) as with increase of number the susceptibility ratio rises in the area. The weightings of the FR model are generated based on our inventory dataset so they can be different if the dataset is changed. Whereas AHP weightings are derived based on experts' preferences using pairwise comparison matrices.  Table 3.

A. Validation
The LSI maps are validated by using the area under curve (AUC) criterion. The success rate graph for each LSI model in AUC has verified its accuracy. As shown in Figure 6, model D shows the highest accuracy. Therefore, we used the parameters from model D in the FR model for LSI mapping. The success rate curve in Figure 6 for Model "D" shows that the highly vulnerable areas having 20% have 30% of total observed landslide area. Similarly, highly vulnerable area having 50% has 70% of landslide area. Model D, the success rate curve (given in Figure 6) shows that 20% and 50% of the highly vulnerable areas have 30% and 70% of total observed landslide area, respectively. Likewise, FR Model, the success rate curve given in Figure 6 shows that 20%, 50%, and 70% of the highly vulnerable areas have 71.1%, 96%, and 97% of the total observed landslide area, respectively.

B. Prediction Power
The prediction rate curve is used to authenticate the power of the modelling to forecast future potential landslides on testing dataset [42]. The forecast accuracy of Model D, AHP, and the FR Model are assessed. The result shown in Figure 7 in Model 'D' case, 30%, 50%, and 70% of the highest vulnerable area has 55.2%, 68%, and 71.2% of the total observed landslide area, respectively.
Likewise, the result shown in Figure 7 in FR Model case, 30%, 50%, and 70% of the highest vulnerable area has 78.3%, 85%, and 95.4% of the observed landslide area, respectively.

C. Classification of landslide LSI map
From the outcomes shown in previous section it was observed that Model D, FR Model and AHP model have higher accuracy in terms of prediction to the other experimented models. Classified susceptibility maps for model D, FR model, and AHP are developed and each model is divided into susceptible classes those are "very low", "low", "moderate", "high", and "very high" shown in Figures 8 (a, b, and c). Most of these classes indeed covered all the landslides along the study area.

IV. DISCUSSION
Each causative factor's influence for landslide incidences is explored using frequency ratio, AHP, and weight of evidence modelling (Tables 4, 5). Our investigation shows that geology, precipitation, and landcover have high values for Wc and Fr. Similarly, for the AHP modelling, CR values for precipitation, proximity to faults, proximity to road, and landcover have high values followed by geology. From overall modellings it can be concluded that geology, precipitation, and landcover are the most influencing factors for landslide incidences along the KKH which is also concluded by several researchers [42][43][44].
The causative factor of geology with class Alluvium has the most influencing to land sliding followed by Precambrian metamorphic and sedimentary rocks with Wc and FR values of 1.7 and 1.47, respectively (Table 4). In our field observation along the KKH, it was observed that Alluvium and Precambrian rocks are very deformed and prone to slope failure. The precipitation factor has also high influence as with the raise of precipitation in a class the more susceptible to land sliding occurs as the highest class between 160-205 is with values of FR and Wc are 4.72 and 6.88, respectively.
The slope angles between 30-45 degree have the highest rank values for Wc and FR and area is more susceptible to landslides. For the angles more than 45 degree the FR is the highest and Wc is in negative. High steepness along the KKH can be noted that it cannot accommodate sliding material results in clean rocks and vertical debris. In aspect factor, the south facing slopes are the most susceptible as they have the highest values for Wc and FR. The same results were also found in [42] with the reason that those slopes which are south facing are having more sunshine days and due to chemical and other environmental exposures they are more susceptible for landslides.
The causative factors proximity to streams and proximity to faults have inverse relationship with distance and values of FR and Wc as with the increase of distance the values of FR and Wc decreases. As the results are agreement with other researchers [42]. The same case is with proximity to roads, as the focus of this work is KKH so it can be noticed that most of the landslides are developed due to construction of the road, however, the results show the highest values between 200-300 m. The very high susceptible area according to FR, AHP and Model 'D' are 789 km 2 , 525 km 2 , and 315 km 2 , respectively. In the AHP susceptible map, the very high susceptible class is 18.72% with 65 number of landslides of in the class, whereas it is 28.18 % in FR susceptibility map with 130 number of landslides ( Figure 9). The overall AUC results for training ( Figure 6) and testing (Figure 7) data for the three models (FR, AHP, model "D") show that FR model based generated susceptibility maps have more accurate results followed by AHP model. We can conclude that our results are in agreement with previous studies and data-based models are more accurate in prediction as compare to knowledge-based models. In addition, knowledge-based models' results may change if we use other experts with different criteria weightings and models.
One of the limitations of our study is that we used only AHP model which is a knowledge-based traditional model for decision making. However, there are several studies where researchers found better results in comparison to this model [1,[45][46][47]. In this study our intent is to observe the results of the data-based and traditional knowledge-based model for the specific highly economic significant high-way with limited area around it. This study will also help us to understands comparatively results to see the feasibility to implement same idea on other highways of importance in a geographical scattered and difficult terrain areas like Gilgit-Baltistan of Pakistan. There are also several studies which compares databased models for landslide susceptibility [48]. Data-based models are sensitive to training dataset as if source dataset is changed then the results also gets change. Thus, knowledgebased models are dependent on experts whereas data-based models are sensitive to training datasets. Although, when we observed the sub-criteria for AHP and FR models the results are almost the same, however, there are some differences of results in the first class of precipitation, slope aspect in south class, and proximity to road ( Figure 10). For example, results of AHP model for precipitation shows the first class 121-166 (mm/yr) is the most significant class whereas data-based model FR results indicate the third class 210-259 (mm/yr) is the most important class. The results of AHP and FR models for distance to faults, stream, and road follow same pattern with minor variations as weightings for the factors decrease with the increase of distance ( Figure 10).

V. CONCLUSION
The development of landslide susceptibility maps for a landslide-prone area is the topmost priority step to initiate risk mitigation for the inhabitants of the area. The Karakuram High-Way, which is the only land connecting route between Pakistan and China, has all the economic and trade dependence of the Gilgit-Baltistan province as well as of CPEC project. Considering the economic significance of this route, this research work developed landslide susceptibility maps along the KKH for 300 KM using the data from Google maps, sentinel images and field observation. The main objective of these LSMs is to provide accurate prediction regarding future occurrences of landslides. Therefore, different statistical models (Weighting evidence, FR and AHP) were evaluated based on their accuracy for LSMs. In this process, we developed and used 302 landslides and divided into randomly training (70%) and validation (30%) datasets. The resulting LSMs accuracy were validated using AUC. Our results show that Fr model has high accuracy comparable to AHP model. Precise and consistent LSMs helps to minimize human and economic loss as well as to avoid hazards like landslides by better planning in advance. Our results will help planners and decision makers to plan in advance to avoid economic and human losses which are very frequent on these types of mountainous routes. Our results and approach can also be replicated to other long routes of Gilgit-Baltistan and other parts of the world to develop LSMs to minimize physical and economical loss to infrastructures and reduce risks of loss to human life.