Citation: Wen-geng Cao, Yu Fu, Qiu-yao Dong, Hai-gang Wang, Yu Ren, Ze-yan Li, Yue-ying Du, 2023. Landslide susceptibility assessment in Western Henan Province based on a comparison of conventional and ensemble machine learning, China Geology, 6, 409-419. doi: 10.31035/cg2023013
Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people’s lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and poorly targeted. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking Western Henan as an example, the study considered 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, and used the recursive feature elimination (RFE) method to select the 11 factors with the most significant influence on landslides. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct spatial distribution models of landslide susceptibility. The models were evaluated using the receiver operating characteristic (ROC) curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed best and was suitable for dealing with regression problems. The model adapted well to the landslide data. The landslide susceptibility maps of the five models show the overall distribution: the extremely high and high susceptibility areas lie in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and high susceptibility areas covered 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the study area, respectively.
Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
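The abstract ranks the five models by the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC with the rank (Mann-Whitney) formulation using only the Python standard library. The labels, scores, and the names `model_a` and `model_b` are hypothetical illustrations, not data or models from the study, and the authors' actual evaluation code is not shown in the source.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen landslide cell (label 1)
    receives a higher score than a randomly chosen non-landslide cell
    (label 0); tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test cells: 1 = mapped landslide, 0 = no landslide.
labels = [1, 1, 1, 0, 0, 0, 1, 0]

# Hypothetical susceptibility scores from two made-up models.
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2],
    "model_b": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2, 0.8, 0.5],
}

# Score every model on the same test cells and pick the highest AUC.
auc_by_model = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.4f}")
print("best:", best_model)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of landslide and non-landslide cells, which is why a value such as the reported 0.8759 is read as good discriminative ability.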
Method adopted in the present study.
Geographical location of the study area in Henan Province (a) and locations of landslides in the study area (b).
Affecting factors of typical landslides.
Importance ranking of landslide affecting factors based on RFE.
ROC curves of the different landslide models on the testing dataset.
Landslide susceptibility maps of different landslide models.
Distribution of pixels of high and very high classes on slope map.