Evaluation of Machine Learning Techniques (SVM, GLM, FDA, RF) in Preparing Flood Susceptibility Map of a Part of Khuzestan Province

Document Type : Research/Original/Regular Article

Authors

1 Ph.D. Student, Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Tehran, Iran

2 Professor, Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Tehran, Iran

3 Professor, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

4 Associate Professor, Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Tehran, Iran

5 Assistant Professor, Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Tehran, Iran

Abstract

Introduction

Floods are among the most devastating natural disasters, causing extensive damage and significant loss of life worldwide. Developing countries are particularly vulnerable because they often lack the infrastructure, financial resources, and advanced technology needed to mitigate flood impacts. There is therefore a critical need for high-performance flood forecasting models that delineate flood-prone areas. The frequency, lethality, and economic impact of floods have spurred the scientific community to create sophisticated algorithms and models that can handle the inherent complexity of natural events. Data mining algorithms have changed scientific research by extracting patterns from vast, unstructured datasets and predicting future trends and complex natural phenomena. Machine learning techniques, an important subset of data mining methods, can make accurate predictions despite data limitations and, when properly configured, avoid overfitting. Previous studies have shown that machine learning algorithms significantly improve the speed and accuracy of mapping potential flood risks. Consequently, this study aims to develop a flood susceptibility map for a region in Khuzestan province using advanced machine learning algorithms. This region has experienced frequent floods, leading to substantial human and financial losses. Notably, during the floods of 2018, villages near the Dez and Karkheh dams encountered severe challenges.

Materials and Methods

The preparation of the flood risk map rests on two key hypotheses: (1) the past is indicative of the future, meaning that future floods will occur under conditions similar to those of past events, and (2) flood conditioning factors are spatially related and can be used in forecasting models. To test these hypotheses, the locations of past floods were obtained from the relevant authorities and verified through field visits. These locations were randomly divided into two groups: a training group (70%) and a validation group (30%).
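The random 70/30 division described above can be sketched as follows. This is a minimal illustration, not the study's own procedure: the function name `split_flood_points`, the fixed seed, and the placeholder coordinates are all assumptions introduced here.

```python
import random


def split_flood_points(points, train_fraction=0.7, seed=42):
    """Randomly split flood locations into training and validation groups."""
    shuffled = list(points)                    # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: 100 placeholder (x, y) coordinates, not the study's data.
flood_points = [(float(i), float(i) * 0.5) for i in range(100)]
train, validation = split_flood_points(flood_points)
print(len(train), len(validation))
```

Fixing the seed keeps the two groups identical between runs, which makes the validation results of different models comparable.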

Data on flood conditioning factors, including topography, hydroclimatic conditions, and geological information, were collected and used to create raster maps of these predictive factors. The flood locations were treated as the dependent variable. Four machine learning algorithms, Support Vector Machine (SVM), Generalized Linear Model (GLM), Flexible Discriminant Analysis (FDA), and Random Forest (RF), were applied to generate the flood risk map. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve, computed on the validation group (30% of the flood points), and the best-performing model was selected. The final flood risk map was then produced from this optimal model.
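The model comparison by area under the ROC curve can be illustrated with a small, dependency-free sketch. It uses the rank interpretation of AUC: the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point. The labels, scores, and model names below are invented for illustration, and the use of non-flood points as the negative class is an assumption, since the abstract does not say how they were sampled.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for flood points, 0 for non-flood points.
    scores: predicted susceptibility for each point (higher = more susceptible).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both flood and non-flood points are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # flood point ranked above non-flood point
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels and scores from two made-up models.
labels = [1, 1, 1, 0, 0, 0]
predictions = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.1],
    "model_b": [0.6, 0.4, 0.2, 0.5, 0.3, 0.7],
}
auc = {name: roc_auc(labels, scores) for name, scores in predictions.items()}
best_model = max(auc, key=auc.get)   # model with the largest AUC
print(auc, best_model)
```

The pairwise loop is quadratic in the number of points, which is acceptable for a small validation set; a sort-based implementation would be preferable for large inventories.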

Results and Discussion

According to the collinearity analysis of the 13 factors influencing floods, all factors had tolerance values greater than 0.1 and variance inflation factors less than 5. Collinearity was therefore not an issue, and no factors needed to be removed. Flood susceptibility modeling was conducted using four models: SVM, GLM, FDA, and RF. The resulting flood hazard maps were classified into five risk categories: very low, low, medium, high, and very high. All four models identified flat lands and the margins of surface runoff channels as areas with higher flood susceptibility. In all models, more than half of the study area was classified as low or very low flood risk. Specifically, the SVM, GLM, FDA, and RF models identified 73.9%, 69%, 72.6%, and 63.9% of the area, respectively, as low or very low risk, with the remainder falling into the medium to very high risk categories. The RF and GLM models placed a larger portion of the region in the highest risk classes, with 4.7% and 3.9% of the area, respectively, classified as very high risk. In terms of accuracy, the RF model performed best, with an area under the curve (AUC) value of 98.8%.
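The collinearity check above relies on two related quantities. For factor j, the variance inflation factor is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing factor j on all the other factors, and tolerance is its reciprocal, 1 / VIF_j. The sketch below shows one way to compute both with NumPy. The three random columns stand in for conditioning factors and are not the study's data; the function name `vif_and_tolerance` is likewise an assumption.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (vif, tolerance) arrays with one entry per column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # intercept + other factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordinary least squares fit
        residuals = y - A @ coef
        r2 = 1.0 - residuals.var() / y.var()            # R^2 of factor j on the rest
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif


# Hypothetical data: 200 samples of three independent stand-in factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
vif, tolerance = vif_and_tolerance(X)

# Thresholds as stated in the text: tolerance > 0.1 and VIF < 5.
no_collinearity = bool(np.all(vif < 5) and np.all(tolerance > 0.1))
print(vif.round(3), tolerance.round(3), no_collinearity)
```

Because tolerance is the reciprocal of VIF, a factor with VIF below 5 necessarily has tolerance above 0.2, so the VIF criterion is the stricter of the two thresholds quoted.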

Conclusion

Predicting high-risk areas is crucial for guiding decisions and implementing corrective measures. This study evaluated the performance of four machine learning models—SVM, GLM, FDA, and RF—in preparing a flood hazard map for a part of Khuzestan province, using the area under the ROC curve as the evaluation metric. The results revealed that the RF model achieved the highest accuracy, with an area under the curve of 98.8%, and was identified as the most suitable model for predicting flood risk areas. According to this model, the areas classified as very low, low, medium, high, and very high risk accounted for 34.2%, 29.7%, 18.9%, 12.4%, and 4.7% of the region, respectively. Additionally, the GLM and FDA models demonstrated acceptable accuracy, with AUC values of 76.3% and 75.2%, respectively. These results underscore the efficacy of machine learning models in predicting flood risk areas. Given the increasing population, urban development, and infrastructure expansion in mountainous areas and floodplains, it is essential to develop various hazard susceptibility maps and multi-hazard maps for sustainable development. Future research should focus on evaluating different machine learning models and creating hazard maps for other potential hazards in the region, ultimately leading to the development of comprehensive multi-hazard maps. The findings of this research will assist decision-makers and policymakers in making informed management decisions for both current and future development.


