Document Type: Research/Original/Regular Article
Authors
1
Department of Reclamation of Arid and Mountainous Regions, Faculty of Natural Resources, University of Tehran, Tehran, Iran
2
University of Tehran
3
Shiraz University
4
Department of Reclamation of Arid and Mountainous Regions, Natural Resources, University of Tehran
5
Tehran University
Abstract
Introduction
Floods are among the most devastating natural disasters, causing extensive damage and significant loss of life worldwide. Developing countries are particularly vulnerable because they often lack the infrastructure, financial resources, and advanced technology needed to mitigate flood impacts. There is therefore a critical need for high-performance flood forecasting models that delineate flood-susceptible areas. The frequency, lethality, and economic impact of floods have spurred the scientific community to create sophisticated algorithms and models to manage the inherent complexity of natural events. Data mining algorithms have advanced scientific research by extracting patterns from vast, unstructured datasets and predicting future trends and complex natural phenomena. Machine learning techniques, a vital subset of data mining methods, can make accurate predictions despite data limitations and, when properly configured, avoid overfitting. Previous studies have demonstrated that machine learning algorithms significantly improve the speed and accuracy of mapping potential flood risks. Consequently, this study aims to develop a flood susceptibility map for a region in Khuzestan province using advanced machine learning algorithms. This region has experienced frequent floods, leading to substantial human and financial losses. Notably, during the floods of 2018, villages near the Dez and Karkheh dams encountered severe challenges.
Materials and Methods
The preparation of the flood risk map rests on two key hypotheses: (1) the past is indicative of the future, meaning that future floods will occur under conditions similar to those of past events, and (2) flood conditioning factors are spatially related and can be used in forecasting models. To test these hypotheses, the locations of past floods were obtained from the relevant authorities and verified through field visits. These locations were randomly divided into two groups: a training group (70%) and a validation group (30%).
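The random 70/30 division described above can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the function name `split_points`, the fixed seed, and the placeholder point identifiers are all assumptions introduced here.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split flood locations into training and validation groups.

    `seed` is a hypothetical value used only to make the sketch repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(points)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for recorded flood locations.
flood_points = list(range(100))
training_group, validation_group = split_points(flood_points)
print(len(training_group), len(validation_group))
```

Fixing the seed keeps the split reproducible, so that every model is trained and validated on the same two groups.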
Data on flood conditioning factors, including topography, hydroclimatic conditions, and geological information, were collected and used to create raster maps of these predictive factors. The locations of flood points were treated as the dependent variable. Four machine learning algorithms, namely Support Vector Machine (SVM), Generalized Linear Model (GLM), Flexible Discriminant Analysis (FDA), and Random Forest (RF), were applied to generate the flood risk map. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), computed on the validation group (30% of the flood points), and the best-performing model was selected. The final flood risk map was then produced from this optimal model.
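To make the evaluation metric concrete, the sketch below computes the area under the ROC curve from validation labels and predicted susceptibility scores, using the rank-based (Mann-Whitney) formulation. It is an illustration under stated assumptions: the function name `roc_auc` and the toy data are hypothetical, and the presence of non-flood points labelled 0 is assumed here because a ROC curve needs both classes; the abstract does not say how such points were sampled.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    flood point (label 1) scores higher than a randomly chosen non-flood
    point (label 0), with ties counted as one half."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required to compute the AUC")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items share the average of the 1-based ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_pos += average_rank * sum(1 for _, label in pairs[i:j] if label == 1)
        i = j

    # Mann-Whitney U statistic of the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy validation data (hypothetical): 1 = flood point, 0 = non-flood point.
validation_labels = [1, 0, 1, 0]
validation_scores = [0.9, 0.8, 0.3, 0.1]
print(roc_auc(validation_labels, validation_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the models are compared.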
Results and Discussion
The collinearity analysis of the 13 flood conditioning factors showed that all factors had tolerance values greater than 0.1 and variance inflation factors (VIF) less than 5. Collinearity was therefore not an issue, and no factors needed to be removed. Flood susceptibility modeling was conducted using four models: SVM, GLM, FDA, and RF. The resulting flood hazard maps were classified into five risk categories: very low, low, medium, high, and very high. All four models identified flat lands and surface runoff margins as areas with higher flood susceptibility. In all models, more than half of the study area was classified as low or very low flood risk. Specifically, the SVM, GLM, FDA, and RF models assigned 73.9%, 69%, 72.6%, and 63.9% of the area, respectively, to the low and very low risk classes, with the remainder falling into the medium to very high risk categories. The RF and GLM models placed a larger portion of the region in the high to very high risk classes, with 4.7% and 3.9% of the area, respectively, classified as very high risk. In terms of accuracy, the RF model performed best, with an area under the curve (AUC) of 98.8%.
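The collinearity screening can be illustrated with a short sketch. For factor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing factor j on the remaining factors, and tolerance is 1 / VIF_j. The code below uses the equivalent result that the VIFs are the diagonal entries of the inverse of the factors' correlation matrix. The function names and the two toy factors are hypothetical; the thresholds (tolerance > 0.1, VIF < 5) are the ones stated in the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of the factor columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    devs = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in dev)) for dev in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """Return a (VIF, tolerance) pair for each factor column."""
    inverse = invert(correlation_matrix(columns))
    return [(inverse[j][j], 1.0 / inverse[j][j]) for j in range(len(columns))]


# Two hypothetical factors sampled at five locations (correlation 0.8).
factors = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]
for vif, tolerance in vif_and_tolerance(factors):
    # Thresholds as stated in the text: tolerance > 0.1 and VIF < 5.
    print(round(vif, 3), round(tolerance, 3), tolerance > 0.1 and vif < 5)
```

A factor that failed either threshold would be a candidate for removal before modeling; in this study none of the 13 factors did.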
Conclusion
Predicting high-risk areas is crucial for guiding decisions and implementing corrective measures. This study evaluated the performance of four machine learning models (SVM, GLM, FDA, and RF) in preparing a flood hazard map for part of Khuzestan province, using the area under the ROC curve as the evaluation metric. The RF model achieved the highest accuracy, with an area under the curve of 98.8%, and was identified as the most suitable model for predicting flood risk areas. According to this model, the areas classified as very low, low, medium, high, and very high risk accounted for 34.2%, 29.7%, 18.9%, 12.4%, and 4.7% of the region, respectively. The GLM and FDA models also demonstrated acceptable accuracy, with AUC values of 76.3% and 75.2%, respectively. These results underscore the efficacy of machine learning models in predicting flood risk areas. Given the increasing population, urban development, and infrastructure expansion in mountainous areas and floodplains, it is essential to develop hazard susceptibility maps and multi-hazard maps to support sustainable development. Future research should evaluate additional machine learning models and create hazard maps for other potential hazards in the region, ultimately leading to comprehensive multi-hazard maps. The findings of this research will assist decision-makers and policymakers in making informed management decisions for both current and future development.
Keywords
Main Subjects