Document Type: Research/Original/Regular Article
Authors
1
M.Sc., Department of Engineering Geology, Faculty of Science, University of Isfahan, Isfahan, Iran
2
Department of Natural Engineering, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources, Khuzestan, Iran
3
Department of Soil Science and Engineering, Faculty of Agriculture, Khuzestan University of Agricultural Sciences and Natural Resources, Khuzestan, Iran.
Abstract
Extended Abstract
Introduction
According to the UN, flooding endangers lives, property, and societies more than any other natural disaster. Mapping floodplains and modeling floods in mountain basins is essential for development projects because it helps identify critical areas and control damage. A key 21st-century tool is satellite imagery, which provides valuable flood-related data. Processing these images yields estimates of flooded area, vegetation, lithology, slope, soil moisture, and other variables. In addition, machine learning algorithms have made it possible to estimate very complex relationships between these parameters and floods, although such models require complex calibration and extensive data. Many flood susceptibility models have been developed recently, and combining statistical and decision-making models with remote sensing and GIS has gained attention as a way to improve predictive ability. Commonly used machine learning algorithms include artificial neural networks, generalized linear models, support vector machines, and random forests. They serve two purposes: identifying flooded areas, and producing zoning maps while assessing the importance of flood-intensifying parameters.
Materials and Methods
In recent decades, new methods have been used to identify flood risk in basins and to prepare flood susceptibility maps, including multivariate statistical models, data mining, and machine learning methods such as random forests. This study evaluated the performance of five machine learning models in modeling flood probability in the northern mountainous basin of Khuzestan province: random forest (RF), support vector machine (SVM), boosted regression tree (BRT), classification and regression tree (CART), and generalized linear model (GLM). To increase the stability and accuracy of the models, four ensemble methods were also used: simple mean, weighted mean, committee mean, and median. In the first step, thirteen parameters were selected as factors affecting flooding, and a collinearity test confirmed that none of them was strongly related to the others. The factors are distance from the river, distance from the dam lake, curvature of the longitudinal profile of the waterway, curvature of the waterway plan, shape factor, river density, basin area, geology, vegetation, erosion factor, SPI index, TWI index, and curve number value. The digital elevation model (DEM) of the region, with a resolution of 30 meters, was obtained from the USGS website.
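The collinearity screening mentioned above can be illustrated with a minimal sketch. The abstract does not say which test or cutoff was applied, so the pairwise Pearson correlation, the 0.7 threshold, the factor names, and the sample values below are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def collinear_pairs(factors, threshold=0.7):
    """Return factor pairs whose |r| exceeds the threshold (0.7 is an assumed cutoff)."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(factors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical factor values sampled at five locations (not data from the study).
factors = {
    "distance_from_river": [120.0, 450.0, 80.0, 900.0, 300.0],
    "twi": [9.1, 6.3, 10.2, 4.8, 7.0],
    "curve_number": [78.0, 65.0, 70.0, 82.0, 60.0],
}

flagged = collinear_pairs(factors)
print(flagged)
```

A pair flagged this way would be a candidate for dropping one of its two factors before model fitting. A variance inflation factor check is a common alternative when relationships among more than two factors matter.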
Results and Discussion
The AUC values for the RF, BRT, SVM, CART, and GLM models were 0.932, 0.929, 0.885, 0.878, and 0.855, respectively. All five models performed acceptably; however, the tree-based ensemble models (RF and BRT) had a clear advantage over the linear and SVM models. The random forest (RF) model achieved the best overall performance in predicting flood occurrence in the region, with the highest AUC of 0.932. The boosted regression tree (BRT) model followed closely with an AUC of 0.929. In contrast, the generalized linear model (GLM) was the least accurate of the individual models, with an AUC of 0.855. For the four ensemble methods, the AUC, TPR, and FPR values ranged from 0.919 to 0.926, 0.826 to 0.857, and 0.072 to 0.079, respectively. Among them, the simple mean and weighted mean were more accurate than the committee mean and median.
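For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how AUC, TPR, and FPR can be computed from observed flood labels and predicted probabilities. The labels, scores, and the 0.5 classification threshold are hypothetical; the abstract does not state the threshold or the evaluation software used.

```python
def auc(labels, scores):
    """AUC as the share of (flooded, non-flooded) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_fpr(labels, scores, threshold=0.5):
    """True and false positive rates at a given threshold (0.5 is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical validation points: 1 = flooded, 0 = not flooded.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

model_auc = auc(labels, scores)
tpr, fpr = tpr_fpr(labels, scores)
print(model_auc, tpr, fpr)
```

AUC summarizes ranking quality over all thresholds, whereas TPR and FPR describe a single threshold, which is why the three are reported together.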
Conclusion
These findings emphasize the need to pay special attention to the spatial and hydrological characteristics of the basin in flood risk management planning, and they identify the RF model and ensemble approaches as effective strategies for preparing more accurate zoning maps. The four ensemble methods (Mean, WMean, Median, and Committee Averaging) produced more stable risk maps and effectively reduced the uncertainty that results from selecting a single model. The simple mean (Mean) and weighted mean (WMean) methods gave more favorable results than the other ensemble methods because of their high accuracy (AUC = 0.926) and rapid convergence of results. This indicates that combining the model outputs, with or without weighting, yields a more robust and repeatable estimate. This is consistent with similar studies that confirm the effectiveness of averaging methods in improving the performance of classification models.
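The four ensemble rules named above can be sketched as follows for a single map cell. This is an illustration under stated assumptions, not the study's implementation: the per-model probabilities are hypothetical, the weighted mean uses each model's AUC as its weight (the abstract does not specify the weighting scheme), and committee averaging is taken as the fraction of models predicting a flood at an assumed 0.5 threshold.

```python
from statistics import mean, median

# AUC values reported for the individual models (CART taken as 0.878).
auc_by_model = {"RF": 0.932, "BRT": 0.929, "SVM": 0.885, "CART": 0.878, "GLM": 0.855}

# Hypothetical flood probabilities predicted by each model for one map cell.
probs = {"RF": 0.81, "BRT": 0.77, "SVM": 0.62, "CART": 0.70, "GLM": 0.45}


def ensemble(probs, weights, threshold=0.5):
    """Combine per-model probabilities with four simple ensemble rules."""
    values = list(probs.values())
    total_weight = sum(weights[m] for m in probs)
    return {
        "mean": mean(values),
        # Weighting by AUC is an assumption; any non-negative weights would work.
        "wmean": sum(probs[m] * weights[m] for m in probs) / total_weight,
        "median": median(values),
        # Share of models voting "flood" at the assumed threshold.
        "committee": sum(1 for v in values if v >= threshold) / len(values),
    }


combined = ensemble(probs, auc_by_model)
print(combined)
```

Because the individual AUC values are close to one another, AUC-based weights differ only slightly from equal weights, so in this sketch the simple and weighted means end up close together.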
Keywords
Main Subjects