Identifying the thresholds of variables affecting flood zones using a machine learning technique (Case study: the downstream region of the Karun River)

Document Type: Research/Original/Regular Article

Authors

1 Assistant Professor, Department of Hormoz Studies and Research Center, Hormozgan University, Bandar Abbas, Iran

2 Ph.D., Department of Watershed Management Engineering, Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, Noor, Iran

Abstract

Introduction
Machine learning is a branch of artificial intelligence that seeks to build programs whose performance improves with experience, that is, by learning from data. Machine learning models built on different algorithms can be predictive, descriptive, or both, and are used in many fields. Effective flood risk management, in turn, requires knowing which factors drive flooding in each region and how sensitive the region is to them. Because few studies have so far used machine learning to analyze the thresholds at which the variables affecting flood occurrence become influential, the present research is novel in this respect. Accordingly, this study identifies the thresholds of the variables affecting the zoning of flooded areas in the Karun Basin using machine learning and remote sensing data. The results can help managers identify the range over which each variable influences flood occurrence and manage flood-prone areas based on the effective ranges of the variables in the study area.
 
Materials and Methods
Landsat 8 OLI images acquired on April 8, 2019 were used to identify the flooded areas. To separate pre-existing (permanent) water bodies from flood water, the corresponding image of the region from the previous year was used to identify and exclude those water zones. The remaining pixels of the study area then formed the full sample set, with the flooded pixels as the target samples: flooded pixels were coded 1 and all other pixels 0, giving the binary dependent variable. The independent variables entered into the machine learning process were actual evapotranspiration, land use, soil bulk density, soil clay percentage, soil water deficit, elevation (DEM), NDVI, land cover index, Palmer drought severity index, potential evapotranspiration, cumulative precipitation, soil sand percentage, soil texture, soil moisture, and minimum and maximum temperature. Models were built on these variables and evaluated, and TreeNet was selected as the best model. The threshold of each variable for flood zoning was then derived from the trained model. The data were split completely at random into training (70%) and test (30%) sets, and the model was built with 200 trees, each with at least six nodes.
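The labelling and splitting steps described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the two water masks (flood-date image and previous-year image) are taken as already-classified boolean grids, because the text does not say how water pixels were extracted from the Landsat bands, and the function names `label_pixels` and `random_split`, the seed, and the toy masks are hypothetical. The TreeNet model itself (200 trees) is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]      # (row, column) position in the image grid
Sample = Tuple[Pixel, int]   # (pixel, label) with label 1 = flooded, 0 = not flooded


def label_pixels(flood_water: Sequence[Sequence[bool]],
                 previous_year_water: Sequence[Sequence[bool]]) -> List[Sample]:
    """Build the binary dependent variable from two classified water masks.

    Pixels that were already water in the previous-year image are treated as
    permanent water and dropped; every other pixel is labelled 1 if it is
    water on the flood date and 0 otherwise.
    """
    samples: List[Sample] = []
    for r, (flood_row, prev_row) in enumerate(zip(flood_water, previous_year_water)):
        for c, (is_water_now, was_water_before) in enumerate(zip(flood_row, prev_row)):
            if was_water_before:
                continue  # permanent water body, not a flood sample
            samples.append(((r, c), 1 if is_water_now else 0))
    return samples


def random_split(samples: Sequence[Sample], train_frac: float = 0.7,
                 seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the samples at random and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Toy 3 x 3 masks (hypothetical, not study data).
flood_2019 = [[True, True, False],
              [False, True, False],
              [False, False, False]]
previous_year = [[True, False, False],
                 [False, False, False],
                 [False, False, False]]
samples = label_pixels(flood_2019, previous_year)
train, test = random_split(samples)
print(len(samples), sum(label for _, label in samples), len(train), len(test))  # 8 2 6 2
```

On this toy grid the single previous-year water pixel is dropped, leaving eight samples, two of them flooded, which the 70/30 split divides into six training and two test samples.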
 
Results and Discussion
Each variable has a distinct threshold at which flooding begins. Vegetation, the most important factor in flood zoning, shows that a lack of vegetation promotes flooding, while denser vegetation increasingly prevents it. For cumulative precipitation, the flooding threshold in the study area was 15 mm of rainfall; below this amount, incoming rainfall posed no flood risk to the study area, and 15.5 mm was the turning point at which flooding began. For the soil water deficit index, the flooding threshold was 144: when the soil moisture deficit exceeds this value, incoming precipitation is first taken up in making good the deficit, and flooding is therefore prevented. Elevation shows a similar pattern: most flooding occurred at an elevation of 16 m, and flood risk decreased as elevation increased, with a break at 19 m; at 22.5 m the flood risk disappears, and above 26 m flooding is suppressed and the response reaches a steady state. This can be explained by the flatness of the study area and the resulting spread of the flood zone across the plain.
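The text does not state exactly how the thresholds were read from TreeNet. One common approach, assumed here, is to compute a partial dependence curve for each variable (the model's average output as that variable is swept over a grid while the other variables keep their observed values) and to take the point where the mean-centred curve changes sign as the threshold. The sketch below implements that idea; `partial_dependence`, `sign_change_threshold`, the made-up data rows, and the toy logistic model centred on 15 mm are hypothetical illustrations, not the authors' fitted model.

```python
import math
from typing import Callable, List, Optional, Sequence

Row = List[float]


def partial_dependence(predict: Callable[[Row], float], data: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Average model output with `feature` forced to each grid value.

    All other columns keep their observed values, so the curve shows the
    marginal effect of one variable on the predicted flood response.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in data:
            probe = list(row)
            probe[feature] = value
            total += predict(probe)
        curve.append(total / len(data))
    return curve


def sign_change_threshold(grid: Sequence[float],
                          curve: Sequence[float]) -> Optional[float]:
    """First grid position where the mean-centred curve changes sign.

    The crossing is located by linear interpolation between the two grid
    points that bracket it; None is returned if the curve never crosses.
    """
    mean = sum(curve) / len(curve)
    centred = [y - mean for y in curve]
    for i in range(len(grid) - 1):
        y0, y1 = centred[i], centred[i + 1]
        if (y0 < 0) != (y1 < 0):  # exactly one side is negative, so y1 != y0
            x0, x1 = grid[i], grid[i + 1]
            return x0 + (x1 - x0) * (0.0 - y0) / (y1 - y0)
    return None


def toy_flood_model(row: Row) -> float:
    """Hypothetical model: flood probability rises around 15 mm of cumulative
    precipitation (column 0); column 1 (e.g. NDVI) is ignored here."""
    return 1.0 / (1.0 + math.exp(-(row[0] - 15.0)))


data = [[5.0, 0.2], [12.0, 0.4], [20.0, 0.1]]  # made-up rows, not study data
grid = [float(x) for x in range(31)]           # 0 to 30 mm in 1 mm steps
curve = partial_dependence(toy_flood_model, data, feature=0, grid=grid)
print(round(sign_change_threshold(grid, curve), 2))  # 15.0
```

Because the centring uses the curve's mean, the detected crossing depends on the grid range, so in practice the grid would span the observed range of each variable. The same function also handles decreasing responses, such as flood risk falling with elevation.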
 
Conclusion
The results showed that the vegetation cover index, cumulative precipitation, soil water deficit, Palmer drought index, elevation, and surface soil moisture, in that order, had the greatest effect on flooding in the study area. By contrast, soil sand percentage, soil clay percentage, soil bulk density, potential evapotranspiration, slope aspect, maximum daily temperature, and soil texture had an insignificant effect on flood zoning. The model's performance, measured by the area under the ROC curve, specificity, sensitivity, and overall accuracy, was 0.95, 91.2, 90.43, and 91.12, respectively, indicating high accuracy. Comparing the flood zoning with ground truth gave an R² of 72.8% and an MAE of 0.27%, confirming that the zoning results agree relatively well with conditions on the ground. The results also indicate an increased risk of flooding in wetland and swamp areas because of their high moisture and water levels. Natural hazard planners and managers can use these findings to reduce flood risk.
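The evaluation statistics quoted above follow standard definitions, sketched here for reference. Sensitivity, specificity, and overall accuracy come from the confusion matrix of binary predictions; the ROC area is computed with the rank (Mann-Whitney) formulation; R² is taken as the coefficient of determination, 1 - SS_res/SS_tot, which is an assumption, since the authors may have used the squared correlation instead. The 0.5 cut-off, the function names, and the toy inputs are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      cutoff: float = 0.5) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, overall accuracy) for 1 = flooded, 0 = dry."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # share of flooded pixels detected
    specificity = tn / (tn + fp)  # share of dry pixels correctly left dry
    accuracy = (tp + tn) / len(preds)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random flooded pixel outscores a random dry one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r2_and_mae(observed: Sequence[float],
               predicted: Sequence[float]) -> Tuple[float, float]:
    """Coefficient of determination and mean absolute error."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return 1.0 - ss_res / ss_tot, mae


y = [1, 1, 1, 0, 0, 0]                  # toy labels
s = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]      # toy model scores
print(confusion_metrics(y, s))
print(roc_auc(y, s))
print(r2_and_mae([0, 1, 1, 0], [0.1, 0.8, 0.9, 0.2]))
```

Note that the ROC area is threshold-free, whereas sensitivity, specificity, and accuracy all depend on the chosen cut-off.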
