Monthly prediction of pan evaporation using individual and combined approaches of data mining models in arid regions

Document Type : Research/Original/Regular Article

Authors

1 Assistant Professor, Department of Desert Management & Control, Faculty of Natural Resource, Higher Education Complex of Saravan,

2 Assistant Professor (Corresponding Author), Department of Desert Management & Control, Faculty of Natural Resource, Higher Educational Complex of Saravan, Saravan, Iran

3 High Education Complex of Saravan, Pasdaran Street, Saravan city, Sistan va Baluchestan Province, Iran

Abstract

Introduction
Evaporation, the process by which water molecules escape a surface after absorbing sufficient energy to overcome vapor pressure, is a major contributor to water scarcity, especially in arid and semi-arid regions where heat readily facilitates this escape. Accurately estimating evaporation losses is crucial for effective water resource management, crop water demand prediction, and irrigation scheduling. Machine learning (ML) has emerged as a powerful tool for tackling the complex and stochastic nature of environmental problems. ML models excel at identifying relationships between predictor variables and outcomes (predictands), often surpassing traditional methods. However, their performance can vary depending on input factors and climatic conditions. Recently, hybrid techniques that combine multiple models have gained traction in climate and hydrology studies. These techniques leverage the strengths of different approaches within a single algorithm, potentially capturing more complex patterns in data series. This research will explore the potential of various individual ML models and propose a novel hybrid approach for estimating pan evaporation in Sistan and Baluchistan Province.
 
Materials and Methods
This study investigates the simulation and prediction of pan evaporation in Sistan and Baluchistan Province, Iran. Synoptic station data (1980-2019) served as model inputs, while pan evaporation measurements from the same stations provided the observed values. Under the individual-model approach, eight data mining models were used to simulate and predict pan evaporation. In addition, the combined Voting Ensemble for Deep Learning (VEDL) approach was used to build a hybrid model from these eight individual models. For regression problems, this hybrid approach averages the estimates of all member models to obtain a single ensemble estimate; such ensembles are called voting regressors (VRs). Votes can be assigned in two ways: average voting (AV) and weighted voting (WV). In AV, all weights are equal (each is set to 1). A disadvantage of AV is that it treats every model in the ensemble as equally effective, which is very unlikely, especially when different machine learning algorithms are combined. WV instead assigns a weight coefficient to each ensemble member. The weight can be a floating-point number between zero and one, in which case the weights sum to one, or an integer starting at one that denotes the number of votes given to the corresponding ensemble member. In this study, the weight of each model was selected according to its accuracy, as measured by the evaluation criteria obtained when training the individual models. Model performance was assessed using statistical measures, including R², RMSE, MAE, and the Taylor diagram.
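The difference between average voting and weighted voting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `vote` and `normalize_weights`, the prediction values, and the choice to make weights proportional to a training skill score are all assumptions made for illustration, since the abstract does not state the exact weighting formula.

```python
from typing import List, Optional, Sequence


def normalize_weights(scores: Sequence[float]) -> List[float]:
    """Scale non-negative skill scores so that they sum to one.

    Assumption: weights proportional to a training score (e.g. R^2).
    The source only says weights were based on training accuracy.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def vote(
    predictions: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-model predictions into one ensemble estimate.

    predictions[m][t] is the estimate of model m for time step t.
    weights=None gives average voting (AV, every weight equal to 1);
    otherwise weighted voting (WV) with one weight per model.
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * n_models  # AV: all members count equally
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_steps = len(predictions[0])
    if any(len(p) != n_steps for p in predictions):
        raise ValueError("all models must predict the same time steps")
    total = sum(weights)
    # Weighted mean over models, computed separately for each time step.
    return [
        sum(w * p[t] for w, p in zip(weights, predictions)) / total
        for t in range(n_steps)
    ]


# Hypothetical monthly pan evaporation estimates (mm) from three models.
model_predictions = [
    [100.0, 200.0],  # hypothetical model A
    [120.0, 220.0],  # hypothetical model B
    [110.0, 210.0],  # hypothetical model C
]
# Illustrative weights derived from assumed training skill scores.
skill_weights = normalize_weights([0.89, 0.89, 0.88])

print("AV:", vote(model_predictions))
print("WV:", vote(model_predictions, skill_weights))
```

Because `vote` divides by the sum of the weights, it accepts both forms described above: fractional weights that sum to one and integer vote counts.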
 
Results and Discussion
All models performed very well in both the training and testing stages. The Artificial Neural Network (ANN) achieved the highest accuracy in both phases at the Zahedan station (R² = 0.89, RMSE = 45.95 in training; R² = 0.96, RMSE = 44.18 in validation) and emerged as the best model for monthly pan evaporation prediction at this station. Other models also performed well, with the Support Vector Machine (SVM) and Random Forest (RF) models achieving R² values of 0.89 and 0.88 in training, respectively. Notably, the BART model ranked second in validation (R² = 0.96). The Tree Model (TM) had the lowest accuracy (R² = 0.84 and 0.93 in training and validation, respectively). Across all stations, ANN, SVM, and RF consistently delivered the best results in both training and testing. In the test phase, the SVM model outperformed the others at the Khash, Iranshahr, and Chabahar stations (R² = 0.94, 0.96, and 0.94, respectively). At the Saravan station, the RF model achieved the highest R² (0.94) during testing. To develop a hybrid data mining model, the Voting Ensemble for Deep Learning (VEDL) technique was employed with weighted voting in the training stage. The combined model improved markedly on the best individual model: RMSE decreased from 45.95 to 33.1, R² increased from 0.89 to 0.94, and MAE decreased from 32.92 to 23.9. Evaluation using the Taylor diagram further confirmed the superior performance of the VEDL model compared to the individual ANN model.
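For readers who want to reproduce scores of this kind, the three error measures can be computed as in the following Python sketch. The function names and sample values are hypothetical, and R² is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares); some studies instead report the squared Pearson correlation, and the abstract does not say which definition was used.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the observations."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted monthly pan evaporation (mm).
observed_mm = [100.0, 200.0, 300.0]
predicted_mm = [110.0, 190.0, 310.0]

print("RMSE:", rmse(observed_mm, predicted_mm))
print("MAE:", mae(observed_mm, predicted_mm))
print("R2:", r_squared(observed_mm, predicted_mm))
```

Lower RMSE and MAE and higher R² indicate a better fit, which is the sense in which the combined model is said to improve on the best individual model.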
 
Conclusion
The results showed that, among all the models, ANN, SVM, and RF performed best in both the training and validation stages. In the validation stage, the SVM model performed best at the Khash, Iranshahr, and Chabahar stations, with R² values of 0.94, 0.96, and 0.94, respectively. At the Saravan station, the RF model performed best in the validation stage, with an R² value of 0.94. The excellent performance of the models in both training and validation is another finding of this research. These results are consistent with those of researchers who have reported that machine learning models are effective in estimating evaporation and evapotranspiration in different climatic regions of Iran. The combined model improved on the best individual model: RMSE decreased from 45.95 to 33.1, R² increased from 0.89 to 0.94, and MAE decreased from 32.92 to 23.9. Using the VEDL approach to estimate pan evaporation is a new approach that has not been used in past studies. Therefore, based on the results of this research, the proposed VEDL model is recommended for estimating evaporation in arid and semi-arid areas in support of water resources management and agricultural planning.

Keywords

Main Subjects