Monthly prediction of pan evaporation using individual and combined approach of data mining models in arid regions

Document Type: Research/Original/Regular Article

Authors

1 Assistant Professor, Department of Desert Management & Control, Faculty of Natural Resource, Higher Educational Complex of Saravan

2 Assistant Professor (Corresponding Author), Department of Desert Management & Control, Faculty of Natural Resource, Higher Educational Complex of Saravan, Saravan, Iran

3 Higher Educational Complex of Saravan, Pasdaran Street, Saravan City, Sistan va Baluchestan Province, Iran

Abstract

Evaporation losses are especially significant in arid to semi-arid areas, where water molecules have enough heat energy to escape. Therefore, accurate estimation of evaporation losses plays a pivotal role in water resources management, crop water demand estimation and irrigation scheduling. Owing to their capacity to handle the complexity and highly stochastic features of many environmental problems, machine learning (ML) methods have recently emerged as a leading approach for modeling the association between predictors and predictands. ML methods have also been shown to outperform other methods, although their performance varies with the input factors and climatic conditions. More recently, hybrid techniques, in which two or more models are combined and coupled, have drawn increasing attention in climate and hydrology studies because they can capture the various patterns in data series by combining the features of multiple techniques in one algorithm. Therefore, this research investigates the potential of several groups of machine learning models, both individually and within a new hybrid approach, for estimating pan evaporation in Sistan and Baluchistan province.

In this research, data from the synoptic stations of Sistan and Baluchistan province (Zahedan, Khash, Saravan, Iranshahr and Chabahar) for the period 1980-2019 were used as model inputs, and pan evaporation measurements from these stations were used as the observed evaporation values. Two approaches were used to run the deep learning models: first, individual models, and second, the new hybrid approach, in which two or more models are combined. In the individual approach, eight deep learning models were used to simulate and predict pan evaporation. In addition, the combined VEDL approach was used to build a hybrid model combining these eight individual deep learning models. In this hybrid approach to regression problems, the predictions of all models are averaged to obtain an ensemble estimate; ensembles of this kind are called voting regressors (VRs). There are two ways of assigning votes: average voting (AV) and weighted voting (WV). In AV, all models receive equal weights. A disadvantage of AV is that all models in the ensemble are treated as equally effective, which is very unlikely, especially when different machine learning algorithms are combined. WV assigns a weight coefficient to each ensemble member. The weight can be either a floating-point number between 0 and 1, in which case the weights sum to 1, or an integer starting at 1 that denotes the number of votes given to the corresponding ensemble member. In this study, the weight of each model was selected based on its accuracy, as measured by the evaluation criteria obtained in the training stage of the individual models.
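The voting step described above can be sketched in a few lines. This is a minimal pure-Python sketch under stated assumptions, not the authors' VEDL implementation: the function names are hypothetical, the predictions are made-up numbers, and the inverse-RMSE rule in `weights_from_training_rmse` is only one plausible way to turn training-stage accuracy into weights (the abstract does not give the exact rule).

```python
from typing import Sequence


def average_vote(predictions: Sequence[Sequence[float]]) -> list[float]:
    """Average voting (AV): every ensemble member gets the same weight."""
    n_models = len(predictions)
    # zip(*predictions) walks month by month across all models
    return [sum(step) / n_models for step in zip(*predictions)]


def weighted_vote(predictions: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> list[float]:
    """Weighted voting (WV): weights may be fractions summing to 1 or
    integer vote counts; dividing by their sum treats both forms alike."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [sum(w * p for w, p in zip(weights, step)) / total
            for step in zip(*predictions)]


def weights_from_training_rmse(rmses: Sequence[float]) -> list[float]:
    """Hypothetical rule (assumption): weight each model by 1 / training RMSE,
    then normalize so the weights sum to 1. Lower error -> larger weight."""
    inverse = [1.0 / r for r in rmses]
    total = sum(inverse)
    return [v / total for v in inverse]


# Made-up monthly predictions (rows = models, columns = months)
preds = [[100.0, 200.0], [110.0, 190.0], [90.0, 210.0]]
print(average_vote(preds))              # -> [100.0, 200.0]
print(weighted_vote(preds, [1, 1, 2]))  # -> [97.5, 202.5]
print(weights_from_training_rmse([10.0, 20.0, 40.0]))
```

Normalizing by the sum of the weights is what makes the two WV conventions interchangeable: integer votes `[1, 1, 2]` and fractions `[0.25, 0.25, 0.5]` give the same ensemble prediction. scikit-learn's `VotingRegressor` takes a similar `weights` sequence (float or int) and averages member predictions with it, if a library implementation is preferred.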
The models' performance was assessed using statistical measures, including the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE), together with Taylor diagrams. All models performed very well in both the training and testing stages. Among the individual models, the artificial neural network (ANN) performed best in both stages at the Zahedan synoptic station, with R2 = 0.89 and RMSE = 45.95 in the training phase and R2 = 0.96 and RMSE = 44.18 in the validation phase; it is therefore identified as the best deep learning model for predicting monthly pan evaporation at this station. The remaining models also performed very well in both stages: in the training stage, SVM (R2 = 0.89) and RF (R2 = 0.88) ranked next, and in the validation phase, the BART model ranked second with R2 = 0.96. The lowest accuracy was obtained with the tree model (TM), with R2 values of 0.84 and 0.93 in the training and validation stages, respectively. Across the studied stations, the ANN, SVM and RF models performed best in both the training and testing stages. In the testing stage, the SVM model performed best at the Khash, Iranshahr and Chabahar stations, with R2 values of 0.94, 0.96 and 0.94, respectively, while at the Saravan station the RF model performed best with R2 = 0.94. Finally, to build the hybrid deep learning model, the VEDL technique was applied with weighted voting for the training stage.
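The evaluation measures named above can be computed directly from paired observed and simulated series. The sketch below is a minimal pure-Python version under two stated assumptions: R2 is taken as the squared Pearson correlation (the abstract does not say whether the authors used this or 1 - SS_res/SS_tot), and the Taylor-diagram quantities use population statistics. The function names are illustrative and the series are made-up numbers, not data from the study.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither series is constant)."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """R2 as the squared Pearson correlation (assumption; another common
    definition is 1 - SS_res / SS_tot, which can differ for biased models)."""
    return pearson_r(obs, sim) ** 2


def taylor_stats(obs: Sequence[float],
                 sim: Sequence[float]) -> tuple[float, float, float]:
    """Return (std of simulation, correlation, centered RMS difference),
    the three quantities read off a Taylor diagram. Population statistics
    (divide by n) keep the Taylor identity exact."""
    n = len(obs)
    mo, ms = _mean(obs), _mean(sim)
    sd_s = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    # Centered RMS difference: RMSE after removing each series' mean (bias)
    crmsd = math.sqrt(sum(((s - ms) - (o - mo)) ** 2
                          for o, s in zip(obs, sim)) / n)
    return sd_s, pearson_r(obs, sim), crmsd


# Made-up monthly pan evaporation values (not data from the study)
obs = [120.0, 180.0, 260.0, 310.0]
sim = [130.0, 170.0, 250.0, 330.0]
print(mae(obs, sim))              # -> 12.5
print(round(rmse(obs, sim), 3))   # -> 13.229
print(round(r_squared(obs, sim), 3))
print(taylor_stats(obs, sim))
```

On a Taylor diagram each model sits at radius equal to its standard deviation and at angle arccos(r); its distance from the observation point is the centered RMS difference, because crmsd² = sd_sim² + sd_obs² - 2·sd_sim·sd_obs·r.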
The combined model improved on the best individual model: RMSE decreased from 45.95 to 33.1, R2 increased from 0.89 to 0.94, and MAE decreased from 32.92 to 23.9. A Taylor diagram comparison of the individual ANN model and the VEDL model also shows that VEDL performs better than the individual model.
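For scale, the relative changes implied by the training-stage values reported for the best individual model (ANN) and the VEDL ensemble can be worked out directly. This arithmetic is derived here from the reported numbers and is not stated by the authors:

```latex
\frac{\mathrm{RMSE}_{\mathrm{ANN}} - \mathrm{RMSE}_{\mathrm{VEDL}}}{\mathrm{RMSE}_{\mathrm{ANN}}}
  = \frac{45.95 - 33.1}{45.95} = \frac{12.85}{45.95} \approx 28.0\%,
\qquad
\frac{\mathrm{MAE}_{\mathrm{ANN}} - \mathrm{MAE}_{\mathrm{VEDL}}}{\mathrm{MAE}_{\mathrm{ANN}}}
  = \frac{32.92 - 23.9}{32.92} = \frac{9.02}{32.92} \approx 27.4\%,
\qquad
\Delta R^2 = 0.94 - 0.89 = 0.05
```

Both error measures therefore fall by a little over a quarter, while R2 rises by 0.05.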

Overall, the ANN, SVM and RF models performed best in both the training and validation stages, with SVM performing best at the Khash, Iranshahr and Chabahar stations and RF performing best at the Saravan station in the validation stage. Another finding of the research is the excellent performance of all models in both the training and validation stages. These results are consistent with previous studies reporting that machine learning models estimate evaporation and evapotranspiration efficiently in different climatic regions of Iran.

Articles in Press, Accepted Manuscript
Available Online from 30 April 2023
  • Receive Date: 17 April 2023
  • Revise Date: 30 April 2023
  • Accept Date: 30 April 2023