Statistical and multi-criteria methods for preprocessing meteorological data in reference evapotranspiration

Document Type: Special issue on "Climate Change and Effects on Water and Soil"

Authors

1 Associate Professor, Faculty of Agriculture, Azarbaijan Shahid Madani University, Tabriz, Iran

2 M.Sc Student, Faculty of Agriculture, Azarbaijan Shahid Madani University, Tabriz, Iran

Abstract

Introduction

Knowledge of actual evapotranspiration is valuable for assessing water availability in policy and decision making for water resources and agriculture. Despite recent improvements, actual evapotranspiration remains difficult to measure in some locations. Accurate estimation of actual evapotranspiration therefore depends on determining reference evapotranspiration (ETo), which is a significant component. The Food and Agriculture Organization of the United Nations (FAO) Penman-Monteith method is widely recognized for its high accuracy, which has made it a globally accepted standard. Despite this acceptance, the method is challenged by its need for a large amount of reliable weather measurements, such as solar radiation and wind speed. These data are often unavailable in developing countries because of the limited number of equipped meteorological stations or measurement inaccuracies. An alternative ETo method is therefore needed, and efficient artificial intelligence techniques with a small number of input data can reach accuracy equal to that of the FAO method. In this regard, the preprocessing step, in which the important input data are selected, is particularly important. This study introduces a novel approach that systematically compares multiple preprocessing methods for ETo estimation and integrates decision-making techniques to improve data selection and model accuracy. The preprocessing methods are based on correlation, regression analysis, and decision making with different normalization methods. To increase the accuracy of the decisions, more than one evaluation criterion was considered in the analysis.

Materials and Methods

This study focuses on eleven stations (1992-2021). The stations are distributed across the north, west, northwest, east, and center of Iran. The preprocessing step is of great importance in the modeling process because it identifies the effective and precise factors to use as input data. Several preprocessing methods were investigated in this study to identify the dominant input data for ETo estimation: the Pearson correlation coefficient, Kendall's tau-b correlation coefficient, the standardized Beta coefficient, stepwise regression, Shannon's entropy, and simple additive weighting with fuzzy normalization. These methods were selected because they assess the importance of variables from different aspects, through correlation detection and data normalization, to support accurate ETo estimation. The Pearson correlation coefficient measures the correlation between independent and dependent variables; higher values indicate higher dependency. Stepwise regression focuses on selecting the best and most influential variables from a large set of variables. Decision making is not always between two options; sometimes the right selection must be made among several options. In this case, a multi-criteria decision is made, depending on the sensitivity of the problem, and certain methods can help to reach the best option. Several methods exist for solving multi-criteria decision-making (MCDM) problems, such as Shannon's entropy. Entropy analysis assigns objective weights to the criteria. It assumes that data with high-weight indicators are more important than data with low-weight indicators. Regression analysis aims to minimize the error between observed and forecasted values; this can be achieved with support vector regression (SVR), which was used as the model in this study.
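The entropy weighting and simple additive weighting steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the decision matrix values and function names are hypothetical, and "fuzzy normalization" is read here as linear min-max scaling to [0, 1], since the abstract does not give its formula.

```python
import math


def entropy_weights(matrix):
    """Shannon entropy weights for the criteria (columns) of a decision matrix.

    Rows are alternatives (e.g. input-data combinations); all entries must be
    positive and there must be at least two rows.
    """
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)
    diversity = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        # A criterion that separates the alternatives more has lower entropy.
        diversity.append(1.0 - entropy)
    total_diversity = sum(diversity)
    if total_diversity == 0:
        # Every criterion is uniform across alternatives: fall back to equal weights.
        return [1.0 / n] * n
    return [d / total_diversity for d in diversity]


def minmax_normalize(matrix, is_benefit):
    """Scale each column to [0, 1]; cost criteria are inverted so 1 is always best.

    ASSUMPTION: this linear scaling stands in for the 'fuzzy normalization'
    mentioned in the text.
    """
    n = len(matrix[0])
    normalized = [[0.0] * n for _ in matrix]
    for j in range(n):
        column = [row[j] for row in matrix]
        lo, hi = min(column), max(column)
        span = hi - lo
        for i, x in enumerate(column):
            if span == 0:
                normalized[i][j] = 1.0  # all alternatives tie on this criterion
            elif is_benefit[j]:
                normalized[i][j] = (x - lo) / span
            else:
                normalized[i][j] = (hi - x) / span
    return normalized


def saw_scores(matrix, weights, is_benefit):
    """Simple additive weighting: weighted sum of the normalized criteria."""
    normalized = minmax_normalize(matrix, is_benefit)
    return [sum(w * r for w, r in zip(weights, row)) for row in normalized]


# Hypothetical values for three input combinations; columns are RMSE, MAE, NSE.
example_matrix = [
    [0.40, 0.30, 0.90],
    [0.60, 0.50, 0.80],
    [0.50, 0.35, 0.85],
]
example_is_benefit = [False, False, True]  # lower RMSE/MAE is better, higher NSE is better

example_weights = entropy_weights(example_matrix)
example_scores = saw_scores(example_matrix, example_weights, example_is_benefit)
print("weights:", example_weights)
print("scores:", example_scores)
```

The alternative with the highest score would be the preferred input combination. Note that the entropy step needs positive entries, so a criterion that can be negative (such as NSE for a poor model) would have to be shifted or rescaled first.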



Results and Discussion

At the monthly scale, the maximum Pearson correlation coefficients are associated with solar radiation and maximum and minimum temperature at all stations. This result was confirmed by Kendall's tau-b correlation coefficient. At the annual scale, the meteorological data selected by stepwise regression are relative humidity, wind speed, solar radiation, and maximum temperature in Maku, and wind speed, maximum temperature, solar radiation, and sunshine hours in Yazd. Decision-making analysis requires criteria, and five criteria (RMSE, R, MAE, NSE, and GMER) were applied in Shannon's entropy method. The selected criteria are used to find the best solution among all data (Tmax, Tmin, RH, U, S, and R) and different combinations of data. Combination 3-7, which has three input data (wind speed, solar radiation, and sunshine hours), has the highest weight in Maku. At the monthly scale, for the combination with five input data, the RMSE obtained with Shannon's entropy is higher than that obtained with fuzzy normalization at all stations except Mashhad, where the two methods give the same RMSE, and Zanjan and Yazd, where Shannon's entropy gives the lower error. At both scales, fuzzy normalization performs well. At the annual scale, the Pearson correlation and stepwise regression perform the same. At the monthly scale, stepwise regression performs poorly. Selecting input data based on fuzzy normalization could decrease the simulation error.
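The five evaluation criteria listed above can be computed as in the sketch below. The abstract does not expand the abbreviations, so the usual definitions are assumed here: root mean square error (RMSE), Pearson correlation coefficient (R), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE), and geometric mean error ratio (GMER). The function names and sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / spread


def gmer(obs, pred):
    """Geometric mean error ratio (assumed definition): exp(mean(ln(pred / obs))).

    Values above 1 suggest overestimation, below 1 underestimation.
    Requires strictly positive observed and predicted values.
    """
    logs = [math.log(p / o) for o, p in zip(obs, pred)]
    return math.exp(sum(logs) / len(logs))


# Hypothetical ETo values (mm/day), for illustration only.
observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.5]

for name, metric in [("RMSE", rmse), ("R", pearson_r), ("MAE", mae),
                     ("NSE", nse), ("GMER", gmer)]:
    print(name, metric(observed, predicted))
```

Each input combination evaluated this way yields one row of criteria values, which is the kind of row a multi-criteria decision matrix would hold.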

Conclusion

The results indicated that the MCDM-based preprocessing method with the normalization process performed better than the other methods. The average of the criteria showed that the best method has no limitations across the three climate types (wet, semiarid, and arid) and that fuzzy normalization performed well. This method has no geographical limitation. Identifying an efficient preprocessing method that responds acceptably in all climates is one of the strengths and innovations of this research. One factor that can strongly affect the MCDM-based preprocessing method is the type of decision-making method. In a decision-making problem, the method used to normalize the decision matrix is highly important for information extraction. In general, maximum temperature, relative humidity, wind speed, solar radiation, sunshine hours (annual), and minimum temperature (monthly) were identified as the effective data. Certain data combinations perform better because of their high dependency with ETo variation.

In general, using an accurate preprocessing method in each climate, based on the data capabilities of the area, and selecting the effective data can improve the efficiency of ETo estimation. This can lead to precise determination of water availability and sound policymaking in irrigation planning and agricultural studies.

Keywords

Main Subjects


 
Volume 5, Special Issue (S1)
Climate Change and Effects on Water and Soil
2025
Pages 252-269
  • Receive Date: 06 September 2025
  • Revise Date: 23 September 2025
  • Accept Date: 03 October 2025