Document Type: Research/Original/Regular Article
Authors
1. Department of Irrigation and Reclamation Engineering, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
2. Department of Geography & Environmental Sustainability, University of Oklahoma, Norman, OK, USA
3. Department of Civil Engineering, University of Texas at Arlington, Arlington, TX, USA
4. Department of Civil Engineering, College of Engineering, University of Birjand, Birjand, Iran
Abstract
Accurate estimation of reference evapotranspiration (ETo) is indispensable for precision irrigation and sustainable water resource management, yet the lack of physical interpretability in advanced machine learning models limits their operational adoption. This study proposes a systematic framework integrating the state-of-the-art categorical boosting (CatBoost) algorithm, Bayesian hyperparameter optimization, and SHapley Additive exPlanations (SHAP) to predict daily ETo across three contrasting climate classes in Iran: arid, semi-arid, and humid. By benchmarking CatBoost against extreme gradient boosting (XGBoost) and Random Forest under various sensor-availability scenarios, we demonstrated the superior robustness and generalization capability of the gradient boosting framework (CatBoost achieved R² > 0.99 and RMSE ranging from 0.06 to 0.13 mm/day across all climates), particularly in capturing peak evaporative demands. Beyond prediction, the integration of explainable AI revealed a distinct climatic divergence in hydrological drivers: whereas aerodynamic forces, specifically wind speed, act as the primary accelerator of ETo in arid environments, the process is predominantly energy-limited and driven by temperature and solar radiation in humid regions. Furthermore, the study identified critical non-linear environmental thresholds that trigger rapid escalations in water demand, a dynamic often missed by linear empirical equations. Uncertainty analysis using quantile regression further confirmed the model's reliability in handling stochastic climatic extremes (achieving a Prediction Interval Coverage Probability of 88.1–91.4% with narrow interval widths). Practically, our findings offer a cost-effective roadmap for agricultural monitoring, suggesting that while low-cost, temperature-based sensors suffice for humid and semi-arid regions, the inclusion of aerodynamic sensors is non-negotiable for accurate irrigation scheduling in arid zones.
This research contributes to bridging the gap between predictive accuracy and physical interpretability, offering a methodological blueprint for optimizing hydro-meteorological networks in data-scarce regions.
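As an illustration of the evaluation metrics quoted in the abstract, the following minimal sketch shows how RMSE, the Prediction Interval Coverage Probability (PICP), and the mean interval width are commonly computed. It is not the authors' code: the function names and the sample values are hypothetical, and the lower and upper bounds are assumed to come from a pair of quantile models (for example, 5% and 95% quantiles).

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. mm/day)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def picp(observed, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * hits / len(observed)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical daily ETo values (mm/day), for illustration only.
eto_observed = [2.0, 4.0, 6.0, 8.0]
eto_predicted = [2.1, 3.9, 6.1, 7.9]
eto_lower = [1.5, 3.5, 5.5, 8.2]   # assumed lower-quantile predictions
eto_upper = [2.5, 4.5, 6.5, 9.0]   # assumed upper-quantile predictions

print(rmse(eto_observed, eto_predicted))
print(picp(eto_observed, eto_lower, eto_upper))
print(mean_interval_width(eto_lower, eto_upper))
```

A PICP close to the nominal coverage of the chosen quantile pair, combined with a small mean interval width, indicates intervals that are both reliable and sharp; the two quantities are best read together, since very wide intervals achieve high coverage trivially.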
Keywords
Main Subjects