Modeling soil water repellency in loess soils of northern Iran using machine learning

Document Type: Research/Original/Regular Article

Authors

1 Associate Professor, Department of Arid Land Management, Gorgan University of Agricultural Sciences and Natural Resources, Golestan, Iran

2 Assistant Research Professor, Soil and Water Conservation Research Department, Khuzestan Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Ahvaz, Iran

3 Associate Professor, Department of Desert Zone Management, Faculty of Rangeland and Watershed Management, Gorgan University of Agricultural Sciences and Natural Resources, Golestan, Iran

Abstract

Extended Abstract

Introduction

Soil water repellency (SWR) is a major hydrological and physical phenomenon that affects surface runoff, erosion, and water infiltration. Hydrophobic soils resist wetting, so water droplets remain on the surface instead of penetrating the soil profile. Particularly on slopes and in arid ecosystems, this condition increases overland flow, reduces water retention, and raises susceptibility to soil loss. SWR is increasingly observed in northern Iran, especially in the loess-derived soils of Golestan and Mazandaran provinces, as a result of a combination of climatic conditions and land-use change. Loess soils in these areas have fine silty to silty-clay textures; together with environmental stressors such as drought cycles, agricultural development, and deforestation, these textures favor the formation and intensification of SWR. Organic compounds are central to water repellency, especially hydrophobic plant-derived substances such as waxes and lignins. Underscoring the importance of soil chemical composition, many studies have reported a strong positive relationship between soil organic carbon and SWR intensity. Variations in clay content, pH, and electrical conductivity (EC) can also affect SWR patterns. Although SWR plays an important role in soil degradation, few studies have used advanced data-driven techniques to predict its spatial variability. Machine learning (ML) algorithms such as Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) offer powerful means of modeling complex soil behavior. The present study uses these algorithms to predict SWR in loess soils from a comprehensive set of physicochemical parameters, with the aim of informing improved soil management and erosion control measures.

Materials and Methods

The study was conducted in selected loess landscapes of Golestan and Mazandaran provinces in northern Iran. Forty-five surface soil samples (0–10 cm depth) were collected from multiple sites, including Gorgan, Maraveh Tappeh, Neka, Sari, and Amol. Sampling sites were chosen to capture variation in topography, land use, and vegetation while minimizing confounding effects. The soil physicochemical properties measured were organic carbon (OC), organic matter (OM), electrical conductivity (EC), pH, mean weight diameter (MWD) of soil aggregates, and particle size distribution (sand, silt, clay). Soil water repellency was measured with the water drop penetration time (WDPT) test: 50 μL droplets of distilled water were placed on air-dried soil surfaces at room temperature, and the time to infiltration was recorded up to a maximum of 3000 seconds. WDPT values served as the target variable for model development. Laboratory procedures followed the same standards used in earlier studies. Data were preprocessed in R, including outlier detection based on the interquartile range (IQR), Z-score normalization of numerical variables, and multicollinearity analysis using the variance inflation factor (VIF). Categorical variables such as soil texture classes were converted to dummy variables by one-hot encoding. The dataset was randomly split into training (70%) and testing (30%) subsets, and three machine learning approaches were applied: Decision Tree (CART), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), implemented with the R packages rpart, randomForest, and xgboost, respectively. Hyperparameters were tuned by repeated 10-fold cross-validation to improve prediction accuracy.
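A minimal Python/NumPy sketch of the preprocessing and resampling steps described above follows. It is illustrative only and is not the authors' R code: the Tukey fence multiplier k = 1.5, the three CV repeats, the random seeds, and the synthetic OC, OM, and clay values are assumptions, and the helper names (`iqr_outlier_mask`, `z_score`, `vif`, `repeated_kfold`) are hypothetical.

```python
import numpy as np


def iqr_outlier_mask(x, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def z_score(X, mean=None, sd=None):
    """Standardize columns; pass training-set mean/sd to transform new data consistently."""
    if mean is None:
        mean, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / sd, mean, sd


def vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from an OLS regression
    (with intercept) of column j on all remaining columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, valid_idx) index pairs for repeated k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for i in range(k):
            train = np.concatenate([f for m, f in enumerate(folds) if m != i])
            yield train, folds[i]


# Usage on synthetic data: all values are hypothetical; only n = 45 mirrors the study.
rng = np.random.default_rng(42)
n = 45
oc = rng.uniform(0.5, 3.0, n)               # organic carbon (%), synthetic
om = 1.724 * oc + rng.normal(0, 0.05, n)    # assumption: OM estimated as OC x 1.724, plus noise
clay = rng.uniform(10, 40, n)               # clay (%), synthetic
X = np.column_stack([oc, om, clay])

print("outlier rows in clay:", np.flatnonzero(iqr_outlier_mask(clay)))
print("VIF (OC, OM, clay):", np.round(vif(X), 1))  # OC and OM are nearly collinear

# Random 70/30 split; standardization statistics come from the training rows only.
perm = rng.permutation(n)
n_train = int(0.7 * n)
train_idx, test_idx = perm[:n_train], perm[n_train:]
X_train, mu, sd = z_score(X[train_idx])
X_test, _, _ = z_score(X[test_idx], mu, sd)

# Repeated 10-fold CV runs inside the training subset only (used for tuning).
n_fits = sum(1 for _ in repeated_kfold(len(train_idx), k=10, repeats=3))
print("train rows:", len(train_idx), "test rows:", len(test_idx), "CV fits:", n_fits)
```

Because VIF is unchanged by rescaling a column, it can be computed on raw or Z-scored data. The sketch estimates Z-score means and standard deviations on the training subset and reuses them for the test subset, a common way to avoid information leakage; the summary above does not say whether the study did the same.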

Results and Discussion

Initial models trained with default settings showed limited predictive ability across all algorithms. The Decision Tree (DT) model yielded the weakest results (RMSE = 19.55, R² = 0.02), indicating that it captured almost none of the variability in WDPT values. After hyperparameter optimization, both Random Forest (RF) and XGBoost (XGB) improved substantially: RF achieved RMSE = 15 and R² = 0.42, and XGB achieved RMSE = 14.7 with the same R², indicating comparable predictive power. Feature importance analysis identified organic carbon as the most influential predictor of WDPT in all models. Clay content, sand fraction, EC, OM, and pH were also influential, although their relative importance varied by algorithm: in RF, organic matter and sand had high predictive value, whereas in XGB, clay and EC were more prominent. These differences likely reflect how each algorithm captures nonlinear interactions among predictors. Spatial analysis showed that areas with higher organic carbon content coincided with areas of higher WDPT, supporting the key role of hydrophobic organic compounds in driving soil water repellency. Uncertainty assessment using bootstrap resampling and Monte Carlo simulation showed that RF was the most stable model, with the lowest RMSE variability and the greatest robustness to noisy input data. Overall, the results indicate that machine learning algorithms, especially RF and XGB, can model and help interpret the complex interactions that influence soil water repellency in loess landscapes.
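The evaluation metrics and the two uncertainty checks named above can be sketched in dependency-free Python. This is a hypothetical illustration rather than the study's procedure: the pairs bootstrap over test-set predictions, the Gaussian input-noise model and its level (`noise_sd`), the synthetic data, and the linear stand-in model are all assumptions, and the function names are illustrative.

```python
import math
import random


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mean_sd(values):
    """Mean and sample standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def bootstrap_rmse(obs, pred, n_boot=1000, seed=1):
    """Resample (observed, predicted) pairs with replacement; summarize the RMSE spread."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([obs[i] for i in idx], [pred[i] for i in idx]))
    return mean_sd(stats)


def monte_carlo_rmse(predict, X, obs, noise_sd, n_sim=500, seed=2):
    """Add Gaussian noise (sd = noise_sd) to every input value; summarize the RMSE spread."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        preds = [predict([x + rng.gauss(0.0, noise_sd) for x in row]) for row in X]
        stats.append(rmse(obs, preds))
    return mean_sd(stats)


def model(row):
    """Stand-in for a fitted RF/XGB model (hypothetical linear rule)."""
    return 20.0 * row[0]


# Usage with synthetic data: 14 test rows, roughly 30% of 45 samples.
rng = random.Random(0)
X = [[rng.uniform(0.5, 3.0)] for _ in range(14)]          # OC-like input, synthetic
obs = [20.0 * row[0] + rng.gauss(0.0, 5.0) for row in X]  # WDPT-like target, synthetic
pred = [model(row) for row in X]

print("RMSE:", round(rmse(obs, pred), 2), "R2:", round(r2(obs, pred), 2))
print("bootstrap RMSE (mean, SD):", bootstrap_rmse(obs, pred))
print("Monte Carlo RMSE (mean, SD):", monte_carlo_rmse(model, X, obs, noise_sd=0.1))
```

R² has more than one definition. `r2` above is 1 − SS_res/SS_tot, whereas some R tooling (for example caret's `R2()`, whose default is the squared correlation between observed and predicted values) reports a correlation-based value. On a single test set the former equals 1 − RMSE²/Var(obs), with Var taken as the population variance, so under that definition two models with different RMSE on the same test set cannot share exactly the same R²; the summary above does not state which definition was used.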

Conclusion

This study demonstrated the applicability of advanced machine learning algorithms for modeling soil water repellency (SWR) in loess-derived soils of northern Iran. Among the three models tested, Random Forest (RF) provided the most reliable and stable predictions (RMSE = 15, R² = 0.42) and showed low sensitivity to data uncertainty. XGBoost (XGB) produced competitive results (RMSE = 14.7) but was slightly less stable under uncertain conditions. The Decision Tree (DT) model, while interpretable, lacked the accuracy needed to capture complex, nonlinear relationships. The results indicate that organic carbon is the dominant driver of SWR in the study area, consistent with previous findings on the hydrophobic nature of plant-derived organic compounds. Other variables, such as clay content, sand fraction, pH, and EC, also played important roles depending on the model structure. These differences in variable importance highlight the value of using multiple algorithms to gain a more complete understanding of the underlying mechanisms. The uncertainty analysis showed that RF is less susceptible to overfitting and data noise, making it a more robust choice for environmental modeling. Spatial patterns of WDPT and key soil variables showed strong regional correlations, suggesting that geospatial ML models could support site-specific soil management. Future research should explore hybrid models (e.g., RF-XGB) and deep learning architectures (e.g., CNNs or ActionFormer) to improve predictive power, particularly in dynamic or post-disturbance soil systems. Integrating multi-temporal datasets could also improve understanding of SWR variability under different environmental and management conditions.
