Application and Comparison of Missing Groundwater Level Data Interpolation Methods with an Emphasis on DeepMVI Performance (Case Study: Ajabshir Plain)

Document Type: Research/Original/Regular Article

Authors

1 MSc of Water Resources Engineering, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

2 Assistant Professor, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

3 Associate Professor, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Abstract

Groundwater is a vital water resource, especially in arid and semi-arid regions such as northwestern Iran. It plays a crucial role in agriculture, drinking water supply, and industrial activities. Therefore, reliable monitoring and management of groundwater levels are essential for sustainable development. However, missing data in groundwater level time series, caused by factors such as equipment failure, inaccessible terrain, or extreme weather, can hinder accurate analysis and prediction. To address this, interpolation techniques are used to estimate missing values from observed data. The reliability of these techniques depends on the quantity, spatial distribution, and temporal resolution of the available data. In recent years, machine learning and deep learning methods have shown promise in handling complex, nonlinear, and high-dimensional datasets. This study evaluates the effectiveness of five interpolation methods, namely Kriging, Inverse Distance Weighting (IDW), Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), Random Forest Spatial Interpolation (RFSI), and Deep Missing Value Imputation (DeepMVI), in reconstructing missing groundwater level data. The focus is on improving data completeness and accuracy for subsequent groundwater analyses. The case study is the Ajabshir aquifer, where long-term data from 29 piezometric wells are used. The objective is to compare the performance of traditional and modern interpolation approaches and to determine the most accurate method for handling missing groundwater level data.

Materials and Methods

In this study, monthly groundwater level data from 29 piezometric wells in the Ajabshir aquifer in northwest Iran were analyzed over a 17-year period (2006–2022). Due to various operational and environmental constraints, the dataset contained numerous gaps.

To estimate the missing values, five interpolation methods were evaluated: Kriging, Inverse Distance Weighting (IDW), Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), Random Forest Spatial Interpolation (RFSI), and Deep Missing Value Imputation (DeepMVI). Kriging uses semivariograms to model spatial dependence and provides statistically unbiased estimates. IDW is a deterministic technique that weights each observed value by the inverse of its distance to the estimation point. PCHIP preserves the monotonicity and continuity of time-series data between observations. RFSI applies the Random Forest algorithm to capture nonlinear spatial relationships, and DeepMVI uses deep learning to model complex temporal and multivariate dependencies in the data.
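As an illustration of the simplest of these methods, the following is a minimal sketch of IDW for a single estimation point. It is not the implementation used in this study: the function name `idw_estimate`, the power parameter of 2, and the well coordinates and levels are all hypothetical choices made for the example.

```python
import math

def idw_estimate(target_xy, wells, power=2.0):
    """Estimate a value at target_xy from (x, y, value) tuples using IDW.

    Each observed value is weighted by 1 / distance**power, so nearer
    wells contribute more to the estimate than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in wells:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            # The target coincides with an observed well: return it directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one observed well is required")
    return weighted_sum / weight_total

# Hypothetical wells: (x in km, y in km, groundwater level in m).
wells = [(0.0, 0.0, 1290.0), (2.0, 0.0, 1294.0), (0.0, 2.0, 1286.0)]

# The target is equidistant from all three wells, so the estimate
# reduces to the plain average of the three observed levels.
estimate = idw_estimate((1.0, 1.0), wells)
print(round(estimate, 3))
```

Because the weights depend only on distance, IDW cannot account for spatial structure such as anisotropy or clustering, which is one reason geostatistical and learning-based methods are considered alongside it.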

The dataset was randomly divided into training (70%) and testing (30%) subsets. The performance of each method was assessed using the correlation coefficient (R), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE).
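The three evaluation criteria can be sketched as follows, using their standard textbook definitions. The function name `evaluate` and the sample values are hypothetical and are not taken from the Ajabshir dataset.

```python
import math

def evaluate(observed, estimated):
    """Return (R, RMSE, NSE) for paired observed and estimated values."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equally sized series with at least 2 values")

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    squared_error = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    covariance = sum(
        (o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated)
    )
    if var_obs == 0.0 or var_est == 0.0:
        raise ValueError("R and NSE are undefined for a constant series")

    r = covariance / math.sqrt(var_obs * var_est)  # Pearson correlation
    rmse = math.sqrt(squared_error / n)            # root mean square error
    nse = 1.0 - squared_error / var_obs            # Nash-Sutcliffe efficiency
    return r, rmse, nse

# Hypothetical held-out observations and the values a method estimated for them.
observed = [10.0, 12.0, 14.0, 16.0]
estimated = [11.0, 12.0, 13.0, 16.0]

r, rmse, nse = evaluate(observed, estimated)
print(f"R = {r:.3f}, RMSE = {rmse:.3f}, NSE = {nse:.3f}")
```

R measures only linear association, whereas RMSE and NSE also penalize bias, so a method can show a moderate R while its NSE stays low.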

Results and Discussion

The evaluation results show significant variation in model accuracy. The Kriging method, while widely used, showed poor performance in this study due to the sparse and irregular distribution of observation wells. Its results included a low correlation (R = 0.37), high RMSE (417.91), and low NSE (0.11), indicating that this method is not suitable under conditions with extensive missing data and limited spatial continuity. The IDW method improved over Kriging but still yielded moderate accuracy (R = 0.56, RMSE = 365.51, NSE = 0.30).

The PCHIP method performed considerably better, reflecting its ability to handle temporal data smoothly. It achieved R = 0.89, RMSE = 7.52, and NSE = 0.72, making it the second most accurate method. The method preserved the shape of the original groundwater level trends and was effective in reconstructing long sequences of missing data. The RFSI method, which leverages machine learning, showed better accuracy than Kriging and IDW (R = 0.63, RMSE = 11.06, NSE = 0.40), although it was outperformed by PCHIP and DeepMVI. This suggests that while machine learning can improve performance, spatial interpolation with sparse data remains challenging. The DeepMVI method outperformed all other methods, achieving the highest correlation (R = 0.92), lowest RMSE (6.44), and highest NSE (0.80). Its ability to capture both spatial and temporal relationships using a hybrid deep neural architecture makes it highly effective in imputing missing groundwater data, especially when the dataset includes complex time-dependent patterns and multivariate interactions.

The final comparison of time series plots across 29 piezometric wells also visually confirmed the accuracy of the DeepMVI model in maintaining original trends and minimizing noise or abrupt changes. These results demonstrate that deep learning models offer a promising approach for improving the quality and reliability of groundwater monitoring datasets.

Conclusion

This research evaluated the performance of five interpolation methods for reconstructing missing groundwater level data from 29 piezometric wells in the Ajabshir aquifer over a 17-year period. Among the methods tested, DeepMVI outperformed all others, providing the most accurate and reliable results. Its ability to model complex temporal and spatial dependencies makes it particularly suitable for environmental datasets with high variability and missing values. PCHIP also performed well (NSE = 0.72), and RFSI achieved moderate accuracy (NSE = 0.40); both could serve as alternatives when deep learning infrastructure is not available. Although Kriging and IDW are widely used in hydrogeological studies, their lower performance in this study suggests that their application may be limited under conditions of sparse or irregular data. The study highlights the importance of selecting interpolation methods based on data characteristics. DeepMVI holds significant promise for future groundwater studies and can enhance the quality of groundwater monitoring systems by providing more complete and accurate datasets. This, in turn, can improve water resource management and planning in regions facing water scarcity and environmental stress.

