University of Mohaghegh Ardabili Modeling and Management of Water and Soil 2783-2546 3 2 2023 06 22 Comparing the performance of the multiple linear regression classic method and modern data mining methods in annual rainfall modeling (Case study: Ahvaz city) 125 142 1777 10.22098/mmws.2022.11337.1120 FA Pouya Allahverdipour M.Sc. student, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran 0000-0003-2096-5742 Mohammad Taghi Sattari Associate Professor, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran 0000-0002-5139-2118 Journal Article 2022 08 23

Introduction
Prediction of hydrological variables, especially precipitation, is very important in water resources management and planning, so accurate estimation methods have long been of interest to researchers. Given the water crisis in different regions, rainfall and the resulting runoff need to be predicted so that comprehensive and appropriate management can be applied to water distribution. Classical methods such as multiple linear regression have been among the most widely used approaches for predicting hydrological variables, especially precipitation, and have given good results. More recently, data mining methods have been developed for the same purpose. This research compares the performance of classic multiple linear regression with modern data mining methods in modeling the annual rainfall of Ahvaz city and identifies the best-performing model.

Materials and Methods
In this study, the annual rainfall of Ahvaz city was investigated and modeled. Meteorological data from the Ahvaz station were collected over a 30-year period (1992-2021).
The data were validated with tests of homogeneity, normality, trend, and outliers. Annual rainfall of Ahvaz city was modeled with Multiple Linear Regression (MLR), Principal Component Analysis (PCA), Gene Expression Programming (GEP), and Support Vector Machine (SVM). Finally, the accuracy and performance of the models were compared using the coefficient of determination (R2), Root Mean Square Error (RMSE), Nash-Sutcliffe Efficiency (NSE), and Willmott index (WI).

Results and Discussion
XLSTAT software was used to model rainfall with multiple linear regression. For the SVM model, several kernel functions can be examined; the linear kernel and second- and third-degree polynomial kernels, which are commonly used in hydrology, were selected, and the optimal result for each kernel type was found by trial and error. According to these results, the support vector machine with a third-degree polynomial kernel gave the best results among the kernels tested. Because gene expression programming can select the more effective variables and eliminate those with less influence, all eight input factors were used to determine the meaningful variables. For further investigation, in addition to the program's default set of mathematical operators (F1), a set based on the four main operators (F2) and the operator sets F3 and F4 were used. The validation tests for homogeneity, trend, normality, and outliers indicated that the recorded data were of good quality and could be used with a high degree of confidence in the rest of the study.
The comparison of the models showed that PCA and GEP, with R2 = 0.85, NSE = 0.85, and WI = 0.96 and very close RMSE values of 35.49 and 35.70, respectively, predicted the annual rainfall of Ahvaz with better performance and higher accuracy than the other models. Considering the water crisis in different regions of the country, especially in Ahvaz, it is suggested that the methods introduced in this research be used to predict rainfall and the resulting runoff, so that comprehensive and appropriate management can be applied to water distribution.

Conclusion
This research compared classical statistical methods with some modern data mining methods in forecasting the annual rainfall of Ahvaz city. Hydrological data from the Ahvaz synoptic meteorological station were collected over a 30-year period (1371-1400 in the Solar Hijri calendar, i.e. 1992-2021), and the data were first verified using homogeneity, trend, normality, and outlier tests. The results indicated that the recorded data were of good quality and could be used with a high degree of confidence. Multiple linear regression (MLR), principal component analysis (PCA), gene expression programming (GEP), and support vector machine (SVM) methods were used to model precipitation. The model results were compared using the coefficient of determination (R2), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and Willmott index (WI). Principal component analysis and gene expression programming, with R2 equal to 0.85, NSE equal to 0.85, WI equal to 0.96, and very close RMSE values of 35.49 and 35.70, respectively, showed better performance and higher accuracy than the other models. Based on these results, it is suggested that modern data mining methods be used alongside classical statistical methods in future research.
Attention should also be paid to choosing the optimal functions and parameters of the models in order to achieve the best results in future research. Considering the water crisis in different parts of the country, especially in Ahvaz, it is suggested that the methods introduced in this research be used to predict rainfall and the resulting runoff, so that comprehensive and appropriate management can be applied to water distribution.

Predicting hydrological variables, especially precipitation, is very important in the management and planning of water resources, so methods that can estimate it accurately have always been of interest to researchers. This research compares the performance of the classic multiple linear regression method and modern data mining methods in modeling the annual rainfall of Ahvaz city. Hydrological data from the Ahvaz synoptic meteorological station were collected over a 30-year period (1371-1400 in the Solar Hijri calendar, i.e. 1992-2021), and the data were quality-controlled using homogeneity, trend, normality, and outlier tests. Multiple linear regression (MLR), principal component analysis (PCA), gene expression programming (GEP), and support vector machine (SVM) methods were then used to model precipitation. Seventy percent of the data were used for training and 30 percent for validating the models, and the model results were compared using the coefficient of determination (R2), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and Willmott index (WI). The results showed that principal component analysis and gene expression programming, with R2 equal to 0.85, NSE equal to 0.85, WI equal to 0.96, and a negligible difference in RMSE (35.49 and 35.70, respectively), perform better and more accurately than the other models in predicting the annual rainfall of Ahvaz. Given the water crisis in different parts of the country, especially in Ahvaz, it is suggested that the methods introduced in this research be used to predict rainfall and the resulting runoff, so that comprehensive and appropriate management can be applied to water distribution.
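
The four evaluation criteria named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formulas, so R2 is taken here as the squared Pearson correlation, NSE and WI follow their usual textbook definitions, and the function names and sample values are hypothetical rather than taken from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r_squared(obs, pred):
    """R2, assumed here to be the squared Pearson correlation."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def willmott_index(obs, pred):
    """Willmott index of agreement (original 1981 form), between 0 and 1."""
    mo = _mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential


# Made-up values for illustration only; these are NOT the Ahvaz data.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

for name, metric in [("R2", r_squared), ("RMSE", rmse),
                     ("NSE", nse), ("WI", willmott_index)]:
    print(name, round(metric(observed, predicted), 4))
```

In a setup like the one described, these functions would be applied to the 30 percent of years held out for validation, with one call per model, so that the models can be compared on the same data.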

https://mmws.uma.ac.ir/article_1777_37e9fb7da724c5adada801efc802707e.pdf