Evaluating the efficiency of dimensionality reduction methods in improving the accuracy of water quality index modeling in Qizil-Uzen River using machine learning algorithms

Document Type : Research/Original/Regular Article

Authors

1 Dr Mohammad Taghi Sattari, Associate Professor, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

2 PhD student, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

3 Master's student, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran

Abstract

Introduction
Water quality assessment is paramount for various sectors, including environmental planning, public health, and industrial operations. With the increasing importance of ensuring safe water sources, especially for drinking and irrigation, modern methodologies such as data mining offer valuable tools for predicting and classifying water quality. Knowledge of water quality is one of the most important needs in planning, developing, and protecting water resources, and the quality of water must be determined for its different uses, including irrigation and drinking. In the current study, the water quality of the Qizil-Uzen River was evaluated at the Qara Gunei station. To this end, the drinking water quality index (WQI) was estimated from twelve chemical parameters, namely total hardness, pH, electrical conductivity, total dissolved solids, calcium, sodium, magnesium, potassium, chloride, carbonate, bicarbonate, and sulfate, over a 21-year statistical period (2000-2020).
 
Materials and Methods
Because of the relatively large number of variables, principal component analysis (PCA) and independent component analysis (ICA) were used to reduce the dimensionality of the data, and the WQI was then modeled with several machine learning algorithms: decision tree, logistic regression, and multi-layer perceptron artificial neural network. With these methods, the number of parameters needed to calculate the quality index was reduced from 12 to 2. Reducing the dimensions of the data saves time in sampling, sample monitoring, and water quality determination, and significantly reduces the cost of modeling. The results showed that, of the two dimensionality reduction methods, PCA performed better than ICA. The water quality of the Qizil-Uzen River was evaluated at the Qara Gunei station. To estimate the numerical values of the WQI, the TH, pH, EC, TDS, Ca, Na, Mg, K, Cl, CO3, HCO3, and SO4 parameters of this station over a 21-year statistical period (1378-1398 in the Solar Hijri calendar, approximately 2000-2020) were used. PCA and ICA were used to select the input parameters. Modeling was carried out in a Python programming environment, with 75% of the available samples used for training and 25% for testing.
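The workflow described above (reduce 12 parameters to 2 components with PCA or ICA, split the samples 75%/25%, then fit the models) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn is the library used, it runs on synthetic data rather than the Qara Gunei measurements, all hyperparameters are hypothetical, and it treats the WQI as a continuous target. Because logistic regression in scikit-learn is a classifier, ordinary linear regression is swapped in here as a stand-in for that model.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 12 measured parameters (TH, pH, EC, TDS, Ca, Na,
# Mg, K, Cl, CO3, HCO3, SO4). These are NOT the study's data: two hidden
# factors drive all 12 columns so that a 2-component reduction is meaningful.
rng = np.random.default_rng(0)
n_samples, n_params = 240, 12
latent = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
mixing = rng.normal(size=(2, n_params))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_params))
# Hypothetical continuous WQI target (assumption, for illustration only).
y = 50.0 + 10.0 * latent[:, 0] - 5.0 * latent[:, 1] + rng.normal(scale=0.5, size=n_samples)

# 75% of the samples for training, 25% for testing, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def make_reducer(name):
    """Return a fresh dimensionality reducer that keeps 2 components."""
    if name == "PCA":
        return PCA(n_components=2)
    return FastICA(n_components=2, random_state=0, max_iter=1000)


def make_model(name):
    """Return a fresh regressor; hyperparameters are illustrative assumptions."""
    if name == "decision tree":
        return DecisionTreeRegressor(max_depth=5, random_state=0)
    if name == "linear regression":  # stand-in for the source's logistic regression
        return LinearRegression()
    return MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                        max_iter=2000, random_state=0)


scores = {}
for reducer_name in ("PCA", "ICA"):
    for model_name in ("decision tree", "linear regression", "MLP"):
        # Standardize, reduce 12 -> 2 dimensions, then fit the model.
        pipeline = make_pipeline(
            StandardScaler(), make_reducer(reducer_name), make_model(model_name)
        )
        pipeline.fit(X_train, y_train)
        # For regressors, score() returns R^2 on the held-out 25%.
        scores[(reducer_name, model_name)] = pipeline.score(X_test, y_test)

for (reducer_name, model_name), r2 in scores.items():
    print(f"{reducer_name} + {model_name}: test R2 = {r2:.3f}")
```

Fitting the scaler and the reducer inside the pipeline means they are estimated from the training samples only, so no information from the test samples leaks into the reduced features.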
 
Results and Discussion
In the present research, the water quality index was modeled in two stages. In the first stage, the dimensionality reduction methods PCA and ICA were applied to reduce the time and cost of implementation. In the second stage, the machine learning methods decision tree, logistic regression, and multilayer perceptron were used. By comparison, Tripathi et al. used principal component analysis to reduce the number of parameters needed to calculate the quality index from 28 to 9 and calculated the water quality index from those 9 parameters. In this study, both PCA and ICA reduced the problem from 12 dimensions to 2. The models were compared using numerical and graphical evaluation criteria: R2, RMSE, and the modified Willmott coefficient as numerical criteria, and the Taylor diagram as a graphical criterion. The results show that PCA improves performance at low cost and with high accuracy, because it can help reduce noise in the data, supports feature selection, and generates uncorrelated features. The results also show that the multilayer perceptron, decision tree, and logistic regression methods estimate the water quality index accurately. In this research, for the first time, the ICA dimensionality reduction algorithm was used to reduce the dimensions of the problem while predicting the water quality index with an accuracy of over 90%.
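The numerical evaluation criteria named above can be written out as a short Python sketch. Two assumptions are made here because the abstract does not give the formulas: R2 is taken to be the coefficient of determination, and the modified Willmott coefficient is taken to be the modified index of agreement d1 = 1 - sum|P - O| / sum(|P - mean(O)| + |O - mean(O)|). The observed and predicted values are made-up numbers for illustration, not results from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def modified_willmott(observed, predicted):
    """Modified index of agreement d1, which uses absolute rather than squared differences."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum(abs(p - o) for o, p in zip(observed, predicted))
    denominator = sum(
        abs(p - mean_obs) + abs(o - mean_obs) for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator


# Hypothetical WQI values, for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]

print(f"R2   = {r_squared(observed, predicted):.3f}")          # 0.980
print(f"RMSE = {rmse(observed, predicted):.3f}")               # 1.581
print(f"d1   = {modified_willmott(observed, predicted):.3f}")  # 0.925
```

Higher R2 and d1 and lower RMSE indicate closer agreement between modeled and observed WQI values.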
 
Conclusion
Water quality index modeling is highly relevant to agriculture, where access to clean water is crucial for irrigation and crop growth. Only a limited number of studies have explored variable reduction methods in water quality index modeling, and none has used the relatively novel independent component analysis (ICA) method for dimensionality reduction. The current research addresses this gap by employing PCA and ICA to reduce the dimensionality of large datasets in water quality index modeling, with the aim of improving efficiency and accuracy in assessing water quality and offering useful insights for agricultural water management. Following dimensionality reduction, the dataset is modeled using various machine learning algorithms. This approach not only optimizes computational resources but also facilitates a deeper understanding of the interrelationships among water quality parameters. The study evaluates the efficacy of ICA alongside PCA in water quality index modeling and, by integrating these techniques with machine learning, seeks to support informed decision-making and resource allocation by agricultural stakeholders. The inclusion of ICA also expands the methodological toolkit available for water quality assessment. As agriculture faces increasing pressure from climate change and resource scarcity, such approaches hold promise for sustainable water management.

Keywords

Main Subjects