Classification and Regression-based Machine Learning Approach to Predict Mine Water Quality Index

Authors

  • Department of Mining Engineering, Indian Institute of Engineering Science and Technology, Shibpur, Howrah - 711103, West Bengal, IN

DOI:

https://doi.org/10.18311/jmmf/2023/35315

Keywords:

Data Mining, Decision Trees Classification, Mine Water Quality Management, Predictive Modelling, Regression Analysis, Water Quality Index Prediction

Abstract

This work proposes a data mining-based approach to developing and predicting a water quality index for mining areas. A mathematical equation for the index and a prediction model are derived quantitatively. Water quality prediction commonly relies on conventional data mining techniques such as classification and regression. The predictive models are trained and tested on datasets from previous and real-time monitoring and are evaluated with k-fold cross-validation. The decision trees classification method outperforms the other classification methods, with 97.30% and 99.50% accuracy for training and testing validation, respectively. In the regression analysis, MAE, RMSE, MSE, and R-squared are used to evaluate prediction accuracy and model performance. The regression model shows no error, with an R-squared value of 1. The results demonstrate that data mining techniques can accurately estimate mine water quality, and the findings can help improve mine water quality management.
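The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `regression_metrics` and `k_fold_indices` are hypothetical, the fold layout (contiguous, unshuffled) is an assumption, and any input values used with it are placeholders rather than data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R-squared is undefined when the observed values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

With these definitions, a model whose predictions match the observations exactly yields zero MAE, MSE, and RMSE and an R-squared of 1, which is the situation the abstract reports for the regression model. Each pair produced by `k_fold_indices` holds out one fold for testing and trains on the rest, so every sample is used for testing exactly once.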

Published

2023-12-28

How to Cite

Singh, K., Kumar, D., Mukhopadhyay, S., & Banerjee, I. (2023). Classification and Regression-based Machine Learning Approach to Predict Mine Water Quality Index. Journal of Mines, Metals and Fuels, 71(11), 1884–1895. https://doi.org/10.18311/jmmf/2023/35315
Received 2023-10-09
Accepted 2023-12-28

 
