### 1. Introduction

### 2. Data Driven Techniques

### 2.1. Artificial Neural Network

### 2.2. M5 Model Tree

$x_1 \times x_2$ variables into several linear regression functions at the leaves, namely LM1–LM6, using the M5 algorithm of the model tree. A simplified form of the model equation is $y = b_0 + b_1 x_1 + b_2 x_2$, in which $b_0$, $b_1$, and $b_2$ are linear regression constants; Fig. S2 (b) shows the relation of the branches in the form of a tree diagram. For details of MT, readers are referred to [28, 30].
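As an illustration of how a model tree produces a prediction, the following minimal Python sketch routes a sample through a single split and then evaluates the leaf equation $y = b_0 + b_1 x_1 + b_2 x_2$. The split threshold, the two leaves, and all coefficient values are hypothetical and chosen only for illustration; they are not the models fitted in this study.

```python
from dataclasses import dataclass


@dataclass
class LeafModel:
    """Linear regression model stored at one leaf: y = b0 + b1*x1 + b2*x2."""
    b0: float
    b1: float
    b2: float

    def predict(self, x1: float, x2: float) -> float:
        return self.b0 + self.b1 * x1 + self.b2 * x2


# Hypothetical two-leaf tree with a single split on x1 (all values assumed).
SPLIT_X1 = 5.0
LM1 = LeafModel(b0=1.0, b1=0.5, b2=-0.2)  # used when x1 <= SPLIT_X1
LM2 = LeafModel(b0=3.0, b1=0.1, b2=0.4)   # used when x1 >  SPLIT_X1


def tree_predict(x1: float, x2: float) -> float:
    """Route the sample to a leaf, then apply that leaf's linear model."""
    leaf = LM1 if x1 <= SPLIT_X1 else LM2
    return leaf.predict(x1, x2)
```

A real model tree has several nested splits, but each prediction still reduces to one linear equation selected by the path through the tree.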

### 2.3. Genetic Programming and Multi Gene Genetic Programming

$x_1$, $x_2$ and $x_3$. The given model structure contains nonlinear terms (e.g., sin, cos, sqrt) but is linear in the parameters with respect to the coefficients $d_0$, $d_1$ and $d_2$. The function set contains the basic arithmetic operators (+, ×, −, √, /, etc.) and nonlinear mathematical functions (sin, cos, tanh, etc.). It is relevant to note that the maximum permissible number of genes (Gmax) for a model and the maximum tree depth (Dmax) are specified by the user to control the complexity of the generated model. The evolved models are linear combinations of low-order nonlinear transformations of the predictor variables. For details of GP, readers are referred to [33], and for MGGP, to [17].
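Because such a model is linear in its coefficients, the weights $d_0$, $d_1$ and $d_2$ can be estimated by ordinary least squares once the gene outputs are known. The sketch below illustrates this under stated assumptions: the two gene expressions (`gene1`, `gene2`), the synthetic data, and the helper names are hypothetical and are not the genes evolved in this study.

```python
import math


def gene1(x1, x2, x3):
    # Hypothetical low-order nonlinear gene (assumed for illustration).
    return math.sin(x1) * x2


def gene2(x1, x2, x3):
    # Second hypothetical gene (assumed for illustration).
    return math.sqrt(abs(x3)) + x1


def solve_linear(a, b):
    """Solve the square system a*d = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    d = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * d[j] for j in range(i + 1, n))
        d[i] = (m[i][n] - tail) / m[i][i]
    return d


def fit_gene_weights(samples, targets):
    """Least-squares estimate of d0, d1, d2 in y = d0 + d1*gene1 + d2*gene2."""
    rows = [[1.0, gene1(*s), gene2(*s)] for s in samples]
    k = 3
    # Normal equations: (G^T G) d = G^T y
    gtg = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    gty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(gtg, gty)


# Synthetic check: data generated with known weights should be recovered.
samples = [(0.4 * i, 1.0 + (i % 5), 2.0 + (i * i) % 7) for i in range(20)]
true_d = (2.0, -1.5, 0.7)
targets = [true_d[0] + true_d[1] * gene1(*s) + true_d[2] * gene2(*s)
           for s in samples]
d0, d1, d2 = fit_gene_weights(samples, targets)
```

The evolutionary search only has to find the gene structures; the weights follow from a single linear solve, which is what keeps the model "linear in the parameters".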

### 3. Study Area

### 4. Water Quality Data Set

NO$_3$-N), pH, electrical conductivity (EC), water temperature (Temp.), etc. [10]. To select input parameters for the DO and BOD models, it is necessary to understand the relationship of these parameters with DO and BOD.

NO$_3$-NO$_2$), which is responsible for the growth of water hyacinth covering the surface of the river, making it difficult for sunlight to reach beneath the water surface and decreasing the rate of aeration; in turn, the DO demand increases [36]. pH is another important water quality parameter, which is controlled by inter-related chemical reactions that produce or consume hydrogen ions. Low pH indicates a high concentration of hydrogen ions or increased bacterial activity in decomposing organic matter, i.e., low BOD and DO. pH also increases the solubility of phosphorus and nitrates, making them more accessible for plant growth and increasing the demand for dissolved oxygen, which ultimately increases the BOD content [37, 38]. Water temperature (Temp.) is a controlling factor for aquatic life: it controls the rate of metabolic and reproductive activities and, therefore, life cycles. If stream temperature increases, decreases, or fluctuates too widely, metabolic activities fluctuate.

The water quality data set required for the current study was collected from Hydro Nashik, Maharashtra, India. River quality parameters are greatly influenced by anthropogenic activities, usually coupled with non-linear and complex biochemical processes. Biochemical oxygen demand and dissolved oxygen are influenced by factors such as total solids, alkalinity, nitrite, pH, conductivity, water temperature, etc. [10]. To select input parameters for the DO and BOD models, it is necessary to understand the relationship between these parameters and DO and BOD.

NO$_3$-NO$_2$), which is responsible for the growth of water hyacinth covering the surface of a river, making it difficult for sunlight to reach beneath the water surface and decreasing the rate of aeration; in turn, the DO demand increases [36]. pH is another important water quality parameter that is controlled by inter-related chemical reactions that produce or consume hydrogen ions. Low pH indicates decomposition of organic matter [37]. Water temperature is a controlling factor for aquatic life: it controls the rate of metabolic and reproductive activities and, therefore, life cycles. If stream temperature increases, decreases, or fluctuates too widely, metabolic activity will fluctuate as well; hence the rate of decomposition by microorganisms will be hampered [39].

NO$_3$-NO$_2$ as the main contributors, because NO$_3$-N rapidly oxidises to NO$_3$-NO$_2$, which is responsible for decreasing DO. A similar pattern is observed for BOD too.

NO$_3$-N) and temperature (Temp.), and parameters for modelling BOD: dissolved oxygen (DO), pH, electrical conductivity (EC), alkalinity (ALK), total solids (TS), total dissolved solids (TDS), and nitrite/nitrate (NO$_3$-N, NO$_3$-NO$_2$).

### 5. Model Development

### 6. Results and Discussion

### 6.1. DO Models

NO$_3$-NO$_2$ and 1 output (DO), with a single hidden layer of 10 hidden neurons, was trained until the lowest MSE was achieved. ANN model 1-1 exhibited reasonable performance, with a correlation between observed and predicted DO of R = 0.89 and a lower RMSE of 0.61 mg/L (Table 2). A lower RMSE indicates a smaller spread of the residual error (between observed and predicted values), i.e., a smaller standard deviation of the residuals, and can contribute towards a better-performing ANN model. ANN model 1-2, i.e., for the Mula river, resulted in a well-performing model with R = 0.91, although with a higher RMSE of 0.98 than ANN model 1-1. A still higher RMSE of 1.52 can be seen for model 1-3, for the Mula and Mutha rivers. The larger RMSE is due to the larger deviation in the data, which indicates a larger spread of data values than for the Mutha and Mula rivers individually. Being sensitive to higher values, the MARE is also large, i.e., 1.52 in model 1-3. The ANN output, in the form of weights and biases, can be analyzed further.
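The performance measures discussed above can be computed as in the following sketch. It assumes the usual definitions: R as the Pearson correlation coefficient, RMSE as the root mean squared error, and MARE as the mean absolute relative error, $\frac{1}{n}\sum_i |o_i - p_i| / |o_i|$. The exact formulas used in the study are not stated in this section, and the sample values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


def mare(observed, predicted):
    """Mean absolute relative error (assumed definition; observed must be non-zero)."""
    n = len(observed)
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Made-up DO values in mg/L (not data from this study).
obs = [6.2, 5.8, 4.9, 7.1, 3.5]
pred = [6.0, 5.5, 5.2, 6.8, 4.0]

print(f"R = {pearson_r(obs, pred):.3f}, "
      f"RMSE = {rmse(obs, pred):.3f} mg/L, "
      f"MARE = {mare(obs, pred):.3f}")
```

Note that R measures only how well the predictions follow the pattern of the observations, whereas RMSE keeps the units of DO, so a model can have a higher R and still a higher RMSE, as seen when comparing models 1-1 and 1-2.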

##### (1)

$$\begin{array}{c}\text{DO}=0.0651\,(\text{Alk}.)-0.00326\,(\text{TS})+\\ 0.0651\,\text{pH}-0.0651\,{(\text{Alk}.)}^{1/2}-0.00326\,{(\text{EC})}^{3/2}+\\ 2.98\times {10}^{-6}\,{(\text{EC})}^{5/2}\,(0.00163\,(\text{TDS})\,({\text{NO}}_{3}\text{-}{\text{NO}}_{2}))/\text{pH}+\\ (0.0214\,(\text{EC})\,{(\text{TS})}^{1/2}\,\text{pH})/(\text{Alk}.)+4.1\end{array}$$

NO$_3$-NO$_2$, total solids and other parameters. This finding agrees with the fundamental knowledge of DO stated in Section 4 [46]. The equations of each of the genes for model 1-2 are shown in Fig. S7.

#### Linear Equation developed for M5T Model 1-1

##### (2)

$$\begin{array}{c}\text{LM1}\ \text{DO}=-0.0028\,(\text{EC})-0.0254\,(\text{Alk}.)+\\ 0.0035\,(\text{TS})-0.0053\,(\text{TDS})+0.1527\,(\text{pH})+\\ 0.1088\,(\text{Temp}.)+4.0976\end{array}$$

##### (3)

$$\begin{array}{c}\text{LM2}\ \text{DO}=-0.0028\,(\text{EC})-0.0093\,(\text{Alk}.)+\\ 0.0035\,(\text{TS})-0.0052\,(\text{TDS})+0.1527\,(\text{pH})+\\ 0.023\,(\text{Temp}.)+6.0998\end{array}$$

##### (4)

$$\begin{array}{c}\text{LM3}\ \text{DO}=-0.0048\,(\text{EC})-0.0041\,(\text{Alk}.)+\\ 0.0036\,(\text{TS})+0.2609\,(\text{pH})+1.7089\end{array}$$

### 6.2. BOD Model Using Observed DO along with Other Input Parameters

NO$_3$) as seen in Table S4. The results of these models are then compared with the models of set 3, wherein BOD was predicted using modelled DO. This sheds light on the efficacy of the models through a comparison between the BOD models developed with modelled DO and those developed with observed DO.

##### (5)

$$\begin{array}{c}\text{BOD}=1.22\times {10}^{-4}\,{(\text{EC})}^{2}\,{(\text{Alk}.)}^{2}\,(\text{DO})+{({\text{NO}}_{3}\text{-N})}^{1/2}-\\ 13.4\,\tanh\,{(\text{DO})}^{4}\,{({\text{NO}}_{3}\text{-N})}^{2}-4.09\times {10}^{-6}\,(\text{Alk}.)\,({\text{NO}}_{3}\text{-N})\,(\text{EC})+\\ ({\text{NO}}_{3}\text{-N})+(\text{DO})\,(\text{EC})+4.04\times {10}^{-4}\,{(\text{DO})}^{2}\,(\text{Alk}.)\,{({\text{NO}}_{3}\text{-N})}^{2}+13.3\end{array}$$

NO$_3$-N.

##### (6)

$$\begin{array}{c}\text{BOD}=7.54\,\tanh\,(\text{DO})-1.0\,{(\text{pH})}^{2}-6.57\times {10}^{-5}\,{((\text{EC})+\\ (\text{TS})+{({\text{NO}}_{3}\text{-N})}^{2})}^{2}+0.00129\,(\text{EC})\,{(\text{TS})}^{1/2}\,({\text{NO}}_{3}\text{-N})-\\ (0.00402\,(\text{DO})\,(\text{EC})\,({\text{NO}}_{3}\text{-N}))/{((\text{DO})+6.0)}^{1/2}-4.91\end{array}$$

#### Linear Equation developed for M5T model 2-1

##### (7)

$$\begin{array}{c}\text{LM1}\ \text{BOD}=-3.773\,(\text{DO})+0.0055\,(\text{Alk}.)-\\ 0.0033\,(\text{TDS})+20.537\end{array}$$

##### (8)

$$\begin{array}{c}\text{LM2}\ \text{BOD}=-0.4616\,(\text{DO})-0.7109\,({\text{NO}}_{3}\text{-}{\text{NO}}_{2})+0.0146\,(\text{Alk}.)-\\ 0.0025\,(\text{TDS})+6.2165\end{array}$$

### 6.3. BOD Model Using Modelled DO

NO$_3$-N and DO, followed by other parameters. DO has a strong influence on BOD prediction, which was also observed in set 2 and can be seen in all the models in set 3 as well. The set of equations developed for the Mutha river, i.e., for model 3-1, using MGGP is shown in Table S6.

NO$_3$-N and DO. This finding is also consistent with the fundamentals, since total solids reveal the existence of organic matter, which directly contributes to the increase in BOD, while the presence of nitrite demonstrates the breakdown of organic matter [46, 47].