### 1. Introduction

*if–then–else* rules to explain the linguistic variables [11]. Over the years, this technique established itself as a useful modeling tool in various hydrological fields. ANFIS has been found superior in predicting various water resources aspects such as water quality, river flow, flood forecasting, sediment concentration, etc. [12–16]. These instances, among many other applications of ANFIS, explain why it has been a popular modeling choice over the years.

R^{2} and RMSE values compared to its original counterpart. Pousinho et al. [21] used a hybridized PSO-ANFIS technique to predict wind power and compared the model's performance with ARIMA, a feed-forward neural network, a neural network with wavelet transform, and a wavelet neuro-fuzzy model. JalalKamali employed two modified variants of ANFIS, namely ANFIS-GA and ANFIS-PSO, for predicting the groundwater quality of Kerman province, Iran [22]. Basser et al. [23] used a similar ANFIS-PSO based model for estimating the optimal parameters of a protective spur dike. Kisi et al. [24–25] experimented with the ANFIS model by incorporating PSO and DE algorithms into the parameter estimation process and showed that the modified ANFIS possesses an improved capability of predicting groundwater quality compared to its conventional counterpart. Yosefvand et al. [26] employed a hybrid method based on ANFIS and PSO for estimating the minimum velocity required to prevent sedimentation. Similarly, in another study, Dieu et al. [27] performed the fusion of ANFIS with cultural (ANFIS-CA), bees (ANFIS-BA) and invasive weed optimization (ANFIS-IWO) algorithms for flood susceptibility mapping. It was found that all three modified versions were capable of finding the optimal model parameters and also succeeded in avoiding the problem of being trapped in local minima. There are numerous other instances where ANFIS's parameters were estimated using heuristic optimization techniques [28–29].

To build a fuzzy-neural network based time-series estimation model for estimating rainfall-runoff relationship.

To explore the scope of improving the performance of the conventional ANFIS model by incorporating PSO technique.

To compare the performance of PSO-ANFIS with that of ARIMA model and conventional ANFIS to determine its applicability in the rainfall-runoff estimation process.

### 2. Models Used

### 2.1. Adaptive Neuro-Fuzzy Inference System (ANFIS)

*α*, *β* and *r* represent the linear output parameters. The corresponding ANFIS structure is shown in Fig. 1. It has five layers with two kinds of nodes, represented by a circle and a square. The square node is referred to as an adaptive node, which accepts parameters. The circle node is known as a fixed node and does not accept any parameters.

#### Layer 1

##### (1)

$${O}_{i}^{1}={\xi}_{Ai}(x)\quad \text{or}\quad {O}_{i}^{1}={\xi}_{Bi}(y)\quad \text{for }i=1,\hspace{0.17em}2$$

where *x* and *y* represent the inputs to the i^{th} node. Usually a bell-shaped membership function such as the Gaussian function is used for this purpose. The Gaussian membership function (GMF) can be defined as:

##### (2)

$${\xi}_{Ai}(x)=\text{exp}\left\{-{\left[{\left(\frac{x-{c}_{i}}{{a}_{i}}\right)}^{2}\right]}^{{b}_{i}}\right\}$$

where *a*, *b* and *c* form the set of adaptable parameters. The values of these parameters determine the shape of the membership function used (e.g. bell-shaped, triangular, etc.). These parameters are also referred to as premise parameters.
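
The Gaussian membership function described above is straightforward to compute. The sketch below (function name illustrative, not from the paper) evaluates the membership grade for an input *x* given the premise parameters *a*, *b* and *c*:

```python
import numpy as np

def membership(x, a, b, c):
    """Bell-shaped (generalized Gaussian) membership grade of input x.

    a, b, c are the adaptable premise parameters: c centres the curve,
    a controls its width and b its shape (b = 1 gives a plain Gaussian).
    """
    return np.exp(-(((x - c) / a) ** 2) ** b)

# Membership is 1 exactly at the centre c and decays towards 0 away from it.
print(membership(5.0, a=2.0, b=1.0, c=5.0))  # -> 1.0
print(membership(9.0, a=2.0, b=1.0, c=5.0))  # -> exp(-4), about 0.018
```

During training, the learning algorithm (gradient descent in conventional ANFIS, or PSO in the hybrid model discussed later) adjusts *a*, *b* and *c* for each membership function.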

#### Layer 2

Every node in this layer is a fixed node whose output is the product of the incoming membership grades, representing the firing strength of a rule:

##### (3)

$${O}_{i}^{2}={w}_{i}={\xi}_{Ai}(x)\times {\xi}_{Bi}(y),\quad i=1,\hspace{0.17em}2$$

#### Layer 3

This layer is also known as the *normalization layer*. Every node of this layer computes the ratio of its own rule's firing strength to the sum of all rules' firing strengths. The output of this layer can be defined by:

##### (4)

$${O}_{i}^{3}=\overline{{w}_{i}}=\frac{{w}_{i}}{{w}_{1}+{w}_{2}},\quad i=1,\hspace{0.17em}2$$

#### Layer 4

This layer is known as the *defuzzification layer*. The output of Layer 4 can be defined as:

##### (5)

$${O}_{i}^{4}=\overline{{w}_{i}}{f}_{i}=\overline{{w}_{i}}({\alpha}_{i}x+{\beta}_{i}y+{r}_{i}),\quad i=1,\hspace{0.17em}2$$

where *w̄*_{i} is the output of Layer 3 and {*α*_{i}, *β*_{i}, *r*_{i}} forms a parameter set known as *consequent parameters*.

#### Layer 5

This layer is known as the *output layer*. It contains a single fixed node that computes the final output by summing all incoming signals, which can be described as:

##### (6)

$${O}_{i}^{5}={\sum}_{i}\overline{{w}_{i}}{f}_{i}=\frac{{\sum}_{i}{w}_{i}{f}_{i}}{{\sum}_{i}{w}_{i}},\quad i=1,\hspace{0.17em}2$$

### 2.2. Particle Swarm Optimization (PSO)

*V*_{i}*(t)* of the i^{th} particle, it tends to move away from *GBest*. However, with *P*_{i}*Best* and *GBest* included in the equation, the particles are forced to fly closer to the global solution. The way these particles update their velocities and positions can be explained by the equations given below, as discussed in [32]. The process is repeated until the target value or the maximum number of iterations is attained.

##### (7)

$${V}_{i}(t+1)=w\times {V}_{i}(t)+c1\times r1\times ({P}_{i}Best-{P}_{i}(t))+c2\times r2\times (GBest-{P}_{i}(t))$$

##### (8)

$${P}_{i}(t+1)={P}_{i}(t)+{V}_{i}(t+1)$$

where *V*_{i}*(t)* is the velocity with which the i^{th} particle is moving, and *P*_{i}*(t+1)* is the updated position of the particle based on its current position *P*_{i}*(t)* and moving velocity *V*_{i}*(t+1)*. *P*_{i}*Best* and *GBest* are the personal best and global best values as discussed earlier. The values *r1* and *r2* (ranging between 0 and 1) are random values regenerated for each velocity update. *c1* and *c2* are the learning rate parameters; *c1*, *c2* and *w* are user-supplied coefficients.
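
The velocity and position updates above translate directly into code. The following is a minimal sketch of a PSO loop on a toy objective (function and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.5, c1=2.0, c2=2.0, vmax=3.0):
    # Initialise particle positions randomly and velocities to zero.
    P = rng.uniform(-5, 5, (n_particles, dim))
    V = np.zeros((n_particles, dim))
    pbest = P.copy()                               # personal best positions
    pbest_val = np.array([f(p) for p in P])
    gbest = pbest[pbest_val.argmin()].copy()       # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))        # regenerated every update
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + cognitive pull + social pull.
        V = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)
        V = np.clip(V, -vmax, vmax)                # cap the moving velocity
        P = P + V                                  # position update
        vals = np.array([f(p) for p in P])
        improved = vals < pbest_val
        pbest[improved] = P[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(pbest_val.min())

# Toy objective: the sphere function, whose minimum is 0 at the origin.
best, val = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=2)
print(best, val)
```

In the hybrid model of this paper, the objective *f* would instead score a candidate set of ANFIS premise/consequent parameters, e.g. by the RMSE of the resulting network on the training data.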

### 3. Methodology

### 3.1. Study Site and Data Used

### 3.2. Selection of Inputs to the Model

##### (9)

$${Q}_{t}=f({Q}_{t-4},\hspace{0.17em}{Q}_{t-3},\cdots ,{Q}_{t-1},\hspace{0.17em}{R}_{t-4},\hspace{0.17em}{R}_{t-3},\cdots ,\hspace{0.17em}{R}_{t-1})$$

Given a time series *X* = {*X*_{1}, *X*_{2}, ···, *X*_{n}}, applying a sliding window gives us *Y*, which includes several *Y*_{i}'s of the form *Y*_{i} = {*X*_{i}, *X*_{i+μ}, ···, *X*_{i+(m−1)×μ}}, for *i* = 1, 2, ···, *n* − (*m* − 1)*μ*. Here, *μ* is the lag and *m* represents the window size. In this experiment *μ* is set to 1 and *m* is determined by using the PACF technique discussed above.
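
The sliding-window construction described above can be sketched as follows (function name illustrative, not from the paper):

```python
def sliding_window(x, m, mu=1):
    """Return all windows Y_i = {x_i, x_{i+mu}, ..., x_{i+(m-1)*mu}}.

    mu is the lag and m the window size; with 0-based indexing the
    number of windows matches n - (m - 1)*mu from the text.
    """
    n = len(x)
    return [x[i:i + (m - 1) * mu + 1:mu] for i in range(n - (m - 1) * mu)]

X = [10, 20, 30, 40, 50]
print(sliding_window(X, m=3))  # -> [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
```

Each window then supplies one training pattern: the first *m* − 1 values act as lagged inputs and the last value as the target, matching the form of Eq. (9).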

##### (10)

$${{\overrightarrow{a}}^{\prime}}^{(t)}=\overrightarrow{q}(t)-\overrightarrow{a}(t-1),\quad t=0,\hspace{0.17em}1,\ldots$$

### 3.3. Performance Evaluation Criteria

The coefficient of determination (R^{2}), the coefficient of efficiency or Nash-Sutcliffe efficiency (NSE), and the root mean square error (RMSE) are used for performance evaluation purposes. R^{2} represents the degree of determination between the measured and predicted values. Its value typically ranges between 0 and 1, indicating no fit and a perfect fit between the data points and the best-fit regression line, respectively. The value of NSE ranges from −∞ to 1 and is often used to evaluate a model's predictive power. Similarly, the RMSE is a measure of the average magnitude of the error: the square root of the average squared difference between predictions and actual observations. All these indices can be defined as follows:

##### (12)

$${R}^{2}={\left[\sum _{i=1}^{n}({Y}_{i}-\overline{Y})({\widehat{Y}}_{i}-\tilde{Y})\right]}^{2}/\left[\sum _{i=1}^{n}{({Y}_{i}-\overline{Y})}^{2}\sum _{i=1}^{n}{({\widehat{Y}}_{i}-\tilde{Y})}^{2}\right]$$

##### (13)

$$NSE=1-\sum _{i=1}^{n}{({Y}_{i}-{\widehat{Y}}_{i})}^{2}/\sum _{i=1}^{n}{({Y}_{i}-\overline{Y})}^{2}$$

##### (14)

$$RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{({Y}_{i}-{\widehat{Y}}_{i})}^{2}}$$

where *Y*_{i} and *Ŷ*_{i} represent the measured and predicted outcomes for the i^{th} data point, respectively, *n* is the total number of data points considered in the performance evaluation process, and *Ȳ* and *Ỹ* represent the means of the measured and predicted values, respectively.
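
The three indices can be computed directly from their definitions; a minimal sketch (function names illustrative):

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination, Eq. (12): squared correlation
    between measured and predicted values."""
    num = np.sum((y - y.mean()) * (y_hat - y_hat.mean())) ** 2
    den = np.sum((y - y.mean()) ** 2) * np.sum((y_hat - y_hat.mean()) ** 2)
    return float(num / den)

def nse(y, y_hat):
    """Nash-Sutcliffe efficiency, Eq. (13): 1 indicates a perfect fit."""
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

def rmse(y, y_hat):
    """Root mean square error: average magnitude of the prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y, y), nse(y, y), rmse(y, y))  # -> 1.0 1.0 0.0
```

Note that a constant offset between predictions and observations leaves R^{2} at 1 while lowering NSE and raising RMSE, which is why all three indices are reported together.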

### 3.4. Model Development

#### 3.4.1. ARIMA model

An ARIMA model combines three terms: *AR* (autoregressive term), *I* (differencing term) and *MA* (moving average term). It is commonly represented as *ARIMA(p, d, q)*, where *p* denotes the autoregressive order, *d* is the differencing factor important for data stationarity, and *q* is the moving average window size. The necessary model configuration was done through the analysis of autocorrelation function (ACF) and partial autocorrelation function (PACF) plots generated from the runoff data. ACF and PACF provide an initial guess about the different lag factors. As can be seen from Fig. 4(b) and (c), approximately the first 50 lags on the ACF plot and 4 lags on the PACF plot show a significant positive correlation. Therefore, the choice of *p* = 50 and *q* = 4 would give an initial configuration to begin with. However, the actual values of *p*, *d* and *q* were selected on a trial-and-error basis. Different values were tried and the best result was achieved using ARIMA(10,1,4).
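
To make the roles of *p* and *d* concrete, the sketch below differences a series *d* times and then estimates AR(*p*) coefficients by least squares. This is a deliberately simplified stand-in, not the paper's actual fitting procedure: the MA(*q*) part of a full ARIMA model is omitted, and all names are illustrative.

```python
import numpy as np

def fit_ar_on_diff(series, p, d=1):
    """Difference the series d times (the "I" term), then estimate
    AR(p) coefficients phi_1..phi_p by ordinary least squares."""
    z = np.asarray(series, dtype=float)
    for _ in range(d):
        z = np.diff(z)                 # one round of differencing
    # Regress z_t on its p previous values: z_t ~ sum_k phi_k * z_{t-k}.
    y = z[p:]
    X = np.column_stack([z[p - 1 - k:len(z) - 1 - k] for k in range(p)])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

# Synthetic check: integrate an AR(1) process (phi = 0.8) once, so that
# d = 1 differencing recovers the stationary series before AR fitting.
rng = np.random.default_rng(1)
z = np.zeros(500)
for t in range(1, 500):
    z[t] = 0.8 * z[t - 1] + rng.normal()
series = np.concatenate([[0.0], np.cumsum(z)])
phi = fit_ar_on_diff(series, p=1, d=1)
print(phi)  # estimate close to [0.8]
```

A library implementation (e.g. an ARIMA routine in a statistics package) additionally fits the *q* moving-average terms jointly, which is what the ARIMA(10,1,4) configuration above requires.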

#### 3.4.2. ANFIS and PSO-ANFIS model

The cognitive learning factor *c1* and the social learning factor *c2* were both initially set to 2. The number of particles and the number of iterations were set to 100 and 500, respectively. The inertia weight was set to 0.5 and the maximum velocity to 3. The random numbers *r1* and *r2* are kept within the range of 0 to 1. Most of the PSO configuration was done by a trial-and-error method, with values adopted from the work of Eberhart [31].
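
Collected in one place, the reported PSO configuration looks as follows (dictionary keys are illustrative, not from the paper):

```python
# PSO hyperparameters as reported in the text.
pso_config = {
    "c1": 2.0,              # cognitive learning factor
    "c2": 2.0,              # social learning factor
    "n_particles": 100,
    "n_iterations": 500,
    "inertia_weight": 0.5,
    "v_max": 3.0,           # cap on particle velocity
    "r_range": (0.0, 1.0),  # r1, r2 drawn uniformly from this range
}
# Each run therefore performs n_particles * n_iterations velocity updates.
print(pso_config["n_particles"] * pso_config["n_iterations"])  # -> 50000
```

These values are the starting point; as noted above, the final configuration was tuned by trial and error.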

##### (15)

$$\mu {A}_{i}(x)=\text{exp}\left\{-{\left[{\left(\frac{x-{c}_{i}}{{a}_{i}}\right)}^{2}\right]}^{{b}_{i}}\right\}$$

### 4. Results

For performance comparison, R^{2}, NSE, and RMSE have been used. From Table 1 it can be observed that the three methods performed differently during both the training and testing phases. In the training phase, the PSO-ANFIS model achieved about 0.3% and 0.6% improvement over conventional ANFIS in terms of NSE and RMSE, respectively, while in comparison to ARIMA it achieved an overall improvement of 0.89% in NSE and a 1.19% reduction in overall RMSE. A similar trend was found in the testing phase: PSO-ANFIS recorded an improvement of 2% and 9% in NSE over the conventional ANFIS and ARIMA models, respectively. At the same time, it recorded a slight reduction of 0.2% and 1.2% in overall RMSE over the conventional ANFIS and ARIMA models, respectively. However, the total computational time (validation time) shown in Table 1 suggests that the ARIMA model was superior in computing the result: the proposed PSO-ANFIS model took 1.45 s more than the ARIMA model but 3.8 s less than ANFIS to produce the result.

The superiority of the PSO-ANFIS model is reflected in the R^{2} value as well. Fig. 6 shows the prediction and correlation graphs of all three models. The proposed PSO-ANFIS achieved an R^{2} of 0.94, which is better than the R^{2} values of 0.92 and 0.88 of the ANFIS and ARIMA models, respectively. This indicates a higher correlation between observed and predicted runoff in the PSO-ANFIS model. Moreover, the observed vs. predicted runoff plot shows that PSO-ANFIS is able to predict high and extreme runoff values more effectively; for instance, extreme runoff values of 400 m^{3} and above are best predicted by the PSO-ANFIS model. Overall, the PSO-ANFIS model outperformed ARIMA and conventional ANFIS in all aspects except computational time, where ARIMA performed best.

### 5. Discussions

*mean* = 0.014 and *std* = 1.681, which is highly acceptable. Adding an extra rainfall attribute to the model equation would certainly help achieve an even better result. Such a multivariate version of the ARIMA model is commonly known as the ARIMAX model. The development of such a model can be considered as future scope of this work.