### 1. Introduction

… (H$_{2}$S) from a trace amount [4] to extremely high values ranging between 20,000 and 40,000 ppm [5, 6]. Therefore, care must be taken when biogas is used in internal combustion engines (ICEs) and boilers for electricity and thermal energy generation, because the corrosive nature of H$_{2}$S, especially in the presence of water vapor, causes metal equipment to wear down. As the recommended H$_{2}$S concentrations for ICEs and boilers are below 100 ppm and 1,000 ppm, respectively [7], H$_{2}$S-rich biogas must be treated before it can be used for heat and electricity generation.

H$_{2}$S removal from biogas can be performed using physical/chemical techniques based on (1) adsorption on a solid with a high surface area, such as activated carbon and iron oxides, or (2) absorption in water or chemical solvents, such as sodium hydroxide and aqueous solutions of bivalent metal sulfates [8–10]. The main disadvantage of the adsorption techniques is that the adsorbents, which become saturated over time, must be regenerated at high cost or treated as hazardous waste. Moreover, the adsorbents themselves are often expensive [11, 12]. The absorption techniques (water/chemical scrubbing) generate a contaminated liquid stream, so a post-treatment step is required, which results in high operational costs. Other disadvantages are the high consumption of water and the high cost of chemical solvents [13, 14]. In contrast, biogas desulfurization using biofiltration is an attractive alternative to the physicochemical techniques because of its low operational cost and low environmental impact [15]. In desulfurizing bio-filters (BFs), a stream of biogas containing H$_{2}$S passes through a biofilm of sulfide oxidizing bacteria (SOB) immobilized on a nutrient-rich porous packed media. H$_{2}$S diffuses from the gas phase into the biofilm, where it is metabolically consumed by SOB and degraded to elemental sulfur (S$^{0}$) and/or sulfate (SO$_{4}^{2-}$) [16]. Chemotrophic SOB are the most widely studied group of microorganisms for biological desulfurization processes. They derive energy from chemical reactions such as the oxidation of H$_{2}$S or S$^{0}$ under different environmental conditions, aerobic (OX) or anoxic (AX), and utilize carbon dioxide (CO$_{2}$) or organic carbon compounds as the carbon source [15]. An OX condition is one in which dissolved oxygen is present as the electron acceptor, while an AX condition is one in which nitrate (NO$_{3}^{-}$) serves as the electron acceptor [17]. Chemotrophic SOB from the genera *Acidithiobacillus*, *Thermothrix*, *Thioalkalispira*, *Thiothrix*, and *Thiovulum* have shown great potential to oxidize H$_{2}$S under OX conditions, and genera such as *Beggiatoa*, *Thiomargarita*, *Thioploca*, *Thioalkalivibrio*, and *Thiobacillus* are capable of oxidizing H$_{2}$S under either OX or AX conditions [18].

##### (2)

$$\text{H}_{2}\text{S}_{(\text{aq})}\leftrightarrow \text{H}^{+}_{(\text{aq})}+\text{HS}^{-}_{(\text{aq})}$$

##### (3)

$$\text{HS}^{-}_{(\text{aq})}+0.5\,\text{O}_{2\,(\text{aq})}\to \text{S}^{0}_{(\text{s})}+\text{OH}^{-}_{(\text{aq})}$$

##### (4)

$$\text{HS}^{-}_{(\text{aq})}+2\,\text{O}_{2\,(\text{aq})}\to \text{SO}_{4\,(\text{aq})}^{2-}+\text{H}^{+}_{(\text{aq})}$$

H$_{2}$S is absorbed in the biofilm (reaction 1), followed by its dissociation to HS$^{-}$ and hydrogen ion (H$^{+}$) (reaction 2), which is dependent on pH. Thereafter, HS$^{-}$ is biologically oxidized to S$^{0}$ and/or SO$_{4}^{2-}$, depending on the availability of oxygen (reactions 3 and 4). Under limited-oxygen conditions, reaction (3) proceeds, in which the oxidation of HS$^{-}$ ends with the production of S$^{0}$. In contrast, in the presence of a sufficient amount of oxygen, the final product of HS$^{-}$ oxidation is SO$_{4}^{2-}$ (reaction 4). An O$_{2}$/HS$^{-}$ molar ratio of 0.7 would favor the production of S$^{0}$ as the dominant final product, while an O$_{2}$/HS$^{-}$ molar ratio greater than 1.0 is required to obtain a significant conversion of HS$^{-}$ to SO$_{4}^{2-}$ [20–22]. It should be mentioned that when HS$^{-}$, rather than oxygen, is limiting and S$^{0}$ is present, SO$_{4}^{2-}$ is formed according to reaction (5) [10].

##### (5)

$$\text{S}^{0}_{(\text{s})}+1.5\,\text{O}_{2\,(\text{aq})}+\text{H}_{2}\text{O}_{(\text{l})}\to \text{SO}_{4\,(\text{aq})}^{2-}+2\,\text{H}^{+}_{(\text{aq})}$$

… HS$^{-}$ oxidation can be represented by reactions 6 and 7 [10], in which the production of S$^{0}$ and/or SO$_{4}^{2-}$ depends on the NO$_{3}^{-}$/HS$^{-}$ ratio. Note that reactions 6 and 7 are based on complete denitrification (reduction of NO$_{3}^{-}$ to N$_{2}$ without nitrite (NO$_{2}^{-}$) accumulation).

##### (6)

$$\text{HS}^{-}_{(\text{aq})}+\frac{1}{3}\,\text{NO}_{3\,(\text{aq})}^{-}\to \text{S}^{0}_{(\text{s})}+\frac{1}{6}\,\text{N}_{2\,(\text{g})}+\text{OH}^{-}_{(\text{aq})}$$

##### (7)

$$\text{HS}^{-}_{(\text{aq})}+\frac{4}{3}\,\text{NO}_{3\,(\text{aq})}^{-}\to \text{SO}_{4\,(\text{aq})}^{2-}+\frac{2}{3}\,\text{N}_{2\,(\text{g})}+\text{H}^{+}_{(\text{aq})}$$

When the NO$_{3}^{-}$/HS$^{-}$ molar ratio is 0.33, HS$^{-}$ can be oxidized mainly to S$^{0}$, while HS$^{-}$ oxidation to SO$_{4}^{2-}$ requires an NO$_{3}^{-}$/HS$^{-}$ molar ratio of 1.33. Similar NO$_{3}^{-}$/HS$^{-}$ ratios were reported by researchers who achieved the complete oxidation of HS$^{-}$ to SO$_{4}^{2-}$ with an NO$_{3}^{-}$/HS$^{-}$ molar ratio between 1.2 and 1.6, whereas HS$^{-}$ oxidation ended with S$^{0}$ at an NO$_{3}^{-}$/HS$^{-}$ molar ratio of less than 0.4 [23–26].

The end-products of HS$^{-}$ oxidation are SO$_{4}^{2-}$ and S$^{0}$. If S$^{0}$ is the dominant end-product, the major drawback is the accumulation of S$^{0}$ within the BF media, which causes clogging and increases the pressure drop across the BF column [27]. When complete oxidation of HS$^{-}$ occurs, the pH of the BF medium is expected to drop into the acidic range due to the formation of SO$_{4}^{2-}$ and H$^{+}$. However, several studies have shown that some groups of SOB can tolerate an acidic environment and maintain their activity in it. For instance, *Acidithiobacillus thiooxidans* showed high activity for the degradation of H$_{2}$S at a pH of 1.5–3.0 [28–30]. *Thiobacillus thiooxidans* could tolerate a pH swing between 2.0 and 0.5, and *Acidithiobacillus thiooxidans* AZ11 could grow at a pH as low as 0.2 for H$_{2}$S oxidation [30].

### 2. Artificial Neural Network (ANN)

##### (8)

$$\text{W}^{(\text{r}+1)}=\text{W}^{(\text{r})}-\alpha.\frac{\partial \text{E}}{\partial \text{W}}+\beta.\left(\text{W}^{(\text{r})}-\text{W}^{(\text{r}-1)}\right),\qquad \text{W}=\overline{\text{W}}_{\text{k}\times \text{l}},\ \overline{\overline{\text{W}}}_{\text{l}\times \text{j}}$$

##### (9)

$$\frac{\partial \text{E}}{\partial \overline{\overline{\text{W}}}_{\text{l}\times \text{j}}}=-\left[f\left(\text{X}_{\text{i}\times \text{k}}.\overline{\text{W}}_{\text{k}\times \text{l}}\right)\right]^{\text{T}}.\left(\left(\text{Y}_{\text{i}\times \text{j}}-\widehat{\text{Y}}_{\text{i}\times \text{j}}\right)\odot g^{\prime}\left(f\left(\text{X}_{\text{i}\times \text{k}}.\overline{\text{W}}_{\text{k}\times \text{l}}\right).\overline{\overline{\text{W}}}_{\text{l}\times \text{j}}\right)\right)$$

##### (10)

$$\frac{\partial \text{E}}{\partial \overline{\text{W}}_{\text{k}\times \text{l}}}=-\left[\text{X}_{\text{i}\times \text{k}}\right]^{\text{T}}.\left[\left(\left(\left(\text{Y}_{\text{i}\times \text{j}}-\widehat{\text{Y}}_{\text{i}\times \text{j}}\right)\odot g^{\prime}\left(f\left(\text{X}_{\text{i}\times \text{k}}.\overline{\text{W}}_{\text{k}\times \text{l}}\right).\overline{\overline{\text{W}}}_{\text{l}\times \text{j}}\right)\right).\left[\overline{\overline{\text{W}}}_{\text{l}\times \text{j}}\right]^{\text{T}}\right)\odot f^{\prime}\left(\text{X}_{\text{i}\times \text{k}}.\overline{\text{W}}_{\text{k}\times \text{l}}\right)\right]$$

##### (11)

$$\widehat{\text{Y}}_{\text{i}\times \text{j}}=g\left(f\left(\text{X}_{\text{i}\times \text{k}}.\overline{\text{W}}_{\text{k}\times \text{l}}\right).\overline{\overline{\text{W}}}_{\text{l}\times \text{j}}\right)$$

##### (12)

$$\text{E}=\frac{1}{2}\left(\text{Y}_{\text{i}\times \text{j}}-\widehat{\text{Y}}_{\text{i}\times \text{j}}\right)^{2}=\frac{1}{2}\left(\text{Y}_{\text{i}\times \text{j}}-\widehat{\text{Y}}_{\text{i}\times \text{j}}\right)\odot \left(\text{Y}_{\text{i}\times \text{j}}-\widehat{\text{Y}}_{\text{i}\times \text{j}}\right)$$

- $\overline{\text{W}}_{\text{k}\times \text{l}}$ and $\overline{\overline{\text{W}}}_{\text{l}\times \text{j}}$ are the matrices assigned to the synaptic weights entering and leaving the hidden layer neurons, respectively

- E: Network error function

- α: Convergence speed of the algorithm, more commonly known as learning rate

- β: Influence of the (r-1)$^{\text{th}}$ iteration on the synaptic weights update at the r$^{\text{th}}$ iteration

- The symbols “*g*” and “*f*” denote the activation function of the neurons within the output layer and the hidden layer, respectively

- X$_{\text{i}\times \text{k}}$: Network input matrix

- Symbol “⊙” represents an element-wise multiplication of two matrices

- Superscript “T” in [.]$^{\text{T}}$ refers to the transpose of matrix [.]

- Y$_{\text{i}\times \text{j}}$ and Ŷ$_{\text{i}\times \text{j}}$ are the matrices assigned to the desired output values and the output values computed using the network, respectively
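The update rule of Eq. (8), together with the gradients of Eqs. (9) and (10), can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in the study: it assumes that both *f* and *g* are the hyperbolic tangent, that there are no bias terms, and that the error of Eq. (12) is summed over all entries to give a scalar. The function names, the random data and the values of α and β are hypothetical.

```python
import numpy as np

def tanh_prime(z):
    """Derivative of tanh; stands in for both f' and g' (assumption)."""
    return 1.0 - np.tanh(z) ** 2

def gdbp_step(X, Y, W1, W2, dW1_prev, dW2_prev, alpha=0.1, beta=0.5):
    """One weight update following Eq. (8).

    X: (i, k) inputs, Y: (i, j) targets,
    W1: (k, l) weights entering the hidden layer,
    W2: (l, j) weights leaving the hidden layer,
    dW1_prev, dW2_prev: W(r) - W(r-1) from the previous iteration.
    """
    S1 = X @ W1                      # weighted inputs of the hidden layer
    H = np.tanh(S1)                  # f(X . W1)
    S2 = H @ W2                      # weighted inputs of the output layer
    Y_hat = np.tanh(S2)              # Eq. (11)
    delta_out = (Y - Y_hat) * tanh_prime(S2)
    grad_W2 = -H.T @ delta_out                               # Eq. (9)
    grad_W1 = -X.T @ ((delta_out @ W2.T) * tanh_prime(S1))   # Eq. (10)
    dW1 = -alpha * grad_W1 + beta * dW1_prev                 # Eq. (8)
    dW2 = -alpha * grad_W2 + beta * dW2_prev
    error = 0.5 * np.sum((Y - Y_hat) ** 2)   # Eq. (12), summed (assumption)
    return W1 + dW1, W2 + dW2, dW1, dW2, error

# Hypothetical data: 8 samples, 3 inputs, 9 hidden neurons, 1 output.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (8, 3))
Y = rng.uniform(0.0, 1.0, (8, 1))
W1 = rng.uniform(0.0, 1.0, (3, 9))
W2 = rng.uniform(0.0, 1.0, (9, 1))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
for _ in range(100):
    W1, W2, dW1, dW2, err = gdbp_step(X, Y, W1, W2, dW1, dW2,
                                      alpha=0.05, beta=0.5)
print("error at the last iteration:", err)
```

Note that the returned error is evaluated with the weights before the update, so it lags the new weights by one iteration.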

### 3. Methodology

### 3.1. Data Acquisition

… H$_{2}$S. Salak fruit seeds (SFS) were used as the BF packing material, on which SOB from the genus *Thiobacillus*, isolated from the sludge of the municipal wastewater treatment plant in Srandakan (Yogyakarta, Indonesia), were immobilized. The BF, with an inner diameter of 8 cm, had a total height of 100 cm and a packing height of 80 cm. A series of experiments was carried out to evaluate the performance of the BF, in terms of H$_{2}$S removal efficiency, as a function of the axial distance from the BF inlet (0–80 cm), gas flow rate (8,550 to 23,940 g m$^{-3}$ h$^{-1}$) and residence time (up to 4 h) (Table 1).

### 3.2. ANN Structure

… H$_{2}$S removal from a gas stream. The ANN model was composed of three layers: an input layer, a hidden layer and an output layer. The input layer contained three neurons, one each for gas flow rate (x$_{1}$), residence time (x$_{2}$) and axial position in the BF bed (distance from the BF inlet) (x$_{3}$). The output layer had one neuron, which generated the model output, H$_{2}$S removal efficiency (y). In the hidden layer, the optimal number of neurons was determined by trial and error. The hyperbolic tangent function, one of the most widely used activation functions in ANN models because it allows the model to learn non-linear relationships, was used for the neurons within the hidden and the output layers. The mathematical definition of the hyperbolic tangent function is given by equation (13).

##### (13)

$$\phi\,(\text{S}_{\text{in}})=\frac{2}{1+\exp(-2\,\text{S}_{\text{in}})}-1,\qquad -1\le \phi\,(\text{S}_{\text{in}})\le 1$$

where “S$_{\text{in}}$” denotes the sum of the weighted inputs entering a neuron, and φ(S$_{\text{in}}$) represents the output of that neuron.
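As a quick numerical check, the expression in equation (13) can be compared with a library implementation of the hyperbolic tangent. The sketch below is illustrative only; the function name `phi` and the sampled grid are assumptions made for this example.

```python
import numpy as np

def phi(s_in):
    """Activation of Eq. (13): 2 / (1 + exp(-2 s)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * s_in)) - 1.0

# Sample the weighted input over a symmetric range.
s = np.linspace(-5.0, 5.0, 11)
print(np.allclose(phi(s), np.tanh(s)))   # True: both forms agree
print(phi(s).min(), phi(s).max())        # smallest and largest sampled outputs
```

Printing the smallest and largest sampled outputs makes it easy to inspect the range the activation actually covers.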

### 3.3. ANN Modelling Process

##### (14)

$${\text{x}}_{\text{i},\text{N}}=\frac{{\text{x}}_{\text{i}}-{\text{x}}_{\text{i},\text{min}}}{{\text{x}}_{\text{i},\text{max}}-{\text{x}}_{\text{i},\text{min}}}$$

where “x$_{\text{i},\text{N}}$” represents the normalized value of “x$_{\text{i}}$”, which is the actual value of input variable “i”. “x$_{\text{i},\text{min}}$” and “x$_{\text{i},\text{max}}$” are the minimum and maximum values of input variable “i”, respectively.
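The normalization of equation (14) can be sketched as a column-wise operation on the input matrix. The rows below are hypothetical sample points chosen only to span the reported ranges of gas flow rate and axial distance; the residence-time values and the function name are likewise assumptions.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column of x to [0, 1] following Eq. (14)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)   # per-variable minimum
    x_max = x.max(axis=0)   # per-variable maximum
    return (x - x_min) / (x_max - x_min)

# Columns: gas flow rate, residence time (h), axial distance (cm).
samples = np.array([
    [8550.0, 1.0, 0.0],
    [16245.0, 2.0, 40.0],
    [23940.0, 4.0, 80.0],
])
print(min_max_normalize(samples))
```

A column with identical values would lead to a division by zero, so a practical implementation would need to handle constant inputs separately.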

Training dataset: It was used to identify the optimal number of hidden layer neurons (HLNs) and to adjust the model synaptic weights to minimize the error function. The training dataset was further divided into “K” subsets in order to use the cross validation technique to determine the optimal number of iterations (epochs) at which the model training should be stopped.

Testing dataset: It served to assess the accuracy and predictive capability of the model after the training phase was complete.

#### 3.3.1. ANN model training

… (R$^{2}$), a goodness-of-fit measure for the model, defined by equations (15) and (16), respectively. The closer the R$^{2}$ value is to unity and the smaller the RMSE value (closer to zero), the better the model fits the data. In other words, the model fits the data perfectly when R$^{2}$ equals 1.0 and RMSE equals 0.

##### (15)

$$\text{RMSE}=\sqrt{\frac{1}{\text{s}\times \text{p}}\sum _{\text{i}=1}^{\text{s}}\sum _{\text{j}=1}^{\text{p}}{\left(\text{a}(\text{i},\text{j})-\widehat{\text{a}}(\text{i},\text{j})\right)}^{2}}$$

##### (16)

$${\text{R}}^{2}=1-\frac{{\sum}_{\text{i}=1}^{\text{s}}{\sum}_{\text{j}=1}^{\text{p}}{\left(\text{a}(\text{i},\text{j})-\widehat{\text{a}}(\text{i},\text{j})\right)}^{2}}{{\sum}_{\text{i}=1}^{\text{s}}{\sum}_{\text{j}=1}^{\text{p}}{\left(\text{a}(\text{i},\text{j})-\overline{\text{a}}\right)}^{2}}$$

where a(i,j) represents the elements of matrix Y$_{\text{s}\times \text{p}}$ assigned to the desired output values; â(i,j) represents the elements of matrix Ŷ$_{\text{s}\times \text{p}}$ assigned to the output values computed using the network; ā is the mean value of a(i,j); parameters “s” and “p” stand for the dataset size and the number of output variables, respectively.
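Equations (15) and (16) translate directly into a few lines of NumPy. The sketch below is illustrative; the function names and the sample values are hypothetical and not taken from the study's dataset.

```python
import numpy as np

def rmse(a, a_hat):
    """Root mean square error over all s x p entries, Eq. (15)."""
    a = np.asarray(a, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    return float(np.sqrt(np.mean((a - a_hat) ** 2)))

def r_squared(a, a_hat):
    """Coefficient of determination, Eq. (16)."""
    a = np.asarray(a, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    ss_res = np.sum((a - a_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((a - a.mean()) ** 2)   # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted removal efficiencies (fractions).
measured = np.array([[0.20], [0.45], [0.70], [0.90]])
predicted = np.array([[0.25], [0.40], [0.72], [0.88]])
print(rmse(measured, predicted), r_squared(measured, predicted))
```

A perfect prediction gives RMSE = 0 and R$^{2}$ = 1, while predicting the mean of the measured values for every sample gives R$^{2}$ = 0.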

The training dataset is partitioned into “K” disjoint folds where each fold contains the same number of samples.

“K” runs are performed such that within each run, the model is trained on (K-1) folds.

The trained model is then evaluated on the remaining fold (termed the validation fold) to estimate its RMSE.

The average of RMSE on the validation folds is plotted against the number of epochs.
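The K-fold procedure described above can be sketched as follows. The value of K, the random shuffling, the stand-in linear model and the rule of picking the epoch with the lowest average validation RMSE are all assumptions made for illustration; the source does not specify them, and the ANN itself is replaced here by a simpler model to keep the example short.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cv_rmse_per_epoch(X, y, k, n_epochs, init_model, train_epoch, predict):
    """Average validation RMSE after each epoch over the k runs."""
    folds = kfold_indices(len(X), k)
    curves = np.zeros((k, n_epochs))
    for run, val_idx in enumerate(folds):
        # Train on the other (k - 1) folds, validate on the remaining one.
        train_idx = np.concatenate(
            [f for i, f in enumerate(folds) if i != run])
        model = init_model()
        for epoch in range(n_epochs):
            model = train_epoch(model, X[train_idx], y[train_idx])
            resid = y[val_idx] - predict(model, X[val_idx])
            curves[run, epoch] = np.sqrt(np.mean(resid ** 2))
    return curves.mean(axis=0)

# Stand-in model (hypothetical): a linear map trained by gradient descent.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (45, 3))
y = X @ np.array([0.5, -0.2, 0.3])

def init_model():
    return np.zeros(3)

def train_epoch(w, X_train, y_train):
    grad = X_train.T @ (X_train @ w - y_train) / len(X_train)
    return w - 0.1 * grad

def predict(w, X_val):
    return X_val @ w

curve = cv_rmse_per_epoch(X, y, k=5, n_epochs=200, init_model=init_model,
                          train_epoch=train_epoch, predict=predict)
best_epoch = int(np.argmin(curve)) + 1   # epoch with the lowest average RMSE
print(best_epoch, curve[best_epoch - 1])
```

With 45 samples and K = 5, each fold holds 9 samples, which satisfies the requirement that all folds have the same size.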

#### 3.3.2. ANN model testing

### 4. Results and Discussion

… H$_{2}$S removal efficiency. The authors developed a mathematical model as a predictive tool for assessing changes in H$_{2}$S removal efficiency with respect to the gas flow rate, axial distance from the BF inlet, and residence time (experimental observations presented in Table 1). The mathematical model, which took into account mass transfer of H$_{2}$S from the gas stream into the biofilm and biological oxidation of H$_{2}$S in the biofilm, consisted of a set of ordinary differential equations (ODEs) that was solved by the Runge-Kutta method. The authors found good agreement between the experimental data and the model-predicted values, with an R$^{2}$ value of 86.56% [12].

… H$_{2}$S removal efficiency. Seventy-five percent of the original (parent) dataset (45 input-output data pairs) was devoted to the training process in order to build the model, and the remaining 25% of the parent dataset (15 input-output data pairs) was used to evaluate the generalization capability of the trained model. Two statistical measures, R$^{2}$ and RMSE, were used to assess the model performance. In addition, the predictive capability of the proposed ANN model was compared with that of the mathematical model introduced by Lestari et al. [12].
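The 75%/25% partition of the 60 data pairs can be sketched as below. Random shuffling with a fixed seed is an assumption made for this example; the source does not state how the pairs were assigned to each subset, and the function name is hypothetical.

```python
import numpy as np

def split_dataset(n_samples, train_fraction=0.75, seed=0):
    """Return disjoint index arrays for the training and testing subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)            # shuffled sample indices
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_dataset(60)
print(len(train_idx), len(test_idx))   # 45 15
```

Fixing the seed keeps the split reproducible, so that different network structures are compared on the same training and testing pairs.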

Each ANN structure was initially fed with the normalized training dataset, and the synaptic weights were randomly assigned small values between zero and one.

The GDBP learning algorithm was utilized during which the parameters α and β were adjustable. The best values for α and β were determined by changing their values from 0.01 to 1.0 (0.01, 0.1, 0.2 … 0.9, and 1.0).

The training phase continued until the RMSE (or its corresponding R$^{2}$) value remained constant; otherwise, the training was stopped at 10,000 epochs.

The ANN performance was evaluated in terms of RMSE and R$^{2}$, given by Eqs. (15) and (16), respectively. The higher the R$^{2}$ (or the lower the RMSE), the better the ANN fits the data.

… R$^{2}$ curves versus the number of HLNs. From Fig. 3, as the number of HLNs increased, there was an increase in the R$^{2}$ value and a decrease in the RMSE value, attaining R$^{2}$ and RMSE values of 99.24% and 0.028, respectively, with 9 HLNs. A further increase in the number of HLNs, from 9 to 12, did not result in a significant enhancement in the ANN performance. Hence, the optimal structure of the ANN was found to be 3-9-1 (3 neurons in the input layer, 9 HLNs and 1 neuron in the output layer), as illustrated in Fig. 4.

… H$_{2}$S removal efficiency with respect to the training and testing datasets (Fig. 6(a) and Fig. 6(b)). In Fig. 6(a) and Fig. 6(b), the solid line is the best-fit line, indicating 93.45% and 95.11% correlation between the measured and predicted values for the training and testing phases, respectively. These results imply that the proposed ANN model was successfully trained with good predictive performance.

… (R$^{2}$ = 93.8%) was superior to Lestari’s mathematical model [12], which showed an R$^{2}$ value of 86.6%. The higher accuracy of the ANN model could be attributed to the fact that it was based only on the measured set of input and output variables, without making any assumption about the interrelationship between the variables, whereas the mathematical model of Lestari et al. [12] was developed based on two fundamental simplifying assumptions: (1) pseudo-steady state flow condition for the gas phase, and (2) uniformity of H$_{2}$S concentration in the biofilm at a given axial position from the BF inlet. As mentioned earlier, simplifying mathematical models may result in an underestimation of the model output [12].

### 5. Conclusions

… (R$^{2}$) values of 93.83% and 86.56%, respectively. This implies that the ANN model could be an attractive and useful tool for predicting the performance of desulfurizing BFs, as it can be implemented effectively without requiring prior information about H$_{2}$S biodegradation kinetics and mechanism.