### 1. Introduction

^{3}working volume, 1.4 m height, 1.2 m liquid level) was operated to collect data for the water level prediction model. Pressure data from seven sensors (six in the liquor and one in the headspace) were used to derive density profiles of the liquid column, and the density of the top layer was calculated through prediction models. A cubic model outperformed the other polynomial models as well as the traditional two-sensor approach. Although the digestate level can be predicted with high accuracy using this method, the requirement for seven pressure sensors may impose investment and maintenance burdens. Because only polynomial models were tested in the previous study, there is likely room for improvement by testing other modeling approaches with the same number of sensors or fewer.

### 2. Materials and Methods

### 2.1. Summary of the Research Procedures

### 2.2. System Description and Data Collection

^{3}working volume; 1.4 m height; 1.2 m liquid level) bioreactor was equipped with seven pressure sensors at approximately 0.1, 0.3, 0.4, 0.6, 0.7, 0.9, and 1.25 m from the bottom (Fig. 1). The bioreactor was operated for 175 days; the liquid level was held at 1.2 m for most of that time, with a short-term level variation (0.8–1.2 m) applied at day 99. The bioreactor was placed in a temperature-controlled room (36°C) and showed stable digestion performance in terms of pH, biogas production, and volatile fatty acid accumulation (see Rhee et al. [8] for details). All sensor-derived data (i.e., pressure, temperature, and biogas) were recorded at intervals of one minute or less.

*P*_{0}, *P*_{1}, …, *P*_{n}) were obtained from the pressure meters (*h*_{0}, *h*_{1}, …, *h*_{n}), of which *P*_{0} and *h*_{0} refer to the top sensor in the headspace. The apparent density of the liquid layers (*ρ*_{2}, …, *ρ*_{n}), except for the top layer (*ρ*_{1}; between *h*_{1} and *h*_{2}), was determined by the gravimetric relationship (Eq. (1)). Once the top layer density (*ρ*_{1}) is estimated, the total liquid level (*h*_{liquid}) can be calculated by Eq. (2).

*ρ* is the density of the liquid column, *P* the pressure, *g* the gravitational acceleration, and *h* the height. In Eq. (2), *P*_{0} (headspace pressure) is subtracted from *P*_{1} as a reference value because the anaerobic bioreactor usually keeps a positive pressure in the headspace. The previous study compared polynomial models (linear to quintic) between *P* and *ρ*, and concluded that the top layer (*ρ*_{1}) is well depicted by a cubic model [8].
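As a rough illustration of the gravimetric relationship, the sketch below assumes the standard hydrostatic form ΔP = ρ·g·Δh. This is an assumption, since Eqs. (1) and (2) are not reproduced in this text and the authors' exact formulation may differ. The function names, the units (Pa, m, kg/m³), and the example readings are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2

def layer_density(p_upper, p_lower, h_upper, h_lower):
    """Apparent density (kg/m^3) of the liquid between two submerged sensors.

    Assumes dP = rho * g * dh, with pressures in Pa and sensor heights in m
    measured from the reactor bottom (assumed form, not the paper's Eq. (1)).
    """
    return (p_lower - p_upper) / (G * (h_upper - h_lower))

def liquid_level(p1, p0, h1, rho_top):
    """Liquid level (m from the bottom) from the uppermost submerged sensor.

    The headspace pressure p0 is subtracted from p1 so that only the liquid
    column above the sensor at height h1 contributes (assumed form of Eq. (2)).
    """
    return h1 + (p1 - p0) / (rho_top * G)

def pressure_at(height, surface=1.2, rho=1000.0, p_headspace=500.0):
    """Hypothetical sensor reading for a uniform, water-like liquid."""
    return p_headspace + rho * G * (surface - height)

# Hypothetical readings: surface at 1.2 m, headspace at 500 Pa above ambient
rho_mid = layer_density(pressure_at(0.9), pressure_at(0.7), 0.9, 0.7)
level = liquid_level(pressure_at(0.9), 500.0, 0.9, rho_top=rho_mid)
print(rho_mid, level)
```

In this toy case the top layer density is simply copied from the layer below; in the study it is the quantity that the prediction models estimate.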

### 2.3. Modeling

#### 2.3.1. Multiple linear regression (MLR)

#### 2.3.2. Artificial neural network (ANN)

#### 2.3.3. Random forest (RF)

#### 2.3.4. Support vector machine (SVM)

### 2.4. Model Evaluation and Variable Significance Test

*P*_{0}, …, *P*_{6}; from top to bottom) and the temperature data (*T*) were considered for modeling to allow direct comparison with the results of the previous study. Out of the seven pressure readings, three combinations were specifically tested for the models: two (bottom and headspace readings; *P*_{6} and *P*_{0}), three (bottom, top, and headspace readings; *P*_{6}, *P*_{1}, and *P*_{0}), and seven (all seven readings). The data were preprocessed through standardization for accurate modeling. Standardization is a preprocessing method that rescales each variable to zero mean and unit standard deviation. After standardization, 747 data points were derived; 521 data points (70% of data) were used to train the models and 225 data points (the remaining 30%) were used to test them. Models were evaluated by comparing the root mean square error (RMSE), the mean absolute percentage error (APE), and the maximum APE.

##### (3)

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}{({Value}_{i(estimated)}-{Value}_{i(measured)})}^{2}}{n}}$$

##### (4)

$$APE=\left|\frac{{Value}_{i(estimated)}-{Value}_{i(measured)}}{{Value}_{i(measured)}}\right|$$

*t*-statistic for each variable was used. The RF used an out-of-bag score for variable importance calculation. Gevrey’s weights method, which combines the absolute values of the weights, was used to estimate variable importance for ANN [26, 27]. Finally, the locally estimated scatterplot smoothing (LOESS) *R*^{2} was used for variable importance in the SVM models. All variable importance values were estimated using the ‘varImp’ function of the caret package in R.
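The preprocessing and the error metrics of Eqs. (3) and (4) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the study's R workflow: the function names, the use of the sample standard deviation, the shuffling seed, and the example values are all assumptions.

```python
import math
import random

def standardize(column):
    """Rescale a list of values to zero mean and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    # Sample standard deviation (n - 1); assumed, the text does not specify
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / std for x in column]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def rmse(estimated, measured):
    """Root mean square error, Eq. (3)."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)

def ape(estimated, measured):
    """Absolute percentage error of each point, Eq. (4), as a fraction."""
    return [abs((e - m) / m) for e, m in zip(estimated, measured)]

# Hypothetical liquid levels in metres
measured = [1.20, 1.20, 1.00, 0.80]
estimated = [1.21, 1.19, 1.02, 0.79]
errors = ape(estimated, measured)
print(rmse(estimated, measured), sum(errors) / len(errors), max(errors))
```

The mean and maximum APE reported in the evaluation correspond to the average and the largest element of the list returned by `ape`.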

##### (5)

$$AICc=1+\ln\left(\frac{SSE}{n}\right)+\frac{2(p+1)}{n-p-2}$$

where *SSE* is the sum of squared errors, *n* is the number of observations, and *p* is the number of parameters. A model with a smaller AICc value was assumed to be more accurate because AICc increases with the error.
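Eq. (5) translates directly into code. The sketch below implements the formula exactly as written above; the function name and the example numbers are hypothetical.

```python
import math

def aicc(sse, n, p):
    """AICc as written in Eq. (5): 1 + ln(SSE/n) + 2(p+1)/(n-p-2)."""
    if n - p - 2 <= 0:
        raise ValueError("Eq. (5) needs more than p + 2 observations")
    return 1 + math.log(sse / n) + 2 * (p + 1) / (n - p - 2)

# Hypothetical comparison: same error, different numbers of parameters
print(aicc(sse=10.0, n=225, p=3))
print(aicc(sse=10.0, n=225, p=8))
```

For an equal SSE, the second call returns the larger value because the last term penalizes additional parameters, which is why a larger model must reduce the error enough to justify its extra inputs.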

### 3. Results and Discussion

### 3.1. Model Performance

### 3.2. Importance of Variables

*T*) shows low or no importance to the modeling results. This is reasonable because temperature within a stable range hardly affects the volume or density of a liquid. Among the pressure variables, the headspace reading (*P*_{0}) and the bottom reading (*P*_{6}) showed high variable importance in most cases. The top water column reading (*P*_{1}) was also significant in some cases, especially when using three pressure variables (Fig. 3b). One reason for this observation is that this pressure meter was in the headspace (in addition to *P*_{0}) at some data points with lower liquid levels, which may cause a further accuracy problem in the reactor. The other water column readings (*P*_{2}–*P*_{5}) generally had low importance in the models (Fig. 3).

*P*_{2}–*P*_{6} were over 10,000, and only *P*_{1} and *P*_{0} showed lower VIF values (< 100). Such high values can arise from a dataset that contains highly correlated variables [32], and this multicollinearity negatively affects the results. It was suggested that the effects of multicollinearity can be removed by removing redundant data or introducing prior information [33], which can be the case for the SVM models with fewer pressure variables in this study. For this reason, it is not recommended to use SVM with an RBF kernel for liquid level prediction with multiple pressure sensors.
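To make the VIF values concrete, the sketch below computes the variance inflation factor in its usual form, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing variable j on all the others. It is a plain-Python illustration under that standard definition, not the routine used in the study; the helper names and the two example columns are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of a least-squares fit of y on the predictors plus an intercept."""
    design = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yv for row, yv in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))
    ss_tot = sum((yv - mean_y) ** 2 for yv in y)
    return 1 - ss_res / ss_tot

def vif(columns):
    """VIF of every column; fails with a division by zero if columns are exactly collinear."""
    return [1 / (1 - r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]

# Hypothetical readings from two neighbouring sensors that move almost together
p5 = [1.0, 2.0, 3.0, 4.0]
p6 = [1.1, 1.9, 3.1, 3.9]
print(vif([p5, p6]))
```

Two nearly proportional columns like these give VIF values far above 1, which mirrors how adjacent sensors in the same liquid column carry largely redundant information.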

*P*_{0} showed significantly lower performance. This implies that the pressure meter in the headspace is essential for estimating the water level, which is reasonable because the headspace pressure is linked to all pressure values within the bioreactor. Removing some readings, such as *P*_{1}, lowered the errors, indicating that having more parameters does not necessarily improve the model. This is probably because *P*_{1} experienced both the liquid and the headspace phases depending on the liquid level. Therefore, it is advisable to avoid placing a pressure sensor at a height that alternates between liquid and headspace. Overall, the absence of a parameter with a higher contribution was not critical to the model’s performance, suggesting that fewer than seven pressure parameters are required for accurate modeling.

### 3.3. AICc Test

*P*_{0}, *P*_{2}, *P*_{3}, and *P*_{5}: one headspace meter and three liquid-facing meters, excluding the top and bottom ones (*P*_{1} and *P*_{6}). The same combination resulted in a significant decrease in AICc for the RF model, but comparable or even higher AICc values for the other two models. These results imply that parameter selection is required to optimize the model output for liquid level estimation using the current method. To summarize, the pressure data were essential for building accurate models to estimate the liquid level, while the temperature showed little effect. Among the different levels, the pressure meter located in the headspace is crucial, and the number of sensors in the liquid can be optimized to increase the model accuracy.
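One way to organize such a parameter selection is to enumerate the sensor combinations that always keep the headspace meter and rank them by a score such as AICc. The sketch below shows only the enumeration and ranking; the sensor labels, the function names, and the placeholder score are hypothetical, and in practice the score would fit a model on each subset and return its AICc.

```python
from itertools import combinations

def candidate_subsets(liquid_sensors, required=("P0",)):
    """Yield every non-empty subset of liquid sensors, joined with the required sensors."""
    for k in range(1, len(liquid_sensors) + 1):
        for combo in combinations(liquid_sensors, k):
            yield tuple(required) + combo

def best_subset(subsets, score):
    """Return the subset with the smallest score, e.g. the AICc of a model fitted on it."""
    return min(subsets, key=score)

liquid = ["P1", "P2", "P3", "P4", "P5", "P6"]
subsets = list(candidate_subsets(liquid))
print(len(subsets))  # 2**6 - 1 = 63 combinations, each containing P0

# Placeholder score that merely prefers fewer sensors; replace with a real AICc
print(best_subset(subsets, score=len))
```

With six liquid sensors the search space is small enough to evaluate exhaustively, so no stepwise heuristic is needed.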