Elsevier

Energy

Volume 55, 15 June 2013, Pages 319-329

A novel least squares support vector machine ensemble model for NOx emission prediction of a coal-fired boiler

https://doi.org/10.1016/j.energy.2013.02.062

Highlights

  • A novel LSSVM ensemble model to predict NOx emissions is presented.

  • LSSVM is used as the base learner and PLS is employed as the combiner.

  • The model is applied to process data from a 660 MW coal-fired boiler.

  • The generalization ability of the model is enhanced.

  • The time consumed in training and parameter searching decreases sharply.

Abstract

Real operation data of power plants tend to be concentrated in a few local regions because of the operators’ habits and the control system design. In this paper, a novel least squares support vector machine (LSSVM)-based ensemble learning paradigm is proposed to predict the NOx emission of a coal-fired boiler using real operation data. In view of the characteristics of plant data, a soft fuzzy c-means (FCM) clustering algorithm is proposed to decompose the original data and guarantee the diversity of the individual learners. Subsequently, a base LSSVM is trained on each individual subset to solve its subtask. Finally, partial least squares (PLS) is applied as the combination strategy to eliminate collinear and redundant information among the base learners. Because the fuzzy membership also affects the ensemble output, the membership degree is added as one of the input variables of the combiner. A single LSSVM and other ensemble models using different decomposition and combination strategies are also established for comparison. The results show that the new soft FCM-LSSVM-PLS ensemble method can predict NOx emission accurately. In addition, owing to the divide-and-conquer framework, the total time consumed in parameter searching and training decreases markedly.

Introduction

Coal is one of the primary fuel sources used in power plants because of its low price and relative abundance. However, the emission of nitrogen oxides (NOx) during coal combustion has become an important environmental pollution issue and has significantly affected air quality [1]. NOx emission is considered responsible for photochemical smog, acid rain, and the risk of ozone layer depletion [2]. With rising requirements for environmental protection, controlling NOx emission from coal-fired power plant boilers has become a worldwide concern and poses great challenges [3]. Combustion optimization has been demonstrated to be an effective technique for reducing emissions from coal-fired boilers. Manipulated parameters of the boiler, such as primary and secondary air, oxygen, and coal feed, can be set carefully using artificial intelligence techniques to achieve optimal combustion with low NOx emission [4]. However, the relationship between NOx and these relevant operating parameters must be obtained correctly before the optimization scheme is performed. That is, a model that predicts NOx emission from other related variables must be established first.

In addition, there is another important reason to focus on models for NOx emission prediction. Continuous emission monitoring systems (CEMS) are commonly used to measure NOx emission in current power plants. However, CEMS are not only expensive to purchase but also relatively complex to install. Moreover, CEMS require considerable maintenance because they operate in a very harsh environment with high electromagnetic interference, and they must be taken offline during the maintenance period. Therefore, redundant techniques for measuring NOx emission are also much needed [5]. A soft sensing model, which estimates NOx emission from other relevant parameters using suitable algorithms such as artificial intelligence techniques, may serve as a potential alternative to, and backup for, CEMS.

A number of conventional mechanistic techniques for predicting the NOx emission of coal-fired utility boilers based on first principles, such as heat and mass balances, have been studied. For example, Li et al. [6] developed a NOx emission model using system fundamentals and parameter identification; Chui and Gao [7] applied computational fluid dynamics-based combustion modeling to estimate NOx emission; and Belosevic et al. [8] studied NOx formation using differential mathematical methods. However, these models are usually complicated and time consuming to build, because they require complete fundamental knowledge of the combustion process and sometimes a large number of parameters [9]. In fact, NOx formation is a high-dimensional, nonlinear process in which the boiler operational parameters are correlated with one another, which makes it even more difficult to establish an accurate model for predicting NOx emission [10]. Moreover, the accuracy and reliability of a mechanistic model may decrease over time because of degradation of the boiler equipment [11].

Alternatively, the NOx emission model can be built from power plant operation data, an approach called data-driven or black-box modeling. The relationship between NOx emission and other related process variables is expressed using the information contained in the operation data, on the basis of multivariate statistical or advanced intelligent techniques. In this way, detailed knowledge of the NOx formation process is not necessary, which makes the approach suitable for modeling problems whose mechanisms are complicated or even unknown. Many studies have focused on black-box modeling of thermal processes using artificial intelligence techniques such as artificial neural networks (ANN) and support vector machines (SVM). Kalogirou [12] gave a brief review of successful applications of different types of ANN in several energy systems. ANN was also employed to predict the NOx emission of a 600 MW coal-fired boiler on the basis of experimental data [13]. However, ANN is prone to overfitting and poor generalization ability, especially when the training data are insufficient and multicollinear. SVM is an attractive learning technique based on statistical learning theory and the structural risk minimization principle [14]. Compared with other modeling methods, it has a simple structure, avoids overfitting and offers high generalization capability. SVM has been widely applied in thermal processes and power energy systems [15]. The least squares support vector machine (LSSVM) proposed by Suykens is an important extension of the standard SVM that replaces inequality constraints with equality constraints [16]. As a result, the solution can be found by solving a linear Karush–Kuhn–Tucker (KKT) system rather than a quadratic programming problem, which significantly reduces the computational complexity. Therefore, LSSVM is a good choice for developing a high-performance model to predict NOx emission.
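To make the KKT formulation concrete, here is a minimal pure-Python sketch of LSSVM regression. It assumes a Gaussian RBF kernel $K(x,z)=\exp(-\|x-z\|^2/(2\sigma^2))$ and the usual dual system with regularization parameter $\gamma$; the function names (`lssvm_fit`, `lssvm_predict`) are illustrative rather than taken from the paper, and a practical model would call an optimized linear algebra library instead of the hand-written elimination below.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM regression KKT system
        [ 0   1^T           ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # diagonal term from the equality-constrained loss
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, sigma, x):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X_train))


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 0.8, 0.9, 0.1]
    b, alpha = lssvm_fit(X, y, gamma=10.0, sigma=1.0)
    print([round(lssvm_predict(X, b, alpha, 1.0, x), 3) for x in X])
```

The first row of the system forces the dual weights to sum to zero, and each training residual equals $\alpha_i/\gamma$, so a larger $\gamma$ fits the training data more tightly. One dense solve replaces the quadratic program of the standard SVM, but its cost still grows roughly cubically with the number of samples.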

When applying data-driven techniques to develop a model, the first factor to consider is the data source. There are commonly two separate sources for the training data set: historical operation data and experimental data. Training a black-box model is much easier with an experimental data set [17]. In our previous studies, a variable selection method [18] and nonlinear partial least squares [19] were proposed to improve the prediction accuracy of NOx emission, mainly using data samples obtained from hot-state experiments. However, such an experiment must be carefully designed, with related boiler parameters such as air valve openings set to different values. Performing an experiment on a real plant boiler is not only time consuming but also causes a large economic loss to the power station, and may even affect the stability and safety of the boiler. In addition, the number of data samples acquired from experiments is always very limited. By contrast, most existing power plants are equipped with a distributed control system (DCS) that collects a real-time repository of operating parameters. Therefore, it is much more convenient to develop a data-driven model based on data acquired from the DCS.

However, real operation data tend to be concentrated in local regions because of the habits of the operators and the control system design, which makes it difficult to achieve satisfactory prediction accuracy with a single model. Besides, in the formulation of LSSVM, the kernel and regularization parameters, which influence generalization performance, must be determined in advance. Unfortunately, there is no mature theoretical framework for calculating these parameters, and the popular approach is grid search with cross-validation accuracy as the criterion [20]. In fact, searching for the optimal parameters while training the LSSVM is a problem with exponential time complexity [21]. Therefore, developing an optimal NOx emission model with LSSVM is time consuming, especially for large numbers of operation data samples, which hinders online model establishment, updating and reconstruction.
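The grid search with cross-validation mentioned above can be sketched generically as follows. This is a hedged illustration: `grid_search_cv`, `kfold_indices` and the toy model `toy_linear` are hypothetical names, the folds are contiguous rather than shuffled, and in practice `fit_predict` would wrap an LSSVM whose parameters are, for example, `gamma` and `sigma`. The point is the cost structure: every grid point requires k separate model trainings.

```python
import itertools


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(X, y, grid, k, fit_predict):
    """Exhaustive search over the Cartesian product of `grid` (name -> values).

    fit_predict(X_tr, y_tr, X_te, params) must return predictions for X_te.
    Returns (best_params, best_cv_mse, number_of_model_fits).
    """
    names = sorted(grid)
    folds = kfold_indices(len(X), k)
    best, best_mse, n_fits = None, float("inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        sq_err = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx], params)
            n_fits += 1  # one full model training per fold per grid point
            sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
        mse = sq_err / len(X)  # every sample is held out exactly once
        if mse < best_mse:
            best, best_mse = params, mse
    return best, best_mse, n_fits


def toy_linear(X_tr, y_tr, X_te, params):
    """Hypothetical stand-in model (ignores its training data): y = scale * x + offset."""
    return [params["scale"] * x[0] + params["offset"] for x in X_te]


if __name__ == "__main__":
    X = [[float(i)] for i in range(6)]
    y = [2.0 * x[0] + 1.0 for x in X]
    grid = {"scale": [1.0, 2.0, 3.0], "offset": [0.0, 1.0]}
    print(grid_search_cv(X, y, grid, k=3, fit_predict=toy_linear))
```

With a 3 × 2 grid and 3 folds the search already performs 18 trainings; a realistic grid for two LSSVM parameters multiplies this further, and each training is a full KKT solve.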

To address the problems mentioned above, a novel ensemble learning approach is employed to tackle the difficulty of modeling NOx emission from historical operating data. The main idea of this methodology is to decompose the complex problem into multiple simpler subtasks according to the “divide and conquer” principle [22]. In general, constructing a hybrid ensemble system involves three main steps. First, the original training data are divided into a number of relatively small but meaningful subsets. Then base learners are developed separately in the individual subspaces using empirical formulas or intelligent techniques; in this paper, LSSVM is employed. Finally, the outputs of the individual learners are combined into the final result using an aggregation strategy. Unlike ordinary single-model learning techniques, the base learners accomplish the same task individually and are then aggregated by a combiner to achieve improved performance. Based on this framework, several high-performance hybrid ensemble models have been constructed in the energy industry, such as nuclear energy consumption prediction [23], seasonal-decomposition-based ensembles for hydropower consumption forecasting [24] and neural network ensembles for steam load prediction [25]. However, ensemble methods have rarely been applied to the NOx emission prediction of power plant boilers.
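A rough operation count suggests why the divide-and-conquer structure can shorten parameter searching. Assuming each LSSVM fit is dominated by a dense direct solve whose cost grows roughly as the cube of the training-set size, splitting N samples into T equal subsets divides the total grid-search cost by about T². The sketch below only counts operations under these stated assumptions (equal subset sizes, the same parameter grid for every subset, constant factors and combiner training ignored); the function names and the `overlap` parameter, which stands in for the extra samples carried by overlapping subsets, are hypothetical.

```python
def lssvm_search_cost(n, n_grid, k):
    """Rough operation count for tuning one LSSVM on n samples.

    Assumes n_grid parameter combinations, k-fold cross-validation, and a dense
    direct solve of the KKT system whose cost grows like m**3 for m training
    samples (constant factors ignored).
    """
    m = n * (k - 1) // k  # training samples per fold
    return n_grid * k * m ** 3


def ensemble_search_cost(n, n_subsets, n_grid, k, overlap=0.0):
    """Same estimate when n samples are split into n_subsets subsets.

    `overlap` is a hypothetical fraction of extra samples per subset, standing in
    for overlapping (soft) clusters; 0.0 means equal-sized disjoint subsets.
    The cost of training the combiner is ignored.
    """
    sub_n = int(n / n_subsets * (1.0 + overlap))
    return n_subsets * lssvm_search_cost(sub_n, n_grid, k)


if __name__ == "__main__":
    N, G, K = 3000, 100, 5
    single = lssvm_search_cost(N, G, K)
    for T in (2, 5, 10):
        ratio = single / ensemble_search_cost(N, T, G, K)
        print(f"T = {T:2d}: single model / ensemble cost ratio ~ {ratio:.1f}")
```

For these illustrative sizes the ratios come out at 4, 25 and 100 for T = 2, 5 and 10, i.e. roughly T². Overlapping subsets reduce the saving somewhat, since each subset is larger than N/T.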

Two major issues need to be considered in designing an effective ensemble model to predict NOx emission. The first is that the individual learners must exhibit substantial disagreement according to the bias-variance-complexity trade-off principle [26]. In light of the operating characteristics of power plant boilers, the fuzzy c-means (FCM) clustering algorithm is preferred for partitioning the locally concentrated data into diverse subsets. However, traditional FCM clustering follows the maximum membership principle, which means that each sample is assigned to only one cluster and any two clusters are non-overlapping. Samples near the boundary between adjacent clusters are difficult to predict accurately because they may not belong completely to any single cluster [27]. Therefore, a new soft FCM (SFCM) clustering algorithm is proposed by slightly modifying the traditional maximum membership principle, which makes the subsets overlapping instead of disjoint.
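The sketch below shows standard FCM iterations, which minimize $J_m=\sum_i\sum_j u_{ij}^m\|x_j-v_i\|^2$ subject to $\sum_i u_{ij}=1$ by alternately updating memberships $u_{ij}=1/\sum_k (d_{ij}/d_{kj})^{2/(m-1)}$ and centers $v_i=\sum_j u_{ij}^m x_j/\sum_j u_{ij}^m$, where $d_{ij}$ is the distance from sample $j$ to center $i$. It then turns memberships into training subsets in two ways. The hard rule is the traditional maximum-membership assignment. The soft rule is only an assumed stand-in for SFCM, because the exact modification is not spelled out in this text: a sample also joins any cluster in which its membership reaches a threshold `theta`, so boundary samples appear in more than one subset. All function names and the default `theta` are hypothetical.

```python
import math


def fcm(X, centers, m=2.0, n_iter=50, eps=1e-12):
    """Standard fuzzy c-means iterations starting from the given centers.

    Returns (U, V): U[i][j] is the membership of sample j in cluster i from the
    last iteration, and V holds the final cluster centers.
    """
    c, p, n = len(centers), len(X[0]), len(X)
    V = [list(v) for v in centers]
    U = []
    for _ in range(n_iter):
        # Distances d_ij = ||x_j - v_i||, floored at eps to avoid division by zero
        D = [[max(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v))), eps) for x in X]
             for v in V]
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U = [[1.0 / sum((D[i][j] / D[k][j]) ** (2.0 / (m - 1.0)) for k in range(c))
              for j in range(n)] for i in range(c)]
        # Center update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        for i in range(c):
            w = [u ** m for u in U[i]]
            V[i] = [sum(wj * x[d] for wj, x in zip(w, X)) / sum(w) for d in range(p)]
    return U, V


def hard_partition(U):
    """Traditional maximum-membership rule: every sample joins exactly one subset."""
    c, n = len(U), len(U[0])
    subsets = [[] for _ in range(c)]
    for j in range(n):
        subsets[max(range(c), key=lambda i: U[i][j])].append(j)
    return subsets


def soft_partition(U, theta=0.3):
    """Assumed soft rule (not necessarily the paper's SFCM criterion): a sample
    joins its maximum-membership cluster plus any cluster where u_ij >= theta."""
    c, n = len(U), len(U[0])
    subsets = [[] for _ in range(c)]
    for j in range(n):
        best = max(range(c), key=lambda i: U[i][j])
        for i in range(c):
            if i == best or U[i][j] >= theta:
                subsets[i].append(j)
    return subsets


if __name__ == "__main__":
    # Two compact groups plus one sample midway between them (illustrative data).
    X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
         (5.0, 5.0), (4.8, 5.0), (5.0, 4.8), (2.5, 2.5)]
    U, V = fcm(X, centers=[(0.0, 0.0), (5.0, 5.0)])
    print("hard:", hard_partition(U))
    print("soft:", soft_partition(U))
```

In this toy case the midway sample has a membership of about 0.5 in both clusters; the hard rule places it in only one subset, while the soft rule lets both base learners see it.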

The other core issue is selecting an optimal ensemble strategy to reconcile a set of diverse learners. For regression problems, the most commonly used strategies are weighted ensembles, such as simple averaging and least squares estimation (LSE)-based weighted averaging [28]. In simple averaging, the base learners are weighted equally, whereas in the LSE weighted method, the base learners are assigned different weights calculated by least squares estimation [29]. However, for a relatively stable algorithm such as LSSVM, similar and highly correlated individual learners will be generated despite the differences between the subsets. Furthermore, this correlation may cause a multicollinearity problem, leading to a deteriorated ensemble result [30]. Besides, Zhou et al. [31] pointed out that aggregating a selected subset of the base learners, instead of all of them, can produce a more effective output. In other words, combining the information from all base learners does not guarantee a good result; on the contrary, it may well lead to a worse system because of the noise and redundant information incorporated in the data. In view of this, the partial least squares (PLS) algorithm is employed as the combination strategy to extract the diverse information and eliminate the correlation between individual learners. PLS is well suited to coping with multicollinearity, which makes it very applicable to this case, where multiple base learners are highly correlated.
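To illustrate PLS as a combiner, the following hedged sketch fits a PLS1 model with the NIPALS algorithm to a matrix `F` whose columns are base-learner predictions. Deflation removes the variation already explained by each latent component, so strongly correlated columns are summarized by a few orthogonal scores instead of receiving unstable individual weights. The names `pls1_fit` and `pls1_predict` are hypothetical, the number of components `n_comp` would normally be chosen by cross-validation, and the membership degrees that the abstract says are also fed to the combiner would simply be extra columns of `F`.

```python
def pls1_fit(F, y, n_comp):
    """PLS1 regression of y on the columns of F using NIPALS with deflation.

    F[s][l] is the prediction of base learner l for sample s.
    """
    n, p = len(F), len(F[0])
    f_mean = [sum(row[l] for row in F) / n for l in range(p)]
    y_mean = sum(y) / n
    X = [[row[l] - f_mean[l] for l in range(p)] for row in F]  # centered inputs
    r = [v - y_mean for v in y]                                # centered target
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(X[s][l] * r[s] for s in range(n)) for l in range(p)]  # X^T r
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # remaining target variation is orthogonal to X
        w = [v / norm for v in w]
        t = [sum(X[s][l] * w[l] for l in range(p)) for s in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(X[s][l] * t[s] for s in range(n)) / tt for l in range(p)]
        q = sum(r[s] * t[s] for s in range(n)) / tt
        # Deflation: remove what this component explains from X and y.
        X = [[X[s][l] - t[s] * load[l] for l in range(p)] for s in range(n)]
        r = [r[s] - q * t[s] for s in range(n)]
        W.append(w)
        P.append(load)
        Q.append(q)
    return {"f_mean": f_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, f):
    """Combine one vector f of base-learner outputs into the ensemble output."""
    x = [v - m for v, m in zip(f, model["f_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(x, w))
        x = [a - t * b for a, b in zip(x, load)]  # same deflation as in training
        y_hat += q * t
    return y_hat


if __name__ == "__main__":
    # Two highly correlated base learners (illustrative numbers, not plant data).
    F = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
    y = [0.5 * a + 0.5 * b for a, b in F]
    model = pls1_fit(F, y, n_comp=1)
    print([round(pls1_predict(model, f), 3) for f in F])
```

When `n_comp` equals the number of linearly independent centered columns, PLS1 reproduces ordinary least squares; keeping fewer components is what gives the combiner its robustness to collinear base learners.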

The main purpose of this paper is to construct an LSSVM-based ensemble learning system to predict the NOx emission of a coal-fired boiler. The proposed ensemble paradigm includes several steps. First, the original training data are partitioned into several overlapping subsets using the SFCM algorithm to maintain the diversity of the individual learners. Subsequently, an LSSVM base learner is trained on each subset and its output is obtained. Finally, PLS is applied as the aggregation strategy to eliminate redundant information and combine the base learners with maximum diversity. The resulting soft FCM-LSSVM-PLS ensemble (SFLPE) algorithm is then applied to establish a NOx emission model from real operation data, and it is compared with an ordinary single LSSVM model and other ensemble models. The remainder of this paper is organized as follows. The next section describes the formulation of the SFLPE learning paradigm. The detailed application to NOx emission prediction and the model comparisons are presented in Section 3. The conclusions are drawn in Section 4.

Section snippets

Data partition based on SFCM

Because of the characteristics of real operation data, the training data can be partitioned into different subsets using a clustering algorithm to achieve diversity among the base learners. Fuzzy c-means (FCM) clustering, proposed by Dunn and Bezdek [32], is one of the most important and widely used clustering methods. Given a data set $X = [x_1, \ldots, x_N]^T$ with $x_i = [x_{i1}, \ldots, x_{ip}]^T$, $i = 1, \ldots, N$, where $N$ is the number of samples and $p$ is the variable dimension, FCM partitions the data set $X$ into $T$ clusters by minimizing the

Boiler description and data selection

The data for the present research were extracted from a database attached to a supercritical 660 MW tangentially fired once-through boiler. The boiler, one unit of the NingDong power plant in the Ningxia Region of China, has a furnace cross-section of 19.08 m × 19.08 m and a height of 65.1 m. The mixture of air and pulverized coal was blown into the center of the furnace, forming an imaginary tangent circle 7.69 m in diameter, as illustrated in Fig. 3. Six layers of primary air nozzles and

Conclusions

Development of an accurate physical model for coal-fired boilers can be difficult because of the complexity of the system. In this paper, we present a novel ensemble methodology with LSSVM base learners to establish a NOx emission model for coal-fired power plants using real plant operation data. The processing and proper selection of training data are also discussed in detail. The experimental results indicate that the prediction accuracy is improved and the training time decreases significantly by

Acknowledgements

This project was funded by the National Basic Research Program of China (‘973’ Project) (No. 2012CB215203), the National Natural Science Foundation of China (No. 51036002) and the Fundamental Research Funds for the Central Universities (12QX15). The authors would like to thank Engineer Cheng J. of the NingDong power plant for his assistance in collecting the operation data.

References (38)
