Gaussian process regression with heteroscedastic noises — A machine-learning predictive variance approach

https://doi.org/10.1016/j.cherd.2020.02.033

Highlights

  • A novel machine learning variance prediction method is proposed to solve the heteroscedastic GPR problem.

  • A regression model based on the SVR and ELM methods is proposed for both noise variance prediction and smoothing.

  • The proposed method not only expands the use of W-GPR but also improves the prediction performance of heteroscedastic GPR models.

Abstract

Gaussian process regression (GPR) is one of the most important data analytic tools in process modelling. It has attracted increasing interest in chemical engineering applications due to its superior performance on complex modelling problems such as high-dimensional and nonlinear data. However, traditional GPR has a main limitation: it assumes independent and identically distributed (i.i.d.) noise at every sample point. Modern chemical processes typically have more complex data structures and noise properties, so the i.i.d. noise assumption is not realistic. Thus, there is growing interest in solving the heteroscedastic noise problem, in which the i.i.d. condition is not satisfied. The most common heteroscedastic noise is noise with varying variance. This paper proposes a novel machine learning variance prediction method to solve the heteroscedastic GPR problem. By considering not only input-dependent noise variance but also input-output-dependent noise variance, a regression model based on support vector regression (SVR) and extreme learning machine (ELM) methods is proposed for both noise variance prediction and smoothing. Compared with the existing weighted Gaussian process regression (W-GPR) in the literature, the proposed method not only expands the use of W-GPR but also improves the prediction performance of heteroscedastic GPR models. Finally, the proposed algorithm is verified on two numerical examples and tested on a real polyester polymerization process. The results all demonstrate the effectiveness of the proposed approach.

Introduction

To realize and accelerate intelligent manufacturing in industrial processes, a large number of machine learning algorithms have been explored for process monitoring, product quality prediction, and fault diagnosis (Ge, 2017; Ge et al., 2017). In particular, multivariate statistical techniques such as principal component regression (PCR), Gaussian process regression (GPR), partial least squares (PLS), and support vector regression (SVR) have been extensively applied to industrial data (Qin, 2014). Recent developments in machine learning, artificial intelligence and data mining technology open a new era for using process data to solve large-scale problems including process monitoring, fault diagnosis, and soft sensing.

Gaussian processes (GPs) provide a probabilistic framework for learning kernel machines as well as for model selection. Gaussian process regression (GPR) has become a practical and reliable Bayesian method in the field of machine learning (Rasmussen and Williams, 2006). Since Rasmussen (1996) proved that the predictions obtained with the Gaussian process are as good as or better than other state-of-the-art predictors, GPR has been successfully applied in time series forecasting (Meer et al., 2018), dynamic system model identification (Ni et al., 2012), control system design (Likar and Kocijan, 2007), and combinations with Bayesian filtering (Likar and Kocijan, 2007; Lundgren et al., 2016). Many other algorithms have also been combined with GPR to forecast key variables in real-world problems (Avila et al., 2018; Chen and Ren, 2009; Elforjani and Shanbr, 2018; Fang and Chiang, 2017; Meer et al., 2018; Xiong et al., 2017; Zhang et al., 2016). In addition to the applications mentioned above, GPs have also been applied successfully in chemical processes (Liu and Gao, 2015; Yu, 2012). For example, an online soft sensor model based on a finite mixture model and a GPR model was proposed to predict the quality attributes of complex chemical processes (Yu, 2012). Yu et al. (2013) proposed a Bayesian model averaging based multi-kernel GPR for state estimation and quality prediction of Nylon-6,6 batch polymerization processes with multiple operating phases and between-phase transient dynamics.

However, collected industrial data is never perfect and can be corrupted by outliers and other disturbances (noise) (Zhu and Wu, 2004). Especially in the process industry, data is obtained by sensors distributed along the production line. Different types of sensors, different sampling frequencies and signal transmissions bring disturbances to the data. Noisy data can introduce confusion in data interpretation. Different types of noise or disturbance can have different effects on modeling and decision making. Hence, different noise should be treated differently in order to improve the generalization performance of the model created from the data (Zhu and Wu, 2004). Similarly, a model based on GPs should relax its noise hypothesis, since an appropriate noise hypothesis can improve the performance of the model (Zhi-Kun et al., 2013). The traditional GPR model considers the noise variance to be identical at every sample point. However, in many practical problems, especially in the chemical industry, simply assuming that the noise at each sampling point is identically distributed Gaussian noise is not always effective for improving the prediction accuracy (Ranjan et al., 2016). For example, the measurement noise variance may change with the magnitude of the measured signal. In the presence of more complicated noise structures, the performance of a GP model with the regular Gaussian noise assumption will be weakened. Hence, it is necessary to relax the assumption of identically distributed Gaussian noise made in the standard GP model (Zhi-Kun et al., 2013). It is natural to generalize the noise model from homoscedastic to heteroscedastic. When the noise variance is no longer the same fixed constant at every sample point, the noise is said to be heteroscedastic. Goldberg et al. (1997) first considered input-dependent noise in the heteroscedastic GPR and used an expensive Markov chain Monte Carlo sampling of the posterior distribution of the noise rate to solve the problem. Subsequently, in order to perform maximum a posteriori estimation, Le and Smola (2005) and Kersting et al. (2007) further studied a faster but more limited method to solve the heteroscedastic noise problem based on Goldberg's method. The main disadvantage of Le and Smola (2005) and Kersting et al. (2007) is that they require lots of computation resources. With the development of GP models for robust regression, different noise models have been considered, such as mixture noise models (Hong et al., 2017; Kodamana et al., 2018; Kuss, 2006), Laplace noise (Hartmann and Vanhatalo, 2018; Kuss, 2006), Student-t noise (Hartmann and Vanhatalo, 2018; Jylänki et al., 2011; Kuss, 2006; Ranjan et al., 2016), and time series noise (Hong et al., 2018b; Murray-Smith and Girard, 2001). In a recent work, Hong et al. (2018a) introduced a weighting strategy into the standard GPR algorithm to deal with Gaussian heteroscedastic GPs, and proposed three weighted GPR algorithms. In contrast to the standard GPR algorithm, the three weighted algorithms calculate the noise variance at each sampling point by weighting the sampled data.

However, in order to solve heteroscedastic GPs, whether to use the expensive Markov chain Monte Carlo method to approximate the posterior noise variance or to employ the repeated sampling method to estimate the variance is an open question. In a chemical process modelled as a heteroscedastic Gaussian process, not every sample point can be sampled repeatedly. In such cases, a weighted GPR model will be difficult to obtain, or the accuracy of the established GPR model will be limited. The repeated sampling method mentioned above therefore needs further extension. In light of these problems, this paper proposes a new method, built on Hong et al. (2018a), to solve the problem of heteroscedastic noise in Gaussian process regression. Different from Hong et al. (2018a), we consider random samples for each sample point in the process of repeated sampling. The advantage is that this not only allows all sample points to have different numbers of observations but also guarantees generalization performance. A machine learning variance prediction model (VP) based on SVR and ELM is used both for noise variance prediction when the variance cannot be calculated and for smoothing the estimated variance for Gaussian heteroscedastic GPR (VP-GPR) when the sample variances can be calculated. In addition, inspired by Edmonds (2009) and Frénay and Verleysen (2014), we consider not only input-dependent noise but also output-dependent noise.
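The two roles of the variance model described above (prediction where no sample variance can be calculated, smoothing where it can) may be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sample_variances` and `fit_log_variance_model` are hypothetical, the data are synthetic, and a Gaussian-kernel ridge regression on log-variances stands in for the SVR/ELM regressor named in the text so that only NumPy is needed.

```python
import numpy as np

def sample_variances(replicates):
    """Unbiased sample variance per input; NaN where fewer than 2 observations exist."""
    return np.array([np.var(r, ddof=1) if len(r) >= 2 else np.nan
                     for r in replicates])

def fit_log_variance_model(x, var, length_scale=1.0, ridge=1e-2):
    """Kernel ridge fit of log-variance against x (stand-in for SVR/ELM, an assumption)."""
    x = np.asarray(x, dtype=float)
    known = ~np.isnan(var)                       # points with a computable variance
    xk = x[known]
    tk = np.log(np.maximum(var[known], 1e-12))   # log keeps predictions positive
    offset = tk.mean()
    K = np.exp(-0.5 * (xk[:, None] - xk[None, :]) ** 2 / length_scale**2)
    alpha = np.linalg.solve(K + ridge * np.eye(xk.size), tk - offset)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        k = np.exp(-0.5 * (x_new[:, None] - xk[None, :]) ** 2 / length_scale**2)
        return np.exp(offset + k @ alpha)        # back-transform to a variance
    return predict

# Synthetic example: noise standard deviation grows with x, and the
# number of repeated observations differs from point to point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 12)
true_sd = 0.1 + 0.2 * x
counts = [3, 1, 4, 2, 5, 1, 3, 4, 1, 5, 2, 3]   # a count of 1 gives no sample variance
replicates = [np.sin(xi) + sd * rng.standard_normal(n)
              for xi, sd, n in zip(x, true_sd, counts)]

var = sample_variances(replicates)               # NaN at single-observation points
predict_var = fit_log_variance_model(x, var)
noise_var = predict_var(x)                       # smoothed where known, predicted where NaN
```

Fitting the logarithm of the variance is a design choice of this sketch: it guarantees positive predicted variances without constraining the regressor.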

The remainder of this paper is organized as follows. Section 2 briefly revisits the mathematical formulation of the standard GPR, SVR and ELM algorithms. Section 3 gives detailed derivations of the proposed VP-GPR model for heteroscedastic noise and discusses two different noise models. Section 4 then introduces the use of the PSO algorithm to search the hyper-parameters of the VP-GPR model.

Section snippets

Gaussian process regression

For any set $S$, a GP on $S$ is a set of random variables $\{f_x, x \in S\}$, i.e., for any $n$ and $x_1, x_2, \ldots, x_n \in S$, $(f(x_1), \ldots, f(x_n))$ is multivariate Gaussian. As a Gaussian distribution is specified by a mean vector and a covariance matrix, a GP is also fully determined by a mean function and a covariance function or kernel function (Rasmussen and Williams, 2006). If the mean function $m(x)$ and covariance function $k(x, x')$ are known, the GP can be denoted as (Rasmussen and Williams, 2006): $f(x) \sim \mathcal{GP}(m(x), k(x, x'))$, where $m(x) = \mathbb{E}[f(x)]$
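The finite-dimensional property in this definition can be made concrete with a short sketch: pick $n$ inputs, build the mean vector and covariance matrix from $m$ and $k$, and draw one multivariate Gaussian sample. The zero mean function, the squared-exponential kernel and its parameter values are assumptions chosen for illustration; the text above does not fix them.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length_scale=1.0):
    """Squared-exponential covariance k(x, x') for 1-D inputs (assumed kernel)."""
    d = np.asarray(xa, dtype=float)[:, None] - np.asarray(xb, dtype=float)[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / length_scale**2)

x = np.linspace(0.0, 5.0, 20)       # n = 20 points from S = R
m = np.zeros_like(x)                # mean vector from m(x) = 0 (assumed)
K = se_kernel(x, x)                 # covariance matrix with entries k(x_i, x_j)

# (f(x_1), ..., f(x_n)) ~ N(m, K): draw one sample via a Cholesky factor.
rng = np.random.default_rng(0)
jitter = 1e-6 * np.eye(x.size)      # numerical safeguard only, not part of the model
L = np.linalg.cholesky(K + jitter)
f = m + L @ rng.standard_normal(x.size)
```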

Variance-prediction-based heteroscedastic GPR

In most GPR applications, the noise is modelled as homoscedastic Gaussian noise. Under a homoscedastic noise model, inference is easy to carry out and parameter estimation is fast. However, the assumption of constant variance across all observations can fail or give poor estimates in actual industrial processes when the noise does not satisfy it. To deal with this problem, a weighted

Parameter estimation for VP-GPR

Fig. 2 provides the flow chart of the proposed modeling method for heteroscedastic GPR. The method mainly consists of two parts: one is to predict the noise variance in order to find the posterior distribution of the test sample, and the other is to learn the hyper-parameters. This section discusses the estimation of hyper-parameters for VP-GPR.
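A minimal sketch of these two parts is given here, assuming a zero-mean GP with a squared-exponential kernel and treating the per-point noise variances as already available. The function names are hypothetical, and a coarse grid search over two hyper-parameters stands in for the PSO search used in the paper; the objective is the standard negative log marginal likelihood with the noise covariance set to a diagonal matrix of per-point variances.

```python
import numpy as np

def se_kernel(xa, xb, sigma_f, length_scale):
    """Squared-exponential kernel for 1-D inputs (assumed kernel choice)."""
    d = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / length_scale**2)

def neg_log_marginal_likelihood(theta, x, y, noise_var):
    """NLML of a zero-mean GP whose noise covariance is diag(noise_var)."""
    sigma_f, length_scale = theta
    Ky = se_kernel(x, x, sigma_f, length_scale) + np.diag(noise_var)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # Ky^{-1} y
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * x.size * np.log(2.0 * np.pi))

def predict(theta, x, y, noise_var, x_star):
    """Posterior mean and variance of the latent function at x_star."""
    sigma_f, length_scale = theta
    Ky = se_kernel(x, x, sigma_f, length_scale) + np.diag(noise_var)
    Ks = se_kernel(x_star, x, sigma_f, length_scale)
    mean = Ks @ np.linalg.solve(Ky, y)
    cov = se_kernel(x_star, x_star, sigma_f, length_scale) - Ks @ np.linalg.solve(Ky, Ks.T)
    return mean, np.diag(cov)

# Synthetic data with input-dependent noise; the variances are taken as
# known here, whereas in the text they come from the variance-prediction step.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 30)
noise_sd = 0.05 + 0.1 * x
y = np.sin(x) + noise_sd * rng.standard_normal(x.size)
noise_var = noise_sd**2

# Grid search as a simple stand-in for PSO.
candidates = [(sf, ls) for sf in (0.5, 1.0, 2.0) for ls in (0.5, 1.0, 2.0)]
best = min(candidates, key=lambda t: neg_log_marginal_likelihood(t, x, y, noise_var))

x_star = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
mean, var_f = predict(best, x, y, noise_var, x_star)
```

Because the noise variances enter only through the diagonal of `Ky`, points with large predicted noise are automatically down-weighted in both the likelihood and the posterior mean.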

For VP-HGPR, the values of $\sigma_{x_i}^2$ and $\sigma_{x_i,y_i}^2$ have been estimated as shown in Section III. Hence, the remaining task is to estimate the hyper-parameters $\Theta = \sigma_f^2, l_1^2, \ldots, l_d$

Simulation, application and discussion

In this section, to validate the feasibility and efficiency of the proposed method, two numerical simulations along with a polyester polymerization process example are studied. Leave-one-out cross validation is used for model assessment, and the root mean square error (RMSE), normalized mean squared error (NMSE) and normalized mean absolute error (NMAE) are used as the main performance metrics (Hong et al., 2018a), which are defined as $\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (\hat{y}_i - \tilde{y}_i)^2 / N}$, $\mathrm{NMAE} = \sum_{i=1}^{N} |\hat{y}_i - \tilde{y}_i| / \sum_{i=1}^{N} \hat{y}_i$
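RMSE and NMAE can be computed with a few lines of plain Python, as in the sketch below. The function names are illustrative, the arguments follow the symbols $\hat{y}_i$ and $\tilde{y}_i$ of the formulas, and the NMAE denominator is taken as the plain sum of the $\hat{y}_i$ values, which is an assumption about how the formula should be read. NMSE is not implemented because its formula is not given here.

```python
import math

def rmse(y_hat, y_tilde):
    """Root mean square error between two equally long sequences."""
    n = len(y_hat)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_hat, y_tilde)) / n)

def nmae(y_hat, y_tilde):
    """Sum of absolute errors divided by the sum of y_hat (denominator form assumed)."""
    return sum(abs(a - b) for a, b in zip(y_hat, y_tilde)) / sum(y_hat)

# Small worked example with made-up numbers.
y_hat = [1.0, 2.0, 3.0]
y_tilde = [1.0, 2.0, 5.0]
rmse_value = rmse(y_hat, y_tilde)   # sqrt((0 + 0 + 4) / 3)
nmae_value = nmae(y_hat, y_tilde)   # (0 + 0 + 2) / (1 + 2 + 3)
```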

Conclusion

In this paper, the GPR model with heteroscedastic noise is considered. A new method based on variance prediction GP regression is proposed. As a result, the estimated variance is more relevant when used in building the GPR model. Considering that the noise variance can be both input and output dependent, both input-dependent noise and input-output-dependent noise are considered in the proposed GPR. In addition, the performance of the VP-GPR algorithm and that of W-GPR algorithm in both noise

Conflict of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The first four authors would like to acknowledge the support in part by the National Key Research and Development Plan from Ministry of Science and Technology (2016YFB0302701), National Natural Science Foundation of China (nos. 61603090, 61903078), Natural Science Foundation of Shanghai (19ZR1402300), the Fundamental Research Funds for the Central Universities (no. 2232017D-13) and the Fundamental Research for the Central Universities and Graduate Student Innovation Fund of Donghua University

References (42)

  • B. Likar et al. Predictive control of a gas–liquid separation plant based on a Gaussian process model. Comput. Chem. Eng. (2007)
  • Y. Liu et al. Industrial melt index prediction with the ensemble anti-outlier just-in-time Gaussian process regression modeling method. J. Appl. Polym. Sci. (2015)
  • R. Ranjan et al. Robust Gaussian process modeling using EM algorithm. J. Process Control (2016)
  • W. Xiong et al. Adaptive soft sensor based on time difference Gaussian process regression with local time-delay reconstruction. Chem. Eng. Res. Des. (2017)
  • J. Yu. Online quality prediction of nonlinear and non-Gaussian chemical processes with shifting dynamics using finite mixture model based Gaussian process regression approach. Chem. Eng. Sci. (2012)
  • J. Yu et al. A Bayesian model averaging based multi-kernel Gaussian process regression framework for nonlinear state estimation and quality prediction of multiphase batch processes with transient dynamics and uncertainty. Chem. Eng. Sci. (2013)
  • C. Zhang et al. A Gaussian process regression based hybrid approach for short-term wind speed prediction. Energy Convers. Manage. (2016)
  • B. Edmonds. The nature of noise. Lect. Notes Comput. Sci. (2009)
  • M. Elforjani et al. Prognosis of bearing acoustic emission signals using supervised machine learning. IEEE Trans. Ind. Electron. (2018)
  • S. Fang et al. A high-accuracy wind power forecasting model. IEEE Trans. Power Syst. (2017)
  • B. Frénay et al. Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. (2014)