Chemical Engineering and Materials Research Information Center (CHERIC)
Industrial & Engineering Chemistry Research, Vol.59, No.8, 3446-3457, 2020
Consistency-Enhanced Evolution for Variable Selection Can Identify Key Chemical Information from Spectroscopic Data
In the last few decades, spectroscopic techniques such as near-infrared (NIR) spectroscopy have been widely applied in several industries, including the pharmaceutical, agricultural, oil, and gas industries. As a result, various soft sensors have been developed to predict sample properties from spectroscopic readings. Because the spectroscopic readings at different wavelengths, especially at adjacent wavelengths, are highly correlated, it has been shown that variable selection can significantly improve a soft sensor's prediction performance while reducing the model complexity. To improve the prediction performance, most variable selection methods focus on identifying the variables (i.e., wavelengths or wavelength segments) that are strongly correlated with the dependent variable. Although many successful applications have been reported, these variable selection methods have limitations. Specifically, the selected wavelengths sometimes show little connection to the chemical bonds or functional groups present in the sample. In addition, the selected variables can be quite sensitive to the choice of the training samples. In this work, we address these limitations from a different perspective: if a variable selection algorithm can identify the truly relevant input variables, it should consistently identify the same subset of variables regardless of the choice of the training samples. Therefore, we propose a variable selection method that aims to improve the consistency of the variable subsets selected from different training samples. The new algorithm is termed consistency-enhanced evolution for variable selection (CEEVS). To demonstrate the performance and robustness of CEEVS, we compare the proposed method with three representative variable selection methods using five published NIR data sets.
These case studies clearly demonstrate that by improving the variable selection consistency, we can not only achieve improved prediction performance, but also identify key chemical information from spectroscopic data.
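The notion of selection consistency described in the abstract can be illustrated with a small sketch. This is not the CEEVS algorithm, whose details the abstract does not give. It assumes a deliberately simple stand-in selector (the top-k variables by absolute Pearson correlation with the response) and an assumed consistency measure (the mean pairwise Jaccard index of the subsets selected from random training subsamples). The function names `select_top_k` and `selection_consistency`, the parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def select_top_k(X, y, k):
    """Stand-in selector: indices of the k columns of X with the
    largest absolute Pearson correlation to y (not CEEVS)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get correlation 0 instead of a division by zero.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return set(np.argsort(corr)[-k:].tolist())


def selection_consistency(X, y, k, n_splits=20, frac=0.7, seed=0):
    """Mean pairwise Jaccard index of the variable subsets selected
    from n_splits random training subsamples (assumed measure)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(2, int(frac * n))
    subsets = []
    for _ in range(n_splits):
        idx = rng.choice(n, size=m, replace=False)
        subsets.append(select_top_k(X[idx], y[idx], k))
    scores = []
    for i in range(len(subsets)):
        for j in range(i + 1, len(subsets)):
            a, b = subsets[i], subsets[j]
            scores.append(len(a & b) / len(a | b))
    return float(np.mean(scores))


# Synthetic illustration: only columns 3 and 7 carry information about y.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 20))
y_signal = X_demo[:, 3] + X_demo[:, 7] + 0.05 * rng.normal(size=200)
y_noise = rng.normal(size=200)  # response unrelated to every column

print("consistency, informative response:", selection_consistency(X_demo, y_signal, k=2))
print("consistency, pure-noise response: ", selection_consistency(X_demo, y_noise, k=2))
```

A score near 1 means the same variables are picked regardless of which training samples are drawn. When the response is unrelated to the inputs, the selected subsets tend to shift between subsamples and the score drops, which is the sensitivity to training samples that the abstract identifies as a limitation.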