Outlier identification is the basis for building a robust model, but identifying outliers in large data sets is very difficult. Based on the statistical behavior of samples in Monte Carlo cross-validation, a method for identifying outliers is proposed: a number of models are first built by Monte Carlo cross-validation, the models are then sorted by their sum of squared prediction errors (PRESS), and the frequency with which each sample appears in the different models is counted. Because outliers are atypical, their frequency of appearance differs significantly from that of normal samples. Tests on four data sets show that the method can effectively identify outliers in near-infrared spectra, and that it identifies them more powerfully and more accurately than the commonly used leave-one-out cross-validation (LOOCV) method.
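The procedure described above can be illustrated with a minimal sketch. The abstract does not name the calibration model or say exactly how the frequency is counted, so the following are assumptions: ordinary least squares stands in for the calibration model, and the count is how often each sample falls in the validation subset of the highest-PRESS models. The function name `mccv_validation_frequency`, its parameters, and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def mccv_validation_frequency(X, y, n_models=500, train_fraction=0.8,
                              worst_fraction=0.1, seed=0):
    """Monte Carlo cross-validation sketch for flagging suspect samples.

    Assumption: ordinary least squares is used as the model, and the
    frequency is counted over the validation subsets of the models with
    the largest PRESS. Neither detail is confirmed by the abstract.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(round(train_fraction * n))
    Xb = np.column_stack([np.ones(n), X])          # add an intercept column
    press = np.empty(n_models)
    in_validation = np.zeros((n_models, n), dtype=bool)

    for m in range(n_models):
        perm = rng.permutation(n)                  # random split for this model
        train, valid = perm[:n_train], perm[n_train:]
        coef, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        resid = y[valid] - Xb[valid] @ coef
        press[m] = np.sum(resid ** 2)              # sum of squared prediction errors
        in_validation[m, valid] = True

    order = np.argsort(press)                      # models sorted by ascending PRESS
    n_worst = max(1, int(round(worst_fraction * n_models)))
    worst = order[-n_worst:]                       # the highest-PRESS models
    freq = in_validation[worst].mean(axis=0)       # per-sample frequency in those models
    return freq, press

# Synthetic illustration (not data from the paper): one sample gets a gross error.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
y[7] += 10.0                                       # planted outlier at index 7
freq, press = mccv_validation_frequency(X, y)
suspect = int(np.argmax(freq))                     # sample with the most unusual frequency
```

Under these assumptions a normal sample appears in the validation subset of the worst models at roughly the base rate `1 - train_fraction`, while a sample with a gross error appears there far more often, which is the kind of frequency difference the abstract relies on.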