Attribute reduction is one of the central research topics in rough set theory. Attribute reduction based on Pawlak rough sets usually targets either supervised or unsupervised data. In many real-world problems, however, labeled data are scarce and unlabeled data are plentiful; that is, the data are semi-supervised. A high-quality attribute reduct is generally hard to compute from the labeled data alone. To address this, we combine rough set theory with ensemble learning and semi-supervised learning and propose an attribute reduction algorithm for semi-supervised data that makes effective use of the unlabeled data. The algorithm computes a set of mutually diverse attribute reducts on the labeled data and uses them to build the base classifiers of an ensemble. During semi-supervised self-training, the ensemble predicts labels for the unlabeled data, which enlarges the labeled data set and thereby yields a better-quality reduct. Experiments on UCI data sets indicate that the algorithm is effective and feasible.
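The procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the diverse reducts are generated, which base classifier is used, or when a predicted label is accepted, so everything below is an assumption made for illustration: reducts come from backward elimination under rotated attribute orders, each base classifier is a majority-rule lookup on one reduct, and a pseudo-label is accepted only when a strict majority of all base classifiers agree. All function names are hypothetical, and attributes are assumed to be discrete.

```python
from collections import Counter, defaultdict

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of objects in the positive region."""
    blocks = defaultdict(list)
    for row, y in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(y)
    consistent = sum(len(ys) for ys in blocks.values() if len(set(ys)) == 1)
    return consistent / len(rows)

def reduct(rows, labels, order):
    """Backward elimination: drop attributes in `order` while dependency is preserved."""
    kept = list(range(len(rows[0])))
    target = dependency(rows, labels, kept)
    for a in order:
        trial = [b for b in kept if b != a]
        if dependency(rows, labels, trial) == target:
            kept = trial
    return tuple(kept)

def diverse_reducts(rows, labels):
    """Assumed diversity heuristic: one reduct per rotation of the attribute order."""
    n = len(rows[0])
    orders = [list(range(k, n)) + list(range(k)) for k in range(n)]
    return sorted({reduct(rows, labels, order) for order in orders})

def fit_rules(rows, labels, attrs):
    """Base classifier: majority label for each value combination on one reduct."""
    votes = defaultdict(Counter)
    for row, y in zip(rows, labels):
        votes[tuple(row[a] for a in attrs)][y] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}

def ensemble_predict(models, row):
    """Return a label only if a strict majority of all base classifiers vote for it."""
    votes = Counter()
    for attrs, rules in models:
        y = rules.get(tuple(row[a] for a in attrs))
        if y is not None:  # a classifier abstains on unseen value combinations
            votes[y] += 1
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if 2 * count > len(models) else None

def semi_supervised_reduct(rows, labels, unlabeled):
    """Self-training: pseudo-label confident objects, then reduce the enlarged set."""
    rows, labels, pool = list(rows), list(labels), list(unlabeled)
    while pool:
        models = [(attrs, fit_rules(rows, labels, attrs))
                  for attrs in diverse_reducts(rows, labels)]
        remaining = []
        for row in pool:
            y = ensemble_predict(models, row)
            if y is None:
                remaining.append(row)
            else:
                rows.append(row)
                labels.append(y)
        if len(remaining) == len(pool):  # nothing was added in this round
            break
        pool = remaining
    return reduct(rows, labels, list(range(len(rows[0])))), rows, labels

# Toy data: with two labeled objects, every single attribute is a reduct.
labeled = [(0, 0, 0), (1, 1, 1)]
y = ["a", "b"]
unlabeled = [(0, 0, 1)]
before = reduct(labeled, y, [0, 1, 2])
after, rows, labels = semi_supervised_reduct(labeled, y, unlabeled)
print(before, after, labels)  # (2,) (1,) ['a', 'b', 'a']
```

In this toy run the unlabeled object is pseudo-labeled by a two-to-one vote, which makes attribute 2 inconsistent on the enlarged set and changes the computed reduct. Whether the new reduct is actually better depends on the true label, which is why the acceptance criterion matters in practice.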