Using boosting tree to learn imbalanced data

Source: The Journal of China Universities of Posts and Telecommunications (English Edition)
In machine learning, class imbalance is a persistent problem: one class has far more samples than the other classes. This biases the classifier toward the majority class and degrades its performance on the minority class. We propose an improved boosting tree (BT) algorithm for learning from imbalanced data, called cost BT. In each iteration of cost BT, only the weights of the misclassified minority class samples are increased. In addition, the error rate in the weight formula of the base classifier is replaced by 1 minus the F-measure. In this study, the performance of the cost BT algorithm is compared with other known methods on 9 public data sets. The compared methods are the decision tree and random forest algorithms, each combined with sampling techniques such as the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, the adaptive synthetic sampling approach (ADASYN) and one-sided selection. The cost BT algorithm performed better than the compared methods in F-measure, G-mean and area under the curve (AUC). On 6 of the 9 data sets, the cost BT algorithm outperforms the other published methods. Because only the weights of the misclassified minority class samples are increased in each iteration of the BT, the effective proportion of the minority class among all samples grows, which improves the prediction performance of the base classifiers. In addition, computing the weights of the base classifiers from the F-measure benefits the ensemble decision.
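The two modifications described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the AdaBoost-style vote weight `0.5 * ln((1 - e) / e)` with `e = 1 - F`, the multiplicative boost `exp(alpha)`, the floor `min_alpha`, the use of decision stumps as base learners, and all function names are assumptions made for illustration. Labels are assumed to be `+1` for the minority class and `-1` for the majority class.

```python
import math

def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def stump_predict(stump, X):
    """Predict +1/-1 with a one-feature threshold rule."""
    j, thr, sign = stump
    return [sign if row[j] >= thr else -sign for row in X]

def fit_stump(X, y, w):
    """Base learner (assumed): the decision stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = stump_predict((j, thr, sign), X)
                err = sum(wi for wi, t, p in zip(w, y, pred) if t != p)
                if best is None or err < best[0]:
                    best = (err, (j, thr, sign))
    return best[1]

def update_weights(w, y, pred, alpha, minority=1):
    """Increase only the weights of misclassified minority samples, then renormalize."""
    boosted = [wi * math.exp(alpha) if (t == minority and p != minority) else wi
               for wi, t, p in zip(w, y, pred)]
    total = sum(boosted)
    return [wi / total for wi in boosted]

def cost_bt_fit(X, y, n_rounds=10, eps=1e-6, min_alpha=0.1):
    """Boosting loop; eps and min_alpha are hypothetical safeguards of this sketch."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        pred = stump_predict(stump, X)
        # 1 - F-measure takes the place of the error rate in the vote weight.
        e = min(max(1.0 - f_measure(y, pred), eps), 1.0 - eps)
        # Floor alpha so the boost factor exp(alpha) is always greater than 1.
        alpha = max(0.5 * math.log((1.0 - e) / e), min_alpha)
        ensemble.append((alpha, stump))
        w = update_weights(w, y, pred, alpha)
    return ensemble

def cost_bt_predict(ensemble, X):
    """Weighted vote of the base learners; ties go to the majority class (-1)."""
    scores = [0.0] * len(X)
    for alpha, stump in ensemble:
        for i, p in enumerate(stump_predict(stump, X)):
            scores[i] += alpha * p
    return [1 if s > 0 else -1 for s in scores]
```

A call such as `cost_bt_predict(cost_bt_fit(X, y), X)` returns the ensemble's labels for the training rows. Misclassified majority samples keep their relative weight under `update_weights`, which is the point of difference from standard AdaBoost, where every misclassified sample is boosted.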