Entropy Based Hybrid Sampling Model

Authors

  • Kamepalli Divya
  • R. Beaulah Jeyavathana

Abstract

In data mining, large differences between class distributions in multi-class data, known as the class imbalance problem, are known to hinder classification performance. Unfortunately, existing sampling techniques have exposed their own flaws: oversampling approaches can cause over-generation and class overlap, while under-sampling approaches can unnecessarily discard significant information. This study presents three sampling methods for imbalanced learning: an entropy-based oversampling (EOS) method, an entropy-based under-sampling (EUS) method, and an entropy-based hybrid sampling (EHS) method that combines oversampling and under-sampling. All three methods rely on a new class imbalance metric, the entropy-based imbalance degree (EID), which considers differences in information content between classes rather than the traditional imbalance ratio. Specifically, after evaluating the information influence degree of each instance, EOS generates new instances around difficult-to-learn instances and retains only the informative ones, EUS removes easy-to-learn instances, and EHS does both simultaneously. Finally, all generated and remaining instances are used to train several classifiers. Extensive experiments on synthetic and real-world data sets demonstrate the effectiveness of the proposed methods.
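The abstract does not give the EID formula or how the information influence degree of an instance is computed, so the following Python sketch only illustrates the general idea of entropy-guided hybrid sampling under stated assumptions. The names `entropy_imbalance`, `difficulty`, and `hybrid_sample` are hypothetical. The normalised Shannon entropy of the class distribution stands in for EID, k-nearest-neighbour label disagreement stands in for "difficult to learn", and SMOTE-style interpolation is used to create new instances. None of this should be read as the authors' actual method.

```python
import math
import random
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (bits) of the empirical class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_imbalance(labels):
    """Hypothetical entropy-based imbalance score in [0, 1].

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Illustrative stand-in only: this is NOT the paper's EID definition.
    """
    k = len(set(labels))
    if k < 2:
        return 0.0
    return 1.0 - class_entropy(labels) / math.log2(k)


def knn(X, i, k):
    """Indices of the k nearest neighbours of X[i] (brute-force Euclidean)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def difficulty(X, y, k=5):
    """Assumed proxy for 'difficult to learn': share of k-NN with another label."""
    scores = []
    for i in range(len(X)):
        nbrs = knn(X, i, k)
        scores.append(sum(y[j] != y[i] for j in nbrs) / len(nbrs))
    return scores


def hybrid_sample(X, y, k=5, seed=0):
    """Hypothetical hybrid sampler that moves every class toward the mean size.

    Classes above the target keep only their hardest instances (the easiest
    ones are dropped: under-sampling). Classes below the target keep all
    instances and gain synthetic points interpolated from their hardest
    instances toward a random same-class instance (over-sampling).
    """
    rng = random.Random(seed)
    diff = difficulty(X, y, k)
    counts = Counter(y)
    target = round(len(y) / len(counts))
    new_X, new_y = [], []
    for cls, cnt in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        # Hardest instances first (highest neighbour disagreement).
        idx.sort(key=lambda i: diff[i], reverse=True)
        kept = idx[:target] if cnt > target else idx
        new_X += [X[i] for i in kept]
        new_y += [cls] * len(kept)
        for m in range(max(0, target - cnt)):
            a = idx[m % len(idx)]  # cycle through the hardest instances
            b = rng.choice(idx)  # random same-class partner (may equal a)
            lam = rng.random()  # interpolation weight in [0, 1)
            new_X.append(tuple(xa + lam * (xb - xa) for xa, xb in zip(X[a], X[b])))
            new_y.append(cls)
    return new_X, new_y
```

For example, with eight points of class 0 and two of class 1, `hybrid_sample(X, y, k=3)` returns five instances of each class, and `entropy_imbalance` drops from about 0.28 to 0. Targeting the mean class size is only one simple choice; the paper's methods instead decide which instances to add or remove from their entropy-based measures.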

Published

2020-02-01

Section

Articles