DOI

Learning from imbalanced datasets is challenging for standard algorithms, as they are designed to work with balanced class distributions. Although there are different strategies to tackle this problem, methods that address the problem through the generation of artificial data constitute a more general approach compared to algorithmic modifications. Specifically, they generate artificial data that can be used by any algorithm, not constraining the options of the user. In this paper, we present a new oversampling method, Self-Organizing Map-based Oversampling (SOMO), which through the application of a Self Organizing Map produces a two dimensional representation of the input space, allowing for an effective generation of artificial data points. SOMO comprises three major stages: Initially a Self-Organizing Map produces a two-dimensional representation of the original, usually high-dimensional, space. Next it generates within-cluster synthetic samples and finally it generates between cluster synthetic samples. Additionally we present empirical results that show the improvement in the performance of algorithms, when artificial data generated by SOMO are used, and also show that our method outperforms various oversampling methods.
Original languageEnglish
Pages (from-to)40-52
Number of pages13
JournalExpert Systems With Applications
Volume82
DOIs
StatePublished - 1 Oct 2017

ID: 2576401