F-Measure Curves: A Tool to Visualize Classifier Performance Under Imbalance

Title: F-Measure Curves: A Tool to Visualize Classifier Performance Under Imbalance
Publication Type: Journal Article
Year of Publication: 2020
Authors: Soleymani, R, Granger, E, Fumera, G
Journal: Pattern Recognition
ISSN Number: 0031-3203
Keywords: Class imbalance, F-measure, Pattern Classification, Performance Metrics, Video Face Recognition, Visualization Tools

Learning from imbalanced data is a challenging problem in many real-world machine learning applications, due in part to the performance bias of most classification systems. This bias may arise for three reasons: (1) classification systems are often optimized and compared using performance measures that are unsuitable for imbalanced problems; (2) most learning algorithms are designed and tested at a fixed level of data imbalance, which may differ from operational scenarios; (3) the relative preference for correctly classifying each class differs from one application to another. This paper investigates specialized performance evaluation metrics and tools for imbalanced problems, including scalar metrics that assume a given operating condition (skew level and relative preference of classes), and global evaluation curves or metrics that consider a range of operating conditions. We focus on the case in which the F-measure is preferred over other scalar metrics, and propose a new global evaluation space for the F-measure that is analogous to the cost curves for expected cost. In this space, a classifier is represented as a curve that shows its performance over all of its decision thresholds and a range of possible imbalance levels for the desired preference of true positive rate to precision. Curves obtained in the F-measure space are compared to those of existing spaces (ROC, precision-recall and cost). Analogously to cost curves, the proposed F-measure space makes it easier to visualize and compare classifiers' performance under different operating conditions than in ROC and precision-recall spaces. This space allows us to set the optimal decision threshold of a soft classifier and to select the best classifier among a group.
This space also makes it possible to empirically improve the performance of ensemble learning methods specialized for class imbalance, by selecting and combining the base classifiers using a modified version of the iterative Boolean combination algorithm that is optimized using the F-measure instead of AUC. Experiments on a real-world dataset for video face recognition show the advantages of evaluating and comparing different classifiers in the F-measure space versus ROC, precision-recall, and cost spaces. In addition, it is shown that the F-measure performance of the Bagging ensemble method can improve considerably when the modified iterative Boolean combination algorithm is used.
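
The idea of plotting the F-measure against the imbalance level can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard identity that expresses the F-measure through the true positive rate, the false positive rate, and the prior probability of the positive class, and it assumes the common beta parameterization of the preference between recall and precision, which may differ from the one used in the paper. The function names and the ROC operating points are hypothetical and chosen only for illustration.

```python
def f_measure(tpr, fpr, p_pos, beta=1.0):
    """F_beta computed from TPR, FPR and the positive-class prior p_pos.

    Uses precision = tpr*p_pos / (tpr*p_pos + fpr*(1 - p_pos)) and
    recall = tpr, which gives
    F = (1 + b^2)*tpr*p_pos / (tpr*p_pos + b^2*p_pos + fpr*(1 - p_pos)).
    """
    b2 = beta * beta
    denom = tpr * p_pos + b2 * p_pos + fpr * (1.0 - p_pos)
    if denom == 0.0:
        return 0.0  # no positives in the data and none predicted
    return (1.0 + b2) * tpr * p_pos / denom


def f_measure_curve(roc_points, priors, beta=1.0):
    """Best achievable F-measure at each prior, over all thresholds.

    roc_points: list of (fpr, tpr) pairs, one per decision threshold.
    Returns a list of (prior, best_f, index_of_best_threshold).
    """
    curve = []
    for p in priors:
        scores = [f_measure(tpr, fpr, p, beta) for fpr, tpr in roc_points]
        best = max(range(len(scores)), key=scores.__getitem__)
        curve.append((p, scores[best], best))
    return curve


# Hypothetical operating points of one soft classifier (illustration only).
roc_points = [(0.0, 0.0), (0.05, 0.6), (0.2, 0.85), (0.5, 0.95), (1.0, 1.0)]
priors = [i / 10 for i in range(1, 10)]

for p, f, idx in f_measure_curve(roc_points, priors):
    print(f"P(+)={p:.1f}  best F1={f:.3f}  threshold index={idx}")
```

In this sketch the threshold that maximizes the F-measure changes as the positive class becomes rarer, which is the kind of dependence on operating conditions that a single ROC or precision-recall curve does not show directly.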


Citation Key: 1453
Refereed Designation: Refereed
Attachment: F_measure_PR.pdf (3.15 MB)