Problem definition
Imbalanced data occurs in machine learning when both of the following hold:
- "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
- "[T]he cases that are more relevant for the user are poorly represented in the training set."
Source: Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.
Software
- imbalanced-learn: imblearn
Related Tags and Techniques
- SMOTE: smote (Synthetic Minority Oversampling Technique)
- Resampling: resampling
- Oversampling: oversampling
- Downsampling: downsampling
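As a rough illustration of the resampling techniques listed above, here is a minimal sketch of random oversampling and random downsampling using only the Python standard library. The function names (random_oversample, random_downsample, _group_by_class) and the toy data are assumptions made for this example, not part of any library. SMOTE is not shown; it differs in that it creates new synthetic minority points by interpolating between neighbors rather than duplicating existing rows.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label (hypothetical helper)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw with replacement to fill the gap up to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_downsample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw without replacement so no row is repeated.
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_down, y_down = random_downsample(X, y)
print(Counter(y), Counter(y_over), Counter(y_down))
```

In practice, resampling is normally applied to the training split only, so that the evaluation data keeps the original class distribution. The imbalanced-learn package provides maintained versions of these ideas (for example its RandomOverSampler, RandomUnderSampler, and SMOTE classes), which are preferable to a hand-rolled sketch like this one.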