Questions tagged [imblearn]

Python imbalanced-learn package. On datasets where one or more classes have significantly fewer (or more) training examples, an imbalanced-learning approach can improve the results or the speed of machine-learning algorithms. Imbalanced-learning methods rely on re-sampling techniques such as SMOTE, ADASYN, Tomek links, and their various combinations.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms perform optimally only when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more majority classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than would otherwise be obtained.
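As a minimal sketch of that idea (toy data built with scikit-learn's make_classification; the 90/10 split is made up for illustration), re-sampling with SMOTE returns a rebalanced copy of the dataset:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy dataset with a roughly 90/10 class split
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=42)
    print(Counter(y))

    # fit_resample returns a copy with synthetic minority samples added
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    print(Counter(y_res))  # classes are now the same size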

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating ensemble balanced sets.

Below is a list of the methods currently implemented in this module.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN
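As a hedged sketch of two of the under-samplers above on made-up data (random majority under-sampling versus Tomek-link removal, which deletes only borderline majority samples):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler, TomekLinks

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Random majority under-sampling
    X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y_rus))  # majority down-sampled to the minority size

    # Removal of majority samples that form Tomek links
    X_tl, y_tl = TomekLinks().fit_resample(X, y)
    print(Counter(y_tl))  # only borderline majority samples removed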

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
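All of the over-samplers share the fit_resample interface, so they can be swapped freely; a brief sketch on toy data:

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    for sampler in (SMOTE(random_state=0),
                    BorderlineSMOTE(random_state=0),  # bSMOTE
                    ADASYN(random_state=0)):
        X_res, y_res = sampler.fit_resample(X, y)
        print(type(sampler).__name__, len(X_res))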

Over-sampling followed by under-sampling

  17. SMOTE + Tomek links
  18. SMOTE + ENN

Ensemble classifier using samplers internally

  19. EasyEnsemble
  20. BalanceCascade
  21. Balanced Random Forest
  22. Balanced Bagging
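A short sketch of the last two categories on toy data: SMOTETomek chains over-sampling with Tomek-link cleaning, while BalancedRandomForestClassifier under-samples each bootstrap internally, so no explicit resampling step is needed:

    from sklearn.datasets import make_classification
    from imblearn.combine import SMOTETomek
    from imblearn.ensemble import BalancedRandomForestClassifier

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Over-sample with SMOTE, then clean with Tomek links
    X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)

    # Ensemble that under-samples each bootstrap sample internally
    clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)  # no explicit resampling step needed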

Resources:

  • GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn
  • Documentation: https://imbalanced-learn.org

205 questions
32 votes, 2 answers

AttributeError: 'SMOTE' object has no attribute 'fit_sample'

Why am I getting the error AttributeError: 'SMOTE' object has no attribute 'fit_sample'? I don't think this code should cause any error: from imblearn.over_sampling import SMOTE smt = SMOTE(random_state=0) X_train_SMOTE, y_train_SMOTE =…
user12088653
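For context, fit_sample was deprecated in imbalanced-learn 0.4 in favour of fit_resample and removed in later releases, which produces exactly this AttributeError. A sketch of the fix (X_train/y_train replaced by toy data):

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy stand-in for the question's X_train / y_train
    X_train, y_train = make_classification(n_samples=500, weights=[0.9, 0.1],
                                           random_state=0)

    smt = SMOTE(random_state=0)
    # smt.fit_sample(X_train, y_train)   # old name, removed in newer releases
    X_train_SMOTE, y_train_SMOTE = smt.fit_resample(X_train, y_train)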
21 votes, 13 answers

ModuleNotFoundError: No module named 'imblearn'

I tried running the following code: from imblearn import under_sampling, over_sampling from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=12, ratio = 1.0) x_SMOTE, y_SMOTE = sm.fit_sample(X, y) which gives me the error…
bernando_vialli
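The usual cause is that the distribution is named imbalanced-learn while the import name is imblearn, and that pip installed it into a different interpreter than the one running the code. A hedged sanity check (note the ratio argument in the question was also renamed to sampling_strategy in 0.4):

    # Shell: python -m pip install -U imbalanced-learn
    import imblearn
    print(imblearn.__version__)

    from imblearn.over_sampling import SMOTE
    # 'ratio' became 'sampling_strategy', 'fit_sample' became 'fit_resample'
    sm = SMOTE(random_state=12, sampling_strategy=1.0)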
18 votes, 7 answers

AttributeError: 'SMOTE' object has no attribute '_validate_data'

I'm resampling my data (multiclass) by using SMOTE. sm = SMOTE(random_state=1) X_res, Y_res = sm.fit_resample(X_train, Y_train) However, I'm getting this attribute error. Can anyone help?
HP_17
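_validate_data is a private helper that imbalanced-learn inherits from scikit-learn's BaseEstimator (added around scikit-learn 0.23), so the error generally means the installed scikit-learn is older than the installed imbalanced-learn expects. A minimal diagnosis:

    import sklearn
    import imblearn

    # If these versions are out of step, upgrading the pair together
    # usually resolves the AttributeError (shell):
    #   python -m pip install -U scikit-learn imbalanced-learn
    print(sklearn.__version__, imblearn.__version__)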
16 votes, 1 answer

How to apply oversampling when doing Leave-One-Group-Out cross validation?

I am working with imbalanced data for classification, and I previously used the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the training data. However, this time I think I also need to use Leave One Group Out (LOGO)…
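One hedged approach: put SMOTE inside an imblearn Pipeline so that it is re-fitted on the training split of every leave-one-group-out fold, leaving each held-out group untouched (the data and group labels below are made up):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, weights=[0.9, 0.1],
                               random_state=0)
    groups = np.repeat(np.arange(6), 50)  # hypothetical group labels

    # Small k_neighbors so SMOTE works even on folds with few minority samples
    pipe = Pipeline([('smote', SMOTE(k_neighbors=3, random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])

    # SMOTE runs only on each fold's training groups
    scores = cross_val_score(pipe, X, y, groups=groups,
                             cv=LeaveOneGroupOut(), scoring='f1_macro')
    print(scores)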
16 votes, 5 answers

SMOTE initialisation expects n_neighbors <= n_samples, but n_samples < n_neighbors

I have already pre-cleaned the data, and below shows the format of the top 4 rows: [IN] df.head() [OUT] Year cleaned 0 1909 acquaint hous receiv follow letter clerk crown... 1 1909 ask secretari state war…
Dbercules
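SMOTE interpolates between a minority sample and its k_neighbors nearest minority neighbours (k defaults to 5), so it needs more minority samples than neighbours. When a class is very rare, one hedged workaround is to shrink k_neighbors:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy dataset with only ~4 minority samples
    X, y = make_classification(n_samples=100, weights=[0.96, 0.04],
                               flip_y=0, random_state=0)
    n_minority = min(Counter(y).values())

    # k_neighbors must be strictly smaller than the minority-class count
    sm = SMOTE(k_neighbors=min(5, n_minority - 1), random_state=0)
    X_res, y_res = sm.fit_resample(X, y)
    print(Counter(y_res))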
15 votes, 4 answers

No module named 'sklearn.neighbors._base'

I have recently installed the imblearn package in Jupyter (!pip show imbalanced-learn lists it), but I am not able to import it. from tensorflow.keras import backend from imblearn.over_sampling import SMOTE I get the following…
joel
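sklearn.neighbors.base was renamed to the private sklearn.neighbors._base in scikit-learn 0.22, so importing the new path fails on an older scikit-learn. The hedged fix is simply upgrading the pair:

    import sklearn
    print(sklearn.__version__)

    # The private module exists only in scikit-learn >= 0.22:
    from sklearn.neighbors import _base  # fails on older releases
    # Fix (shell): python -m pip install -U scikit-learn imbalanced-learn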
15 votes, 14 answers

Jupyter: No module named 'imblearn' after installation

I installed "imbalanced-learn" (version 0.3.1) on ANACONDA Navigator. When I ran an example from the imbalanced-learn website using Jupyter (Python 3): from imblearn.datasets import make_imbalance from imblearn.under_sampling import NearMiss from…
TTZ
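In Jupyter, a command-line pip or the Anaconda Navigator installer may target a different environment than the notebook kernel. A hedged way to install into exactly the kernel's interpreter:

    import sys
    print(sys.executable)  # the interpreter this kernel actually runs

    # In a notebook cell, install into that same interpreter:
    #   !{sys.executable} -m pip install -U imbalanced-learn
    # then restart the kernel before importing:
    import imblearn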
9 votes, 1 answer

Cross Validating With Imblearn Pipeline And GridSearchCV

I'm trying to use the Pipeline class from imblearn and GridSearchCV to get the best parameters for classifying the imbalanced dataset. As per the answers mentioned here, I want to leave out resampling of the validation set and only resample the…
Krishnang K Dalal
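A sketch of the usual pattern (toy data; the parameter grid is hypothetical): because imblearn's Pipeline calls fit_resample only during fit, GridSearchCV resamples each training fold while scoring the matching validation fold on the original samples:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])

    param_grid = {'smote__k_neighbors': [3, 5],
                  'clf__C': [0.1, 1.0, 10.0]}

    search = GridSearchCV(pipe, param_grid, cv=5, scoring='f1')
    search.fit(X, y)
    print(search.best_params_)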
9 votes, 8 answers

Problems importing imblearn python package on ipython notebook

I installed https://github.com/glemaitre/imbalanced-learn on Windows PowerShell using pip install, conda, and GitHub. But in an IPython notebook, when I try to import the package using: from unbalanced_dataset import UnderSampler, OverSampler,…
ugradmath
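UnbalancedDataset was the project's original name before the rename to imbalanced-learn, so unbalanced_dataset imports no longer exist. A sketch of the modern equivalents (the class mapping is approximate):

    # Old: from unbalanced_dataset import UnderSampler, OverSampler, SMOTE
    # New, after the rename to imbalanced-learn (import name: imblearn):
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import RandomOverSampler, SMOTE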
8 votes, 1 answer

Does imblearn pipeline turn off sampling for testing?

Let us suppose the following code (from the imblearn example on pipelines) ... # Instantiate a PCA object for the sake of easy visualisation pca = PCA(n_components=2) # Create the samplers enn = EditedNearestNeighbours() renn =…
Jacques Wainer
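Effectively yes: samplers implement fit_resample but no transform, so an imblearn Pipeline resamples only while fitting, and predict/score pass test data through unchanged. A small sketch demonstrating this:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from imblearn.under_sampling import EditedNearestNeighbours
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    pipe = make_pipeline(EditedNearestNeighbours(),
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_tr, y_tr)            # ENN resamples the training data here

    preds = pipe.predict(X_te)      # no resampling at predict time:
    print(len(preds) == len(X_te))  # True - every test sample is scored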
8 votes, 1 answer

Class weights vs under/oversampling

In imbalanced classification (with scikit-learn), what would be the difference between balancing classes (i.e. setting class_weight to balanced) and oversampling with SMOTE, for example? What would be the expected effects of one vs the other?
Mario L
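Both push the model to pay more attention to the minority class, but by different means: class_weight reweights the loss on the existing samples, while SMOTE fabricates new synthetic minority points, which can reshape the decision boundary. A hedged side-by-side on toy data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Option 1: reweight the loss, no new samples
    weighted = LogisticRegression(class_weight='balanced', max_iter=1000)

    # Option 2: generate synthetic minority samples
    smoted = make_pipeline(SMOTE(random_state=0),
                           LogisticRegression(max_iter=1000))

    for name, model in [('class_weight', weighted), ('SMOTE', smoted)]:
        print(name, cross_val_score(model, X, y, scoring='f1').mean())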
8 votes, 3 answers

How to perform SMOTE with cross validation in sklearn in python

I have a highly imbalanced dataset and would like to perform SMOTE to balance the dataset, and cross validation to measure the accuracy. However, most of the existing tutorials make use of only a single training and testing iteration to perform…
EmJ
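A sketch of the standard pattern: wrap SMOTE and the classifier in an imblearn pipeline and hand it to cross_val_score, so each of the k training folds is resampled independently and no synthetic samples leak into the validation folds:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    pipe = make_pipeline(SMOTE(random_state=0),
                         RandomForestClassifier(random_state=0))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # Each of the 5 training folds is SMOTEd independently
    print(cross_val_score(pipe, X, y, cv=cv, scoring='f1').mean())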
7 votes, 4 answers

Jupyter Notebook: Importing SMOTE from imblearn - ImportError: cannot import name 'pairwise_distances_chunked'

I'm trying to use the SMOTE package in the imblearn library via: from imblearn.over_sampling import SMOTE and I'm getting the following error message: ImportError: cannot import name 'pairwise_distances_chunked'. Here is a screenshot of my import…
Billy Hansen
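pairwise_distances_chunked was added to scikit-learn in 0.20, so the ImportError indicates a scikit-learn older than the installed imbalanced-learn requires. A minimal check:

    import sklearn
    print(sklearn.__version__)

    # Requires scikit-learn >= 0.20 (shell):
    #   python -m pip install -U scikit-learn imbalanced-learn
    from sklearn.metrics import pairwise_distances_chunked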
6 votes, 2 answers

How to fix: No samples will be generated with the provided ratio settings. (imblearn)

I have this code: from imblearn.over_sampling import ADASYN Y = df.target X = df.drop('target', axis=1) ad = ADASYN() X_adasyn, y_adasyn = ad.fit_sample(X, Y) getting this error: ValueError: No samples will be generated with the provided ratio…
omer karabey
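ADASYN raises this ValueError when the class distribution already satisfies the requested ratio, leaving nothing to synthesize (ratio itself is the pre-0.4 spelling of sampling_strategy). A hedged sketch: inspect the counts first, then state the target explicitly:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    print(Counter(y))  # check the classes really are imbalanced

    # Only resample classes that are genuinely in the minority
    ad = ADASYN(sampling_strategy='minority', random_state=0)
    X_adasyn, y_adasyn = ad.fit_resample(X, y)
    print(Counter(y_adasyn))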
6 votes, 1 answer

Does oversampling happen before or after cross-validation using imblearn pipelines?

I have split my data into train/test before doing cross-validation on the training data to validate my hyperparameters. I have an unbalanced dataset and want to perform SMOTE oversampling on each iteration, so I have established a pipeline using…
TomNash
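With an imblearn Pipeline, oversampling happens after each split: cross-validation clones the pipeline per fold, and SMOTE sees only that fold's training portion. One way to convince yourself is a small logging wrapper (LoggingSMOTE is hypothetical, written just for this demonstration):

    from collections import Counter
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    class LoggingSMOTE(SMOTE):
        """Hypothetical wrapper that reports what SMOTE actually sees."""
        def fit_resample(self, X, y):
            print('training fold class counts:', Counter(y))
            return super().fit_resample(X, y)

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1],
                               random_state=0)
    pipe = make_pipeline(LoggingSMOTE(random_state=0),
                         LogisticRegression(max_iter=1000))

    # Prints once per fold: only ~4/5 of the rows (the training split)
    # ever reach SMOTE, so resampling happens after the CV split.
    cross_val_score(pipe, X, y, cv=5)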