Questions tagged [imblearn]

Python imbalanced-learn package. On datasets where one or more classes have significantly fewer (or more) training examples, an imbalanced-learning approach can improve the results or the speed of machine-learning algorithms. Imbalanced-learning methods rely on re-sampling techniques such as SMOTE, ADASYN, Tomek links, and their various combinations.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms perform optimally only when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more majority classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than would otherwise be obtained.
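As a minimal sketch of that idea (toy data built with scikit-learn's make_classification; the 90/10 split is made up for illustration), re-sampling with SMOTE returns a rebalanced copy of the dataset:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy dataset with a roughly 90/10 class split
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=42)
    print(Counter(y))

    # fit_resample returns a copy with synthetic minority samples added
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    print(Counter(y_res))  # classes are now the same size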

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating ensemble balanced sets.

Below is a list of the methods currently implemented in this module.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN
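As a hedged sketch of two of the under-samplers above on made-up data (random majority under-sampling versus Tomek-link removal, which deletes only borderline majority samples):

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler, TomekLinks

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Random majority under-sampling
    X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y_rus))  # majority down-sampled to the minority size

    # Removal of majority samples that form Tomek links
    X_tl, y_tl = TomekLinks().fit_resample(X, y)
    print(Counter(y_tl))  # only borderline majority samples removed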

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
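All of the over-samplers share the fit_resample interface, so they can be swapped freely; a brief sketch on toy data:

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    for sampler in (SMOTE(random_state=0),
                    BorderlineSMOTE(random_state=0),  # bSMOTE
                    ADASYN(random_state=0)):
        X_res, y_res = sampler.fit_resample(X, y)
        print(type(sampler).__name__, len(X_res))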

Over-sampling followed by under-sampling

  17. SMOTE + Tomek links
  18. SMOTE + ENN

Ensemble classifier using samplers internally

  19. EasyEnsemble
  20. BalanceCascade
  21. Balanced Random Forest
  22. Balanced Bagging
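A short sketch of the last two categories on toy data: SMOTETomek chains over-sampling with Tomek-link cleaning, while BalancedRandomForestClassifier under-samples each bootstrap internally, so no explicit resampling step is needed:

    from sklearn.datasets import make_classification
    from imblearn.combine import SMOTETomek
    from imblearn.ensemble import BalancedRandomForestClassifier

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Over-sample with SMOTE, then clean with Tomek links
    X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)

    # Ensemble that under-samples each bootstrap sample internally
    clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, y)  # no explicit resampling step needed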

Resources:

  • GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn
  • Documentation: https://imbalanced-learn.org

205 questions
32 votes, 2 answers

AttributeError: 'SMOTE' object has no attribute 'fit_sample'

Why am I getting the error AttributeError: 'SMOTE' object has no attribute 'fit_sample'? I don't think this code should cause any error: from imblearn.over_sampling import SMOTE smt = SMOTE(random_state=0) X_train_SMOTE, y_train_SMOTE =…
user12088653
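For context, fit_sample was deprecated in imbalanced-learn 0.4 in favour of fit_resample and removed in later releases, which produces exactly this AttributeError. A sketch of the fix (X_train/y_train replaced by toy data):

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy stand-in for the question's X_train / y_train
    X_train, y_train = make_classification(n_samples=500, weights=[0.9, 0.1],
                                           random_state=0)

    smt = SMOTE(random_state=0)
    # smt.fit_sample(X_train, y_train)   # old name, removed in newer releases
    X_train_SMOTE, y_train_SMOTE = smt.fit_resample(X_train, y_train)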
21 votes, 13 answers

ModuleNotFoundError: No module named 'imblearn'

I tried running the following code: from imblearn import under_sampling, over_sampling from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=12, ratio = 1.0) x_SMOTE, y_SMOTE = sm.fit_sample(X, y) which gives me the error…
bernando_vialli
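The usual cause is that the distribution is named imbalanced-learn while the import name is imblearn, and that pip installed it into a different interpreter than the one running the code. A hedged sanity check (note the ratio argument in the question was also renamed to sampling_strategy in 0.4):

    # Shell: python -m pip install -U imbalanced-learn
    import imblearn
    print(imblearn.__version__)

    from imblearn.over_sampling import SMOTE
    # 'ratio' became 'sampling_strategy', 'fit_sample' became 'fit_resample'
    sm = SMOTE(random_state=12, sampling_strategy=1.0)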
18 votes, 7 answers

AttributeError: 'SMOTE' object has no attribute '_validate_data'

I'm resampling my data (multiclass) by using SMOTE. sm = SMOTE(random_state=1) X_res, Y_res = sm.fit_resample(X_train, Y_train) However, I'm getting this attribute error. Can anyone help?
HP_17
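_validate_data is a private helper that imbalanced-learn inherits from scikit-learn's BaseEstimator (added around scikit-learn 0.23), so the error generally means the installed scikit-learn is older than the installed imbalanced-learn expects. A minimal diagnosis:

    import sklearn
    import imblearn

    # If these versions are out of step, upgrading the pair together
    # usually resolves the AttributeError (shell):
    #   python -m pip install -U scikit-learn imbalanced-learn
    print(sklearn.__version__, imblearn.__version__)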
16 votes, 1 answer

How to apply oversampling when doing Leave-One-Group-Out cross validation?

I am working with imbalanced data for classification, and I previously used the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the training data. However, this time I think I also need to use Leave One Group Out (LOGO)…
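One hedged approach: put SMOTE inside an imblearn Pipeline so that it is re-fitted on the training split of every leave-one-group-out fold, leaving each held-out group untouched (the data and group labels below are made up):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, weights=[0.9, 0.1],
                               random_state=0)
    groups = np.repeat(np.arange(6), 50)  # hypothetical group labels

    # Small k_neighbors so SMOTE works even on folds with few minority samples
    pipe = Pipeline([('smote', SMOTE(k_neighbors=3, random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])

    # SMOTE runs only on each fold's training groups
    scores = cross_val_score(pipe, X, y, groups=groups,
                             cv=LeaveOneGroupOut(), scoring='f1_macro')
    print(scores)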
16 votes, 5 answers

SMOTE initialisation expects n_neighbors <= n_samples, but n_samples < n_neighbors

I have already pre-cleaned the data, and below shows the format of the top 4 rows: [IN] df.head() [OUT] Year cleaned 0 1909 acquaint hous receiv follow letter clerk crown... 1 1909 ask secretari state war…
Dbercules
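SMOTE interpolates between a minority sample and its k_neighbors nearest minority neighbours (k defaults to 5), so it needs more minority samples than neighbours. When a class is very rare, one hedged workaround is to shrink k_neighbors:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy dataset with only ~4 minority samples
    X, y = make_classification(n_samples=100, weights=[0.96, 0.04],
                               flip_y=0, random_state=0)
    n_minority = min(Counter(y).values())

    # k_neighbors must be strictly smaller than the minority-class count
    sm = SMOTE(k_neighbors=min(5, n_minority - 1), random_state=0)
    X_res, y_res = sm.fit_resample(X, y)
    print(Counter(y_res))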
15 votes, 4 answers

No module named 'sklearn.neighbors._base'

I have recently installed the imblearn package in Jupyter (!pip show imbalanced-learn lists it), but I am not able to import it. from tensorflow.keras import backend from imblearn.over_sampling import SMOTE I get the following…
joel
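sklearn.neighbors.base was renamed to the private sklearn.neighbors._base in scikit-learn 0.22, so importing the new path fails on an older scikit-learn. The hedged fix is simply upgrading the pair:

    import sklearn
    print(sklearn.__version__)

    # The private module exists only in scikit-learn >= 0.22:
    from sklearn.neighbors import _base  # fails on older releases
    # Fix (shell): python -m pip install -U scikit-learn imbalanced-learn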
15 votes, 14 answers

Jupyter: No module named 'imblearn' after installation

I installed "imbalanced-learn" (version 0.3.1) on ANACONDA Navigator. When I ran an example from the imbalanced-learn website using Jupyter (Python 3): from imblearn.datasets import make_imbalance from imblearn.under_sampling import NearMiss from…
TTZ
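In Jupyter, a command-line pip or the Anaconda Navigator installer may target a different environment than the notebook kernel. A hedged way to install into exactly the kernel's interpreter:

    import sys
    print(sys.executable)  # the interpreter this kernel actually runs

    # In a notebook cell, install into that same interpreter:
    #   !{sys.executable} -m pip install -U imbalanced-learn
    # then restart the kernel before importing:
    import imblearn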
9 votes, 1 answer

Cross Validating With Imblearn Pipeline And GridSearchCV

I'm trying to use the Pipeline class from imblearn and GridSearchCV to get the best parameters for classifying the imbalanced dataset. As per the answers mentioned here, I want to leave out resampling of the validation set and only resample the…
Krishnang K Dalal
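A sketch of the usual pattern (toy data; the parameter grid is hypothetical): because imblearn's Pipeline calls fit_resample only during fit, GridSearchCV resamples each training fold while scoring the matching validation fold on the original samples:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([('smote', SMOTE(random_state=0)),
                     ('clf', LogisticRegression(max_iter=1000))])

    param_grid = {'smote__k_neighbors': [3, 5],
                  'clf__C': [0.1, 1.0, 10.0]}

    search = GridSearchCV(pipe, param_grid, cv=5, scoring='f1')
    search.fit(X, y)
    print(search.best_params_)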
9 votes, 8 answers

Problems importing imblearn python package on ipython notebook

I installed https://github.com/glemaitre/imbalanced-learn on Windows PowerShell using pip install, conda, and GitHub. But in an IPython notebook, when I try to import the package using: from unbalanced_dataset import UnderSampler, OverSampler,…
ugradmath
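UnbalancedDataset was the project's original name before the rename to imbalanced-learn, so unbalanced_dataset imports no longer exist. A sketch of the modern equivalents (the class mapping is approximate):

    # Old: from unbalanced_dataset import UnderSampler, OverSampler, SMOTE
    # New, after the rename to imbalanced-learn (import name: imblearn):
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import RandomOverSampler, SMOTE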
8 votes, 1 answer

Does imblearn pipeline turn off sampling for testing?

Let us suppose the following code (from the imblearn example on pipelines) ... # Instantiate a PCA object for the sake of easy visualisation pca = PCA(n_components=2) # Create the samplers enn = EditedNearestNeighbours() renn =…
Jacques Wainer
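Effectively yes: samplers implement fit_resample but no transform, so an imblearn Pipeline resamples only while fitting, and predict/score pass test data through unchanged. A small sketch demonstrating this:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from imblearn.under_sampling import EditedNearestNeighbours
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    pipe = make_pipeline(EditedNearestNeighbours(),
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_tr, y_tr)            # ENN resamples the training data here

    preds = pipe.predict(X_te)      # no resampling at predict time:
    print(len(preds) == len(X_te))  # True - every test sample is scored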
8 votes, 1 answer

Class weights vs under/oversampling

In imbalanced classification (with scikit-learn), what would be the difference between balancing classes (i.e. setting class_weight to balanced) and oversampling with SMOTE, for example? What would be the expected effects of one vs the other?
Mario L
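Both push the model to pay more attention to the minority class, but by different means: class_weight reweights the loss on the existing samples, while SMOTE fabricates new synthetic minority points, which can reshape the decision boundary. A hedged side-by-side on toy data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # Option 1: reweight the loss, no new samples
    weighted = LogisticRegression(class_weight='balanced', max_iter=1000)

    # Option 2: generate synthetic minority samples
    smoted = make_pipeline(SMOTE(random_state=0),
                           LogisticRegression(max_iter=1000))

    for name, model in [('class_weight', weighted), ('SMOTE', smoted)]:
        print(name, cross_val_score(model, X, y, scoring='f1').mean())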
8 votes, 3 answers

How to perform SMOTE with cross validation in sklearn in python

I have a highly imbalanced dataset and would like to perform SMOTE to balance the dataset, and cross validation to measure the accuracy. However, most of the existing tutorials make use of only a single training and testing iteration to perform…
EmJ
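A sketch of the standard pattern: wrap SMOTE and the classifier in an imblearn pipeline and hand it to cross_val_score, so each of the k training folds is resampled independently and no synthetic samples leak into the validation folds:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    pipe = make_pipeline(SMOTE(random_state=0),
                         RandomForestClassifier(random_state=0))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # Each of the 5 training folds is SMOTEd independently
    print(cross_val_score(pipe, X, y, cv=cv, scoring='f1').mean())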
7 votes, 4 answers

Jupyter Notebook: Importing SMOTE from imblearn - ImportError: cannot import name 'pairwise_distances_chunked'

I'm trying to use the SMOTE package in the imblearn library via: from imblearn.over_sampling import SMOTE and I'm getting the following error message: ImportError: cannot import name 'pairwise_distances_chunked'. Here is a screenshot of my import…
Billy Hansen
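pairwise_distances_chunked was added to scikit-learn in 0.20, so the ImportError indicates a scikit-learn older than the installed imbalanced-learn requires. A minimal check:

    import sklearn
    print(sklearn.__version__)

    # Requires scikit-learn >= 0.20 (shell):
    #   python -m pip install -U scikit-learn imbalanced-learn
    from sklearn.metrics import pairwise_distances_chunked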
6 votes, 2 answers

How to fix: No samples will be generated with the provided ratio settings. (imblearn)

I have this code: from imblearn.over_sampling import ADASYN Y = df.target X = df.drop('target', axis=1) ad = ADASYN() X_adasyn, y_adasyn = ad.fit_sample(X, Y) getting this error: ValueError: No samples will be generated with the provided ratio…
omer karabey
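ADASYN raises this ValueError when the class distribution already satisfies the requested ratio, leaving nothing to synthesize (ratio itself is the pre-0.4 spelling of sampling_strategy). A hedged sketch: inspect the counts first, then state the target explicitly:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import ADASYN

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    print(Counter(y))  # check the classes really are imbalanced

    # Only resample classes that are genuinely in the minority
    ad = ADASYN(sampling_strategy='minority', random_state=0)
    X_adasyn, y_adasyn = ad.fit_resample(X, y)
    print(Counter(y_adasyn))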
6 votes, 1 answer

Does oversampling happen before or after cross-validation using imblearn pipelines?

I have split my data into train/test before doing cross-validation on the training data to validate my hyperparameters. I have an unbalanced dataset and want to perform SMOTE oversampling on each iteration, so I have established a pipeline using…
TomNash
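With an imblearn Pipeline, oversampling happens after each split: cross-validation clones the pipeline per fold, and SMOTE sees only that fold's training portion. One way to convince yourself is a small logging wrapper (LoggingSMOTE is hypothetical, written just for this demonstration):

    from collections import Counter
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    class LoggingSMOTE(SMOTE):
        """Hypothetical wrapper that reports what SMOTE actually sees."""
        def fit_resample(self, X, y):
            print('training fold class counts:', Counter(y))
            return super().fit_resample(X, y)

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1],
                               random_state=0)
    pipe = make_pipeline(LoggingSMOTE(random_state=0),
                         LogisticRegression(max_iter=1000))

    # Prints once per fold: only ~4/5 of the rows (the training split)
    # ever reach SMOTE, so resampling happens after the CV split.
    cross_val_score(pipe, X, y, cv=5)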