Questions tagged [smote]

SMOTE is an abbreviation for Synthetic Minority Over-sampling TEchnique. This tag refers to the oversampling method commonly used in machine learning to balance class distributions in datasets by introducing new, synthetic minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When the classes are imbalanced, classifiers tend to favor predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances to generate artificial samples.
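The interpolation idea can be illustrated with a minimal sketch in plain Python. This is not the imbalanced-learn implementation; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration only. Each synthetic sample is placed at a random position on the line segment between a minority point and one of its k nearest minority neighbors.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority samples.

    Each sample is a random point on the segment between a minority
    instance and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean
        # distance), excluding the base point itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region already covered by the minority class. In practice, a maintained library such as imbalanced-learn is usually the better choice.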

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

One review on SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions

votes

1 answer

How to split data based on a column value in sklearn

I have a data file with following columns 'customer', 'calibrat' - Calibration sample = 1; Validation sample = 0; 'churn', 'churndep', 'revenue', 'mou', Data file contains some 40000 rows out of which 20000 have value for calibrat as 1. I want to…

asked Apr 09 '20 at 06:56

Guest

votes

4 answers

Getting error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' when trying to do pandas Smote algorithm

My data is slightly unbalanced, so I am trying to do a SMOTE algorithm before doing the logistic regression model. When I do, I get the error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me…

python pandas smote

asked Dec 15 '20 at 18:42

devdon

votes

3 answers

package to do SMOTE in R

I am trying to do SMOTE in R for imbalanced datasets. I tried installing "DMwR" package for this, but it seems this package has been removed from the cran repository. I am getting the error:" package ‘DMwR’ is not available (for R version 4.0.2)…

r imbalanced-data smote

asked Apr 14 '21 at 05:01

Triparna Poddar

votes

2 answers

Retain pandas dataframe structure after SMOTE, oversampling in python

Problem: While implementing SMOTE (a type of oversampling), my dataframe is getting converted to a numpy array. Test_train_split from sklearn.model_selection import train_test_split X_train, X_test, y_train_full, y_test_full = train_test_split(X,…

python pandas numpy smote

asked Feb 27 '20 at 11:43

noob

votes

1 answer

SMOTE function not working in make_pipeline

I wanna simultaneously apply cross-validation and over-sampling. I get the following error from this code: from sklearn.pipeline import Pipeline, make_pipeline imba_pipeline = make_pipeline(SMOTE(random_state=42), …

python scikit-learn cross-validation oversampling smote

asked Nov 12 '19 at 19:00

Vahid the Great

votes

1 answer

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['int', 'str']

I already referred the posts here, here and here. Don't mark it as duplicate. I am working on a binary classification problem where my dataset has categorical and numerical columns. However, some of the categorical columns has a mix of numeric and…

python pandas machine-learning scikit-learn smote

asked Feb 20 '22 at 11:04

The Great

votes

1 answer

How can I use SMOTE in a Sklearn Pipeline for a NLP Classification problem?

I'm dealing with a multiclass classification problem, in which some classes are very imbalanced. My data looks like this: product_description class "This should be used to clean..." 1 "Beauty product, natural..." …

python scikit-learn nlp pipeline smote

asked Sep 08 '21 at 13:46

dekio

votes

3 answers

SMOTE - could not convert string to float

I think I'm missing something in the code below. from sklearn.model_selection import train_test_split from imblearn.over_sampling import SMOTE # Split into training and test sets # Testing Count Vectorizer X = df[['Spam']] y =…

python pandas sampling resampling smote

asked Dec 13 '20 at 21:25

Math

votes

1 answer

TypeError: __init__() got an unexpected keyword argument 'ratio' when using SMOTE

I am using SMOTE to oversample as my dataset is imbalanced. I am getting an unexpected argument error. But in the documentation, the ratio argument is defined for SMOTE. Can someone help me understand where I am going wrong? Code snippet from…

oversampling imblearn smote

asked Jun 06 '20 at 00:11

anushiya-thevapalan

votes

2 answers

SMOTE with multiple bert inputs

I'm building a multiclass text classification model using Keras and Bert (HuggingFace), but I have a very imbalanced dataset. I've used SMOTE from Sklearn in order to generate additional samples for the underbalanced classes (I have 45 in total),…

python keras scikit-learn huggingface-transformers smote

asked May 13 '20 at 14:15

ML_Engine

votes

1 answer

Why does SMOTE not work with more than 15 features / What method does work with more than 15 features?

I'm currently implementing machine learning using SMOTE from imblearn.over_sampling, and as I'm synthesizing data for it, I see a very noticeable cutoff for when the SMOTE method breaks. When I synthesize data using the following code and run it…

python machine-learning scikit-learn smote imblearn

asked Jun 13 '22 at 18:56

Brandon Bonifacio

votes

7 answers

Cannot import name 'available_if' from 'sklearn.utils.metaestimators'

While importing "from imblearn.over_sampling import SMOTE", getting import error. Please check and help. I tried upgrading sklearn, but the upgrade was undone with 'OSError'. Firstly installed imbalance-learn through pip. !pip install -U…

python jupyter-notebook imbalanced-data imblearn smote

asked Oct 17 '21 at 07:05

Piyush

votes

2 answers

Oversampling a sparse dataset in Python

I have a dataset that has a multi-labeled data. There is a total of 20 labels (from 0 to 20) which has an imbalance distribution among them. Here is an overview of the data: |id |label|value | |-----|-----|------------| |95534|0 …

python pandas dataframe oversampling smote

asked Sep 11 '20 at 17:27

LoneWolf

votes

0 answers

Python - How to differentiate SMOTE resampling from original data

I over sampled my data using SMOTE like so: >>> from imblearn.over_sampling import SMOTE >>> X_resampled, y_resampled = SMOTE().fit_resample(X, y) So now X_resampled, y_resampled are larger than the original data set. How can I tell apart the…

python machine-learning oversampling smote

asked Jun 07 '20 at 13:58

Shlomi Schwartz

votes

2 answers

How do we set ratio in SMOTE to have more positive sample than negative sample?

I am trying to use SMOTE to handle imbalanced class data in binary classification, and what I know is: if we use, for example sm = SMOTE(ratio = 1.0, random_state=10) Before OverSampling, counts of label '1': [78] Before OverSampling, counts of…

python pandas scikit-learn preprocessor smote

asked Sep 08 '19 at 03:42

npm
