3

I am trying to use SMOTE to handle imbalanced classes in a binary classification problem. As I understand it, if we use, for example,

sm = SMOTE(ratio = 1.0, random_state=10)

Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266] 

After OverSampling, counts of label '1': 6266
After OverSampling, counts of label '0': 6266

where class 1 is the minority, the result is a 50:50 split between classes 0 and 1,

and

sm = SMOTE(ratio = 0.5, random_state=10)

Before OverSampling, counts of label '1': [78]
Before OverSampling, counts of label '0': [6266] 

After OverSampling, counts of label '1': 3133
After OverSampling, counts of label '0': 6266

results in class 1 being half the size of class 0.
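The counts in both runs are consistent with the float being read as the minority-to-majority ratio after resampling. A minimal sketch of that arithmetic, where `minority_count_after` is a hypothetical helper and not part of imbalanced-learn:

```python
def minority_count_after(n_majority, ratio):
    """Minority count implied by a float ratio = minority / majority.

    Hypothetical helper for illustration only; it mirrors the counts
    reported above and is not an imbalanced-learn function.
    """
    # A float ratio is only meaningful up to 1.0 (minority equals majority).
    if not 0 < ratio <= 1:
        raise ValueError("a float ratio must lie in (0, 1]")
    return int(ratio * n_majority)


print(minority_count_after(6266, 1.0))  # 6266
print(minority_count_after(6266, 0.5))  # 3133
```

Read this way, a float can never ask for more minority samples than majority samples, so a 75:25 split has to be requested some other way.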

My question:

How do we set the ratio to obtain more samples of class 1 than of class 0, for instance 75:25?

npm

2 Answers

2

Try using a dictionary.

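from imblearn.over_sampling import SMOTE

# Target count for class 1: 3 * 6266 = 18798, which gives a 75:25 split.
smote_on_1 = 18798

smt = SMOTE(sampling_strategy={1: smote_on_1})
# fit_resample replaces the older fit_sample method.
X_train, y_train = smt.fit_resample(X_train, y_train)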
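
The target count in the dict does not have to be hard-coded; it can be derived from the class counts and the share you want. The following is a rough sketch, assuming imbalanced-learn 0.4 or later (where `sampling_strategy` and `fit_resample` exist). Both `strategy_for_share` and `oversample_to_share` are hypothetical helper names, not part of the library.

```python
from collections import Counter


def strategy_for_share(y, label, share):
    """Build a sampling_strategy dict so `label` makes up `share` of the data.

    Hypothetical helper: every other class keeps its current count.
    """
    if not 0 < share < 1:
        raise ValueError("share must be strictly between 0 and 1")
    counts = Counter(y)
    n_other = sum(n for cls, n in counts.items() if cls != label)
    # Solve target / (target + n_other) = share for target.
    target = round(share * n_other / (1 - share))
    # SMOTE only adds samples, so never ask for fewer than already exist.
    return {label: max(target, counts[label])}


def oversample_to_share(X, y, label=1, share=0.75, random_state=10):
    """Oversample `label` with SMOTE (assumes imbalanced-learn >= 0.4)."""
    # Imported here so the helper above works without imbalanced-learn.
    from imblearn.over_sampling import SMOTE

    smt = SMOTE(
        sampling_strategy=strategy_for_share(y, label, share),
        random_state=random_state,
    )
    return smt.fit_resample(X, y)
```

With the counts from the question (78 samples of class 1 and 6266 of class 0), `strategy_for_share(y, 1, 0.75)` gives `{1: 18798}`, the same value used above.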
0

From the docs, it looks like ratio can be a float greater than 1, i.e. for a 75:25 ratio you can set ratio=3.
Try it and see if it works.

Itamar Mushkin
  • I'm assuming you're using this implementation of SMOTE, or one that is similar enough; if not, please specify in your question which implementation you are using. If the implementation you're using is so common that I should've known, please enlighten me (I genuinely don't know, I'm not being sarcastic). – Itamar Mushkin Sep 08 '19 at 07:33
  • I tried, but got this message: "When 'sampling_strategy' is a float, it should be in the range (0, 1]", so the max is 1.0. – npm Sep 08 '19 at 08:04
  • Another thing you can do is to send a `dict` with the desired number of samples. From the docs again: "When dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class." – Itamar Mushkin Sep 08 '19 at 08:25