
I am working on a multiclass classification problem with an unbalanced dataset of images (of different classes). I tried the imblearn library, but it does not work on the image dataset.

I have a dataset of images belonging to 3 classes: A, B, and C. A has 1000 images, B has 300, and C has 100. I want to oversample classes B and C to avoid class imbalance. How can I oversample an image dataset in Python?

sophros
ReInvent_IO
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [on topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. In particular, be detailed about what you've attempted ("I tried imblearn library" is far too general) and what's wrong ("it is not working" is not a problem specification). We can't fix a problem when we don't know what you have to accomplish, what you did, and what went wrong. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. – Prune Jan 31 '18 at 00:00
  • Thanks @Prune for letting me know the guidelines. Could you please let me know how to do oversampling for data with images. I have a dataset of images belonging to 3 class namely A,B,C. A has 1000 data, B has 300 and C has 100. I want to oversample class B and C. So that I could avoid data imbalance. Please let me know. Thanks once again for trying to help me. – ReInvent_IO Jan 31 '18 at 02:46

1 Answer

The samplers in imblearn.over_sampling accept only 2-D input of shape (n_samples, n_features), which is why they fail on an array of images. One way to oversample your image dataset with this library is to reshape around the resampling step:

  • flatten each image into a single row
  • oversample the flattened data
  • reshape the resampled data back to the original image dimensions
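
As a quick sanity check on the first and last steps, here is a minimal NumPy-only sketch showing that flattening and restoring is lossless. The array is a hypothetical stand-in (6 RGB images of 28x28 pixels), not real data:

```python
import numpy as np

# Hypothetical stand-in for an image dataset: 6 RGB images of 28x28 pixels.
X = np.arange(6 * 28 * 28 * 3, dtype=np.float32).reshape(6, 28, 28, 3)

# First step: flatten each image into one row of 28 * 28 * 3 = 2352 features.
flat = X.reshape(X.shape[0], -1)

# Last step: restore the original per-image shape.
restored = flat.reshape(-1, *X.shape[1:])

flat_shape = flat.shape                       # (6, 2352)
round_trip_ok = np.array_equal(X, restored)   # pixel values and order are preserved
```

Because reshape only changes how the same values are indexed, any row duplicated by the oversampler maps back to an exact copy of an original image.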

Suppose your image dataset is a NumPy ndarray of shape (5000, 28, 28, 3). Following the steps above, you can use the code below:

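# X: current dataset, a NumPy array of shape (n_samples, 28, 28, 3)
# y: labels, shape (n_samples,)

from imblearn.over_sampling import RandomOverSampler

# flatten each image into one row: (n_samples, 28 * 28 * 3)
reshaped_X = X.reshape(X.shape[0], -1)

# oversample the minority classes up to the size of the largest class
oversample = RandomOverSampler()
oversampled_X, oversampled_y = oversample.fit_resample(reshaped_X, y)

# reshape X back to the original image dimensions
new_X = oversampled_X.reshape(-1, *X.shape[1:])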
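
If you would rather not depend on imblearn, random oversampling can also be sketched with plain NumPy by sampling indices, which works on the 4-D array directly with no reshaping. This is only an illustration of the idea, not imblearn's implementation: `random_oversample` is a hypothetical helper, and the data below is a dummy array using the class counts from the question (1000, 300, 100) with an assumed 8x8x3 image size.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # keep every original sample
    for cls, count in zip(classes, counts):
        if count < target:
            cls_idx = np.flatnonzero(y == cls)
            # draw the missing number of samples, with replacement
            keep.append(rng.choice(cls_idx, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Dummy data (assumption): class sizes from the question, tiny blank images.
y = np.array(["A"] * 1000 + ["B"] * 300 + ["C"] * 100)
X = np.zeros((len(y), 8, 8, 3), dtype=np.uint8)

X_bal, y_bal = random_oversample(X, y)
balanced_counts = np.unique(y_bal, return_counts=True)[1].tolist()
```

Note that either approach only duplicates existing images, so it is best applied to the training split only, after the train/test split has been made.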

Hope that helps!