I have an image dataset with 12 classes. I'm comparing different upsampling techniques for image classification and wanted to upsample the massively imbalanced classes using SMOTE.
I defined my training set as shown below:
import tensorflow as tf

img_height = 256
img_width = 256
batch_size = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
    image_directory,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

Output: "Found 11730 files belonging to 12 classes. Using 9384 files for training."
The problem, from my understanding, is that in order to upsample with SMOTE I need explicit x_train and y_train arrays. I attempted to build them with:
for images, labels in train_ds:  # meant to take just one batch of the dataset
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()

numpy_images.shape
The shape returns (8, 256, 256, 3). This "8" feels out of place; I imagined the first dimension would be the length of my training set (9384), but maybe I'm misunderstanding.
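One guess: each iteration of the loop overwrites numpy_images, so maybe I'm only ever left with whatever batch came last (9384 = 293 × 32 + 8, which would explain the 8, but I'm not certain). If that's the case, I suspect I need to concatenate every batch instead. A sketch of what I mean, with my own guessed variable names:

import numpy as np

all_images = []
all_labels = []
for images, labels in train_ds:
    all_images.append(images.numpy())
    all_labels.append(labels.numpy())

x_all = np.concatenate(all_images)  # hoping for shape (9384, 256, 256, 3)
y_all = np.concatenate(all_labels)  # hoping for shape (9384,)

Is that the right way to get the full training set into memory?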
I reshaped the NumPy array to 2D so that it could be passed to SMOTE:
x_train = numpy_images.reshape(8, 256 * 256 * 3)
x_train.shape
The shape returns (8, 196608)
from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42, k_neighbors=2)
X_smote, y_smote = sm.fit_resample(x_train, numpy_labels)
Finally, the above raises an error saying that n_samples = 1 while n_neighbors = 2.
My questions are:
- How do I resolve this issue?
- Why would the shape be (8, 256, 256, 3) instead of (9384, 256, 256, 3)?
Any help is appreciated!
SMOTE initialisation expects n_neighbors <= n_samples, but n_samples < n_neighbors
^ This thread was extremely helpful, but I still can't explain the shape of my NumPy array.
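To convince myself of that constraint, I tried a toy example (the data here is invented purely for illustration): SMOTE seems to need more than k_neighbors samples in every class, so a class with a single sample fails the same way my run did.

import numpy as np
from imblearn.over_sampling import SMOTE

# Class 1 has 3 samples, which is just enough for k_neighbors=2
# (each sample still has 2 same-class neighbours besides itself).
X_toy = np.random.rand(9, 4)
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

X_res, y_res = SMOTE(k_neighbors=2, random_state=42).fit_resample(X_toy, y_toy)
print(X_res.shape, y_res.shape)  # class 1 upsampled to 6, so (12, 4) and (12,)

# With y_toy = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1]) the same call raises
# the same kind of n_neighbors <= n_samples error, since class 1 has 1 sample.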
My solution: I ended up just duplicating a set number of images for each of the underrepresented classes, which I did manually in File Explorer.
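A rough script equivalent of that manual duplication, in case it helps anyone; the class directory, file extension, and target count below are placeholders for my actual setup:

import shutil
from pathlib import Path

class_dir = Path("image_directory/some_rare_class")  # placeholder path
target_count = 500                                   # placeholder target

originals = sorted(class_dir.glob("*.jpg"))
for i in range(max(0, target_count - len(originals))):
    src = originals[i % len(originals)]              # cycle through existing images
    dst = class_dir / f"{src.stem}_copy{i}{src.suffix}"
    shutil.copy(src, dst)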