I'm trying to do binary classification with a deep neural network (specifically VGG16) in Keras. Unfortunately, I have a very imbalanced data-set (15,000 vs. 1,800 images) and just can't find a way to work around that.

Results that I'm seeing (on training and validation data)

  • Recall = 1
  • Precision = 0.1208 (which is exactly the ratio between class 0 and class 1 samples)
  • AUC = 0.88 after ~30 epochs with SGD (the value seems to be 1 - Precision)
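A recall of 1 together with a precision close to the share of class 1 samples is what a model produces when it predicts class 1 for every input. The sketch below checks that arithmetic in plain Python. It assumes the rounded counts from the question (15,000 class 0 and 1,800 class 1 images) and a hypothetical helper `precision_recall`; the real training/validation split will differ, which may explain why it gives about 0.107 rather than the reported 0.1208.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Assumption: rounded class counts taken from the question.
y_true = [0] * 15000 + [1] * 1800
# Degenerate model: predicts class 1 for every single sample.
y_pred = [1] * len(y_true)

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.1f}")
```

If the numbers from a real run match this pattern, the network has most likely collapsed to a constant prediction rather than learned anything about the images.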

What I've done

  • Switching from loss/accuracy metrics to AUC with this little helper
  • Using class_weight as described here, which doesn't seem to help
  • Trying different optimizers (SGD, Adam, RMSProp)
  • Adding BatchNormalization layers to my (untrained) VGG16 and setting use_bias to False on the convolutional layers. See my whole network as a gist here.
  • Augmenting the data with Keras' built-in ImageDataGenerator to enlarge the dataset.
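For reference, one common way to derive the class_weight dictionary is inverse class frequency, the same heuristic scikit-learn calls "balanced". This is only a sketch: the helper name `balanced_class_weights` is made up for illustration, and the counts are the rounded ones from the question, not necessarily those of the actual training split.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Assumption: rounded class counts taken from the question.
counts = {0: 15000, 1: 1800}
class_weight = balanced_class_weights(counts)
print(class_weight)
```

The resulting dictionary is in the form that the class_weight argument of Keras' fit expects, so the rare class contributes roughly eight times more to the loss per sample than the frequent one.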

What I think could help further (but did not try yet)

  1. Doing more data augmentation for one class than the other. Unfortunately I'm using one ImageDataGenerator for my whole training data and I don't know how to augment one class more than the other.
  2. Maybe a custom loss function which penalises wrong predictions more heavily? How would I implement that? Currently I'm just using binary_crossentropy.
  3. Theoretically I could adjust the class-membership threshold for prediction, but that doesn't help with training and would not improve the result, right?
  4. Maybe decrease the batch size, as suggested here, although I don't really see why that should help. Currently I derive the number of steps from the batch size so that the network sees all training and validation data once per epoch:
     steps_per_epoch = int(len(train_gen.filenames) / args.batch_size)
     validation_steps = int(len(val_gen.filenames) / args.batch_size)
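Ideas 2 and 3 can be sketched in NumPy to show the math. This is not a drop-in Keras loss: a real one would have to use tensor operations instead of NumPy. The names `weighted_binary_crossentropy`, `f1_at_threshold`, `best_threshold` and the `pos_weight` parameter are illustrative assumptions, and the toy probabilities are invented.

```python
import numpy as np


def weighted_binary_crossentropy(y_true, y_prob, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy where errors on class 1 count pos_weight times."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives 0.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -(pos_weight * y_true * np.log(y_prob)
                   + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(per_sample.mean())


def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score of class 1 when predicting 1 for probabilities >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates=None):
    """Return the first candidate threshold that maximises F1."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    scores = [f1_at_threshold(y_true, y_prob, t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])


# Invented toy data: every score is above 0.5, so the default threshold
# labels all samples as class 1, yet the ranking is perfect.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_prob = np.array([0.6, 0.55, 0.72, 0.65, 0.8, 0.9])

plain = weighted_binary_crossentropy(y_true, y_prob)
weighted = weighted_binary_crossentropy(y_true, y_prob, pos_weight=15000 / 1800)
threshold = best_threshold(y_true, y_prob)
print(plain, weighted, threshold)
```

Moving the threshold does not change what the network learns, but on scores that rank the classes well it can turn a "predict everything as class 1" outcome into a usable classifier, so it is worth checking on validation data before concluding that training failed.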

Which of these should I tackle first, or do you have a better idea? I'd also appreciate any help with implementation details.

Thank you so much in advance!

petezurich
Dennis Zoma

1 Answer

Maybe try preparing class-balanced batches (which involves duplicating class 1 samples), as described in https://community.rstudio.com/t/ensure-balanced-mini-batches-while-training/7505 (RStudio). Also read "Neural Network - Working with a imbalanced dataset" and "balancing an imbalanced dataset with keras image generator".
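A minimal sketch of the balanced-batch idea in NumPy, not taken from the linked posts: a generator that yields index arrays containing the same number of class 0 and class 1 samples. The function name `balanced_batch_indices` is hypothetical, and the sketch only produces indices; loading and augmenting the corresponding images is left out.

```python
import numpy as np


def balanced_batch_indices(labels, batch_size, rng=None):
    """Yield index arrays with equal numbers of class 0 and class 1 samples."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    idx_by_class = [np.flatnonzero(labels == c) for c in (0, 1)]
    half = batch_size // 2
    while True:
        # Draw half a batch per class; only sample with replacement inside a
        # batch if the class has fewer than `half` samples.
        parts = [rng.choice(idx, size=half, replace=len(idx) < half)
                 for idx in idx_by_class]
        batch = np.concatenate(parts)
        rng.shuffle(batch)  # mix the two classes within the batch
        yield batch


# Toy labels with roughly the imbalance from the question (9:1).
labels = np.array([0] * 90 + [1] * 10)
gen = balanced_batch_indices(labels, batch_size=8, rng=np.random.default_rng(0))
batch = next(gen)
print(batch, labels[batch])
```

Because every batch draws afresh from the small class 1 pool, class 1 images are reused far more often than class 0 images over an epoch, which is the "doubling" effect mentioned above.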

Another possibility is to perform feature extraction during pre-processing, i.e. to run image processing algorithms over the images to highlight characteristic features.
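As one illustration of such a pre-processing step, the sketch below computes a normalised gradient magnitude that highlights edges. Edge detection is only an assumed example, since no particular algorithm is named here; the function name `edge_magnitude` is made up, and the code handles a single 2-D grayscale image.

```python
import numpy as np


def edge_magnitude(image):
    """Return the gradient magnitude of a 2-D grayscale image, scaled to [0, 1]."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along rows first, then columns.
    gy, gx = np.gradient(image)
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


# Toy image: dark left half, bright right half, i.e. one vertical edge.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
edges = edge_magnitude(image)
print(edges[0])
```

Whether this kind of hand-crafted feature helps a convolutional network, which learns similar filters itself, depends on the data and would need to be checked on the validation set.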

ralf htp