I'm trying to do binary classification with a deep neural network (specifically VGG16) in Keras. Unfortunately I have a very imbalanced data-set (15,000 vs. 1,800 images) and just can't find a way to get around that.
Results that I'm seeing (on training and validation data):

- Recall = 1
- Precision = 0.1208 (which is exactly the ratio between `class0` and `class1` samples)
- AUC = 0.88 (after ~30 epochs with SGD, which seems to be 1 - Precision)
What I've done:

- Switching from loss/accuracy metrics to AUC with this little helper.
- Utilizing `class_weight` as described here, which doesn't seem to help.
- Trying different optimizers (SGD, Adam, RMSprop).
- Adding `BatchNormalization` layers to my (untrained) VGG16 and setting `use_bias` to `False` on the convolutional layers. See my whole network as a gist here.
- Doing augmentation to enlarge the dataset with Keras' built-in `ImageDataGenerator`.
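For context, the `class_weight` I passed follows the usual inverse-frequency scheme — a minimal sketch (the counts are my approximate class sizes; the weighting scheme is the standard one, nothing special to my code):

```python
n_class0, n_class1 = 15000, 1800   # approximate class sizes of my data-set
total = n_class0 + n_class1

# Inverse-frequency weights: each class contributes equally to the loss overall
class_weight = {
    0: total / (2.0 * n_class0),
    1: total / (2.0 * n_class1),
}

# passed to training as: model.fit_generator(..., class_weight=class_weight)
```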
What I think could help further (but haven't tried yet):

- Doing more data augmentation for one class than the other. Unfortunately I'm using one `ImageDataGenerator` for my whole training data and I don't know how to augment one class more than the other.
- Maybe a custom loss function which penalises wrong decisions more heavily? How would I implement that? Currently I'm just using `binary_crossentropy`.
- Theoretically I could adjust the class-membership threshold for prediction, but that doesn't help with training and would not improve the result, right?
- Maybe decreasing the batch size, as suggested here. But I don't really see why that should help. Currently I'm determining the batch size programmatically so that all the training and validation data is shown to the network in one epoch:

```python
# math.ceil rather than int(): flooring would silently drop the final
# partial batch, so not all samples would be seen each epoch
steps_per_epoch = math.ceil(len(train_gen.filenames) / args.batch_size)
validation_steps = math.ceil(len(val_gen.filenames) / args.batch_size)
```
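For the per-class augmentation idea, one pattern would be two `ImageDataGenerator`s — a stronger one for the minority class — whose batches are interleaved by a small wrapper. A sketch, assuming each sub-generator yields `(x, y)` batch tuples endlessly like `flow()`/`flow_from_directory()` do (`gen_majority`/`gen_minority` are placeholder names):

```python
import numpy as np

def combine_generators(gen_majority, gen_minority):
    """Interleave batches from two per-class generators.

    Sketch: gen_minority would come from an ImageDataGenerator configured
    with stronger augmentation (more rotation/shift/flip) than gen_majority.
    Both are assumed to yield (x_batch, y_batch) tuples endlessly.
    """
    while True:
        x0, y0 = next(gen_majority)
        x1, y1 = next(gen_minority)
        x = np.concatenate([x0, x1])
        y = np.concatenate([y0, y1])
        idx = np.random.permutation(len(y))  # shuffle classes within the batch
        yield x[idx], y[idx]
```

The combined generator would then be passed to `fit_generator`, with `steps_per_epoch` adjusted for the doubled effective batch size.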
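For the custom-loss idea, a class-weighted binary cross-entropy just scales the positive-class term. Here is the formula as a numpy sketch (a Keras loss would build the same expression from backend ops — `K.clip`, `K.log`, `K.mean` — and pass it to `model.compile(loss=...)`; `pos_weight` is a hypothetical knob, e.g. the class ratio):

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_pred, pos_weight=8.0):
    """Binary cross-entropy where positive-class errors cost pos_weight times more.

    numpy sketch of the math only; the Keras version would use backend ops
    so it stays differentiable inside the graph.
    """
    eps = 1e-7  # avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(pos_weight * y_true * np.log(y_pred)
                   + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_sample.mean()
```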
What do you think I should tackle first, or do you have a better idea? I'm also glad for any help with implementation details.
Thank you so much in advance!