
I have a small, imbalanced dataset of 4116 aerial RGB images (224x224x3). Since the dataset is not big enough, it is very likely that I will run into overfitting. Image preprocessing and data augmentation help to tackle this problem, as explained below.

"Overfitting is caused by having too few samples to learn from, rendering you unable to train a model that can generalize to new data. Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images."

Deep Learning with Python by François Chollet, pages 138-139, section 5.2.5, "Using data augmentation".

I've read Medium - Image Data Preprocessing for Neural Networks and examined Stanford's CS230 - Data Preprocessing and CS231 - Data Preprocessing course notes. This point is highlighted once more in an SO question, and I understand that there is no "one fits all" solution. Here is what prompted me to ask this question:

"No translation augmentation was used since we want to achieve high spatial resolution."

Reference: Researchgate - Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks



I know that I will use the Keras ImageDataGenerator class, but I don't know which techniques and which parameters to use for the task of semantic segmentation of small objects. Could someone enlighten me? Thanks in advance. :)

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,      # range in degrees (0-180) within which to randomly rotate pictures
    width_shift_range=0.2,  # fraction of total width within which to randomly translate pictures horizontally
    height_shift_range=0.2, # fraction of total height within which to randomly translate pictures vertically
    shear_range=0.2,        # randomly apply shearing transformations
    zoom_range=0.2,         # randomly zoom inside pictures
    horizontal_flip=True,   # randomly flip half of the images horizontally
    fill_mode='nearest',    # strategy for filling in newly created pixels, which can appear after a rotation or a width/height shift
    featurewise_center=True,             # subtract the mean of the training set from each input
    featurewise_std_normalization=True)  # divide each input by the std of the training set

datagen.fit(X_train)  # needed to compute the featurewise mean/std statistics from the training data
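
For context, here is how I am currently planning to wire this up for segmentation, so that images and masks receive identical random transforms. This is only a sketch of my intended setup: X_train/Y_train, the batch size and the seed are placeholders, and I deliberately leave the featurewise normalization off the mask generator.

# Sketch (my assumption of the usual pattern): two generators with the same
# transform arguments and the same seed, so each image and its mask stay aligned.
image_datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True, fill_mode='nearest')
mask_datagen  = ImageDataGenerator(rotation_range=20, horizontal_flip=True, fill_mode='nearest')

seed = 1  # identical seed => identical random transforms for images and masks
image_generator = image_datagen.flow(X_train, batch_size=16, seed=seed)
mask_generator  = mask_datagen.flow(Y_train, batch_size=16, seed=seed)

train_generator = zip(image_generator, mask_generator)
# model.fit_generator(train_generator, steps_per_epoch=len(X_train) // 16, epochs=50)
# Note: interpolated masks may need rounding/re-thresholding back to valid class labels.
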
  • The question is very broad, but I will still try to give my perspective on this. As you must have read, augmentation is done to fake data, basically generating new data out of the available data to produce more samples within the same distribution. It depends entirely on what variance you expect your model to encounter after training. For example, if you think you might encounter flipped or rotated images, add those transforms as part of your augmentation. For preprocessing, I have so far used grayscale images, normalized with mean and standard deviation. – venkata krishnan Jul 22 '19 at 01:17

1 Answer


The augmentation and preprocessing phases always depend on the problem that you have. You have to think of all the possible augmentations which can enlarge your dataset. But the most important thing is that you should not perform extreme augmentations, i.e. ones that create training samples which cannot occur in real examples. If you do not expect that the real examples will be horizontally flipped, do not perform horizontal flips, since this will give your model false information. Think of all the possible changes that can happen in your input images and try to artificially produce new images from your existing ones. You can use a lot of built-in functions from Keras, but for each of them you should make sure that it does not create examples which are unlikely to be present at the input of your model.
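
As an illustration only (not a recommendation, since I do not know your data), a configuration for nadir aerial imagery might keep flips and rotations, which are plausible for scenes viewed from above, and avoid the shifts that the paper quoted in the question rules out; the parameter values below are assumptions:

from keras.preprocessing.image import ImageDataGenerator

# Illustrative sketch: keep only transforms that plausibly occur in aerial scenes.
aerial_datagen = ImageDataGenerator(
    horizontal_flip=True,   # a mirrored aerial scene is still a plausible scene
    vertical_flip=True,     # the same argument holds for up-down flips
    rotation_range=90,      # the viewing direction of a nadir image is arbitrary
    zoom_range=0.1,         # small zoom to mimic slight altitude / resolution changes
    fill_mode='reflect')    # avoid constant borders created by rotations
# no width/height shift or shear here, in line with the quote about spatial resolution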

As you said, there is no "one fits all" solution, because everything depends on the data. Analyse the data and build everything with respect to it.

About the small objects: one direction you should check is loss functions which emphasise the impact of the target volumes in comparison to the background. Look at the Dice loss or the Generalised Dice loss.
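
For reference, a minimal sketch of a soft Dice loss with the Keras backend API, assuming y_true are one-hot masks and y_pred are softmax probabilities of shape (batch, height, width, classes); this is my own illustration, not code from the papers:

from keras import backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # soft Dice per class: 2*|intersection| / (|y_true| + |y_pred|), averaged over classes
    axes = [1, 2]  # sum over the spatial dimensions, keep the batch and class axes
    intersection = K.sum(y_true * y_pred, axis=axes)
    denominator = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - K.mean(dice)  # mean over classes and batch, turned into a loss

# model.compile(optimizer='adam', loss=dice_loss, metrics=['accuracy'])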

  • I've analyzed all the data augmentation techniques and found plausible answers for my dataset. But I am still confused about weighted loss functions. I don't understand how the Dice loss can be used for an unbalanced multiclass segmentation problem. How can I tell it that I care about classA much more than the others? One suggestion is that the one-hot encoded vector can be changed (instead of writing 1 for the true class, write the class weight, e.g. 999). Please look at the link below to understand better what I mean. https://github.com/qubvel/segmentation_models/issues/137#issuecomment-515774767 – saki Aug 04 '19 at 12:04
  • When I suggested the usage of a loss function I was thinking of small objects in the sense of volume size. If you have unbalanced classes, that is a different problem. For that problem it is useful to use balanced sampling; try to look at different sampling techniques (a rough sketch of what I mean follows these comments). – manza Aug 05 '19 at 13:13
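
A rough sketch of one such sampling scheme, oversampling images that contain rare classes; everything here (the per-image weighting rule, the names X_train/Y_train and the one-hot mask layout) is an assumption for illustration:

import numpy as np

# Hypothetical balanced-sampling sketch: draw training images with probabilities
# that favour images containing rare classes.
# Assumes Y_train holds one-hot masks of shape (n_samples, height, width, n_classes).
class_pixels = Y_train.sum(axis=(0, 1, 2))                    # total pixel count per class
class_weights = class_pixels.sum() / (class_pixels + 1e-8)    # rare classes get large weights

presence = Y_train.sum(axis=(1, 2)) > 0                       # (n_samples, n_classes): is the class present?
sample_weights = (presence * class_weights).max(axis=1)       # image weight = weight of its rarest present class
sample_probs = sample_weights / sample_weights.sum()

def balanced_batch(batch_size=16):
    # sample indices with replacement, biased towards images containing rare classes
    idx = np.random.choice(len(X_train), size=batch_size, p=sample_probs)
    return X_train[idx], Y_train[idx]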