Update

This is now officially supported by keras-cv.


To create a class label in CutMix- or MixUp-style augmentation, we can draw a mixing weight beta from a Beta distribution (e.g. with np.random.beta or scipy.stats.beta) and combine two labels as follows:

label = label_one*beta + (1-beta)*label_two
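
As a minimal sketch of the two-label case, assuming one-hot labels and a draw from Beta(alpha, alpha) with alpha = 1 (the names `mix_two_labels`, `alpha`, and `rng` are illustrative and not part of any library):

```python
import numpy as np

def mix_two_labels(label_one, label_two, alpha=1.0, rng=None):
    """Blend two one-hot labels with a weight drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    mixed = lam * label_one + (1.0 - lam) * label_two
    return mixed, lam

label_one = np.eye(10)[3]  # one-hot label for class 3
label_two = np.eye(10)[7]  # one-hot label for class 7
mixed, lam = mix_two_labels(label_one, label_two, rng=np.random.default_rng(0))
```

The mixed label keeps `lam` at class 3 and `1 - lam` at class 7, so it still sums to 1.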

But what if we have more than two images? In YOLOv4, the authors tried an interesting augmentation called Mosaic augmentation for object detection problems. Unlike CutMix or MixUp, this augmentation builds each augmented sample from 4 images. In object detection, we can compute the shift of each instance's coordinates and thereby recover the proper ground truth. But for plain image classification, how can we do that?

Here is a starter.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import random

(train_images, train_labels), (test_images, test_labels) = \
tf.keras.datasets.cifar10.load_data()
train_images = train_images[:10,:,:]
train_labels = train_labels[:10]
train_images.shape, train_labels.shape

((10, 32, 32, 3), (10, 1))

Here is a function we've written for this augmentation (too ugly, with an inner-outer loop! Please suggest how we can do it more efficiently):

def mosaicmix(image, label, DIM, minfrac=0.25, maxfrac=0.75):
    '''image, label: batches of samples 
    '''
    xc, yc  = np.random.randint(DIM * minfrac, DIM * maxfrac, (2,))
    indices = np.random.permutation(int(image.shape[0]))
    final_imgs, final_lbs = [], []

    # Iterate over the full indices
    for j in range(len(indices)):
        # Allocate a fresh canvas so every appended mosaic is a separate array
        mosaic_image = np.zeros((DIM, DIM, 3), dtype=np.float32)
        # Randomly take 4 samples to create one mosaic sample
        rand4indices = [j] + random.sample(list(indices), 3)
        
        # Make mosaic with 4 samples 
        for i in range(len(rand4indices)):
            if i == 0:    # top left
                x1a, y1a, x2a, y2a =  0,  0, xc, yc
                x1b, y1b, x2b, y2b = DIM - xc, DIM - yc, DIM, DIM # from bottom right        
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, 0, DIM , yc
                x1b, y1b, x2b, y2b = 0, DIM - yc, DIM - xc, DIM # from bottom left
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = 0, yc, xc, DIM
                x1b, y1b, x2b, y2b = DIM - xc, 0, DIM, DIM-yc   # from top right
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc,  DIM, DIM
                x1b, y1b, x2b, y2b = 0, 0, DIM-xc, DIM-yc    # from top left
                
            # Copy-Paste
            mosaic_image[y1a:y2a, x1a:x2a] = image[rand4indices[i]][y1b:y2b, x1b:x2b]

        # Append the mosaic sample
        final_imgs.append(mosaic_image)
        
    return final_imgs, label

The augmented samples, currently with the wrong labels.

data, label = mosaicmix(train_images, train_labels, 32)
plt.imshow(data[5]/255)

enter image description here


However, here are some more examples to motivate you. Data is from the Cassava Leaf competition.


(source: googleapis.com)


(source: googleapis.com)

Innat

2 Answers

We already know that, in CutMix, λ is a float drawn from the beta distribution Beta(α, α). We have seen that it performs best when α = 1. If we always fix α = 1, then Beta(1, 1) is exactly the uniform distribution on [0, 1], so λ is sampled uniformly.

Put simply, λ is just a floating-point number between 0 and 1.

So, only for 2 images, if we use λ for the 1st image then we can calculate the remaining unknown portion simply by 1-λ.

But for 3 images, if we use λ for the 1st image, we cannot derive the other 2 unknowns from that single λ. To do so, we need 2 random numbers for 3 images. In the same way, n images need n-1 random variables, and in all cases the weights must sum to 1 (for example, λ + (1-λ) == 1). If the sum is not 1, the label will be wrong!

For this purpose, the Dirichlet distribution might be helpful because it generates quantities that sum to 1. A Dirichlet-distributed random variable can be seen as a multivariate generalization of a Beta-distributed one.

>>> np.random.dirichlet((1, 1), 1)  # for 2 images. Equivalent to λ and (1-λ)
array([[0.92870347, 0.07129653]])  
>>> np.random.dirichlet((1, 1, 1), 1)  # for 3 images.
array([[0.38712673, 0.46132787, 0.1515454 ]])
>>> np.random.dirichlet((1, 1, 1, 1), 1)  # for 4 images.
array([[0.59482542, 0.0185333 , 0.33322484, 0.05341645]])
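
To make the connection to labels concrete, here is a small sketch that blends n one-hot labels with Dirichlet weights. It assumes a symmetric Dirichlet with concentration `alpha` (1 by default, matching the examples above); the helper name `mix_labels` is made up for illustration.

```python
import numpy as np

def mix_labels(one_hot_labels, alpha=1.0, rng=None):
    """Blend n one-hot labels with weights from a symmetric Dirichlet."""
    rng = np.random.default_rng() if rng is None else rng
    one_hot_labels = np.asarray(one_hot_labels, dtype=np.float64)
    n = one_hot_labels.shape[0]
    weights = rng.dirichlet(np.full(n, alpha))  # shape (n,), sums to 1
    # Weighted sum of the label rows gives one soft label vector.
    return weights @ one_hot_labels, weights

labels = np.eye(10)[[0, 1, 2, 3]]  # four one-hot labels, classes 0..3
mixed, weights = mix_labels(labels, rng=np.random.default_rng(0))
```

Because the weights sum to 1, the mixed label is still a valid probability vector, whatever the number of images.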

In CutMix, the size of the cropped part of an image is related to λ, which also weights the corresponding labels.

enter image description here

enter image description here
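
For the two-image case, this relation can be sketched as follows. The sketch assumes the box sampling described in the CutMix paper, where the patch sides scale with sqrt(1 - λ) so that the patch covers roughly a (1 - λ) fraction of the image; `cutmix_box` is an illustrative name, not a library function.

```python
import numpy as np

def cutmix_box(lam, W, H, rng=None):
    """Sample a CutMix patch whose area is about (1 - lam) of a W x H image."""
    rng = np.random.default_rng() if rng is None else rng
    cut_ratio = np.sqrt(1.0 - lam)
    cut_w, cut_h = int(W * cut_ratio), int(H * cut_ratio)
    cx, cy = rng.integers(0, W), rng.integers(0, H)  # box centre
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, W)
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, H)
    # Clipping can shrink the box, so recompute lambda from the real area.
    lam_adjusted = 1.0 - (x2 - x1) * (y2 - y1) / (W * H)
    return (int(x1), int(y1), int(x2), int(y2)), lam_adjusted

box, lam_adjusted = cutmix_box(lam=0.7, W=32, H=32, rng=np.random.default_rng(0))
```

The label weight is taken from the area that was actually pasted (`lam_adjusted`), not from the λ that was drawn, which keeps label and image consistent when the box is clipped at the border.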

So, for multiple λ, you also need to calculate them accordingly.

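# Let's say for 4 images.
# I am not sure this is the proper way; get_cropping_params is a
# hypothetical helper that maps a weight to a crop box.

image_list = [...]  # 4 images, each of shape (H, W, C)
label_list = [...]  # 4 one-hot labels
H, W, C = image_list[0].shape
new_img = np.zeros((H, W, C))
new_label = 0

beta_list = np.random.dirichlet((1, 1, 1, 1), 1)[0]
for idx, beta in enumerate(beta_list):
    x0, y0, w, h = get_cropping_params(beta, new_img)  # something like this
    new_img[y0:y0 + h, x0:x0 + w] = image_list[idx][y0:y0 + h, x0:x0 + w]
    new_label = new_label + label_list[idx] * beta  # weights sum to 1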

Uzzal Podder

Another way to look at this problem is by considering the lines of separation for both the width and height dimensions. When building the mosaic image, the goal is to combine 4 images into a single image. We can achieve this by randomly sampling midpoints (denoting the points of separation) in each dimension. This removes the rather complicated requirement of sampling 4 numbers summing up to 1. Instead, the goal now is to sample 2 independent values from a uniform distribution - a much simpler and more intuitive alternative.

So essentially, we sample two values:

w = np.random.uniform(0, 1)
h = np.random.uniform(0, 1)

To generate realistic mosaics where each image makes a noticeable contribution, we can sample values from [0.25, 0.75] rather than from [0, 1].

These two values are sufficient to parameterize the mosaic problem. Suppose the mosaic image has dimensions W x H, and let w and h denote the split points along the width and height in pixels (the sampled fractions multiplied by W and H respectively). Each image in the mosaic then occupies the area spanned by the following coordinates:

 - top left     - (0, 0) to (w, h)
 - top right    - (w, 0) to (W, h)
 - bottom left  - (0, h) to (w, H)
 - bottom right - (w, h) to (W, H)
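
The four regions listed above can be filled in a single pass. This is only a sketch: `make_mosaic` and its parameters are illustrative names, the split point is sampled from [0.25, 0.75] as suggested earlier, and each tile is assumed to be taken from the same location in its source image (any other crop of matching size would work too).

```python
import numpy as np

def make_mosaic(images, W, H, minfrac=0.25, maxfrac=0.75, rng=None):
    """Tile four H x W x C images around a randomly sampled split point."""
    rng = np.random.default_rng() if rng is None else rng
    # Split point in pixels, sampled away from the borders.
    w = int(rng.uniform(minfrac, maxfrac) * W)
    h = int(rng.uniform(minfrac, maxfrac) * H)
    # (x1, y1, x2, y2) for top left, top right, bottom left, bottom right.
    regions = [(0, 0, w, h), (w, 0, W, h), (0, h, w, H), (w, h, W, H)]
    mosaic = np.zeros((H, W, images[0].shape[-1]), dtype=images[0].dtype)
    for img, (x1, y1, x2, y2) in zip(images, regions):
        # Assumption: copy the same region from the source image.
        mosaic[y1:y2, x1:x2] = img[y1:y2, x1:x2]
    return mosaic, (w, h)

rng = np.random.default_rng(0)
images = [np.full((32, 32, 3), fill, dtype=np.uint8) for fill in (10, 20, 30, 40)]
mosaic, (w, h) = make_mosaic(images, W=32, H=32, rng=rng)
```

Returning the split point alongside the mosaic lets the caller derive the label weights from the same two numbers.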

The sampled midpoints also help in calculating the class labels. Suppose we use the area each image occupies within the mosaic as its contribution to the overall class label. For example, consider 4 images belonging to 4 classes {0, 1, 2, 3}, where image 0 occupies the top left, 1 the top right, 2 the bottom left, and 3 the bottom right. We can build the class label L as follows:
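
One way to realize this, assuming L is the area-weighted sum of the four one-hot vectors and that w and h are the split points in pixels, is sketched below (`mosaic_label` is an illustrative name):

```python
import numpy as np

def mosaic_label(class_ids, w, h, W, H, num_classes):
    """Area-weighted soft label for a 2x2 mosaic split at pixel (w, h)."""
    areas = np.array([
        w * h,              # top left
        (W - w) * h,        # top right
        w * (H - h),        # bottom left
        (W - w) * (H - h),  # bottom right
    ], dtype=np.float64)
    weights = areas / (W * H)  # fractions that sum to 1
    label = np.zeros(num_classes)
    # np.add.at accumulates correctly even if a class appears more than once.
    np.add.at(label, class_ids, weights)
    return label

label = mosaic_label([0, 1, 2, 3], w=8, h=16, W=32, H=32, num_classes=4)
# label is [0.125, 0.375, 0.125, 0.375]
```

The four areas always add up to W * H, so the resulting label sums to 1 without sampling any additional constrained weights.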
