I'm learning segmentation and data augmentation based on this TF 2.0 tutorial that uses Oxford-IIIT Pets.
For pre-processing/data augmentation they provide a set of functions in a specific pipeline:
import tensorflow as tf
import tensorflow_datasets as tfds

# Import dataset
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)

def normalize(input_image, input_mask):
    input_image = tf.cast(input_image, tf.float32) / 255.0
    input_mask -= 1
    return input_image, input_mask

@tf.function
def load_image_train(datapoint):
    input_image = tf.image.resize(datapoint['image'], (128, 128))
    input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))

    if tf.random.uniform(()) > 0.5:
        input_image = tf.image.flip_left_right(input_image)
        input_mask = tf.image.flip_left_right(input_mask)

    input_image, input_mask = normalize(input_image, input_mask)

    return input_image, input_mask

TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
This code raised several doubts for me given the tf syntax. To keep myself from just doing Ctrl+C Ctrl+V and to actually understand how TensorFlow works, I would like to ask some questions:
1) In the normalize function, can the line tf.cast(input_image, tf.float32) / 255.0 be replaced with tf.image.convert_image_dtype(input_image, tf.float32)? (I sketched the small check I would run right below.)
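To make the question concrete, this is the check I would run on a dummy uint8 image; it is my own example, not the tutorial's pipeline (where, if I understand correctly, the image is already float32 after tf.image.resize). My understanding is that convert_image_dtype scales by the input dtype's maximum, so both results should match:

import tensorflow as tf

# Dummy uint8 image, my own example (not from the tutorial).
image_u8 = tf.cast(tf.random.uniform((128, 128, 3), maxval=256, dtype=tf.int32), tf.uint8)

manual = tf.cast(image_u8, tf.float32) / 255.0                   # the tutorial's normalization
converted = tf.image.convert_image_dtype(image_u8, tf.float32)   # scales by the dtype max (255)

print(tf.reduce_max(tf.abs(manual - converted)))                 # I expect this to be ~0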
2) In the normalize function, is it possible to change my segmentation_mask values while they are still a tf.Tensor, without converting to numpy? What I want is to work with only two possible mask values (0 and 1) instead of three (0, 1 and 2). Using numpy I did something like this:

segmentation_mask_numpy = segmentation_mask.numpy()
segmentation_mask_numpy[(segmentation_mask_numpy == 2) | (segmentation_mask_numpy == 3)] = 0

Is it possible to do this without the numpy round trip? (My tf.where guess is sketched right below.)
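This is my guess at a pure-TensorFlow version using tf.where; I don't know if it is the idiomatic way:

import tensorflow as tf

# My guess (not from the tutorial): zero out every pixel whose value is 2 or 3,
# keeping everything as a tf.Tensor with no numpy round trip.
to_zero = tf.logical_or(tf.equal(segmentation_mask, 2),
                        tf.equal(segmentation_mask, 3))
segmentation_mask = tf.where(to_zero,
                             tf.zeros_like(segmentation_mask),
                             segmentation_mask)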
3) In the load_image_train function they say that data augmentation is being done, but how? From my perspective they are changing the original image with a flip, based on a random number, rather than adding another image derived from the original to the dataset. So is the function's goal to change an image, not to add an aug_image to my dataset while keeping the original? If I'm right, how can I change this function so that it produces an aug_image and also keeps the original image in the dataset? (My rough attempt is sketched right below.)
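To illustrate what I mean, this is roughly what I would try; load_image_basic and flip_both are names I made up (they are not in the tutorial), and I don't know whether concatenating datasets is the recommended way to do this:

import tensorflow as tf

def load_image_basic(datapoint):
    # My own helper: only resize + normalize, no random flip.
    image = tf.image.resize(datapoint['image'], (128, 128))
    mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
    return normalize(image, mask)

def flip_both(image, mask):
    # Deterministic left-right flip of image and mask together.
    return tf.image.flip_left_right(image), tf.image.flip_left_right(mask)

base = dataset['train'].map(load_image_basic)    # the original images
augmented = base.map(flip_both)                  # flipped copy of each image
train = base.concatenate(augmented)              # originals + copies, twice the size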
4) In other questions, such as How to apply data augmentation in TensorFlow 2.0 after tfds.load() and TensorFlow 2.0 Keras: How to write image summaries for TensorBoard, they use a lot of sequential .map() calls, or chains like .map().map().cache().batch().repeat(). My question is: is this necessary? Is there a simpler way to do it? (I sketched my naive alternative right below.) I tried to read the tf documentation, but without success.
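For example, I would naively collapse two chained .map() calls into a single function like this (preprocess and augment are placeholder names of mine, not from those answers), but I don't know whether this is equivalent or preferable:

import tensorflow as tf

def preprocess(datapoint):
    # Placeholder: resize + normalize, as in the tutorial.
    image = tf.image.resize(datapoint['image'], (128, 128))
    mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
    return normalize(image, mask)

def augment(image, mask):
    # Placeholder: the tutorial's random flip.
    if tf.random.uniform(()) > 0.5:
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    return image, mask

# Chained style from those answers:
# ds = dataset['train'].map(preprocess).map(augment)

# Single .map() that I would naively write instead:
ds = dataset['train'].map(lambda d: augment(*preprocess(d)),
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)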
5) Would you recommend working with ImageDataGenerator from Keras, as presented here, or is this tf approach better?