
I'm working on a computer vision project where my images are a mix of webp and jpeg. I'm using tensorflow '2.3.2'.
You can think of my directory like this:

IMAGES
 |-img1.jpeg
 |-img2.webp

For reading webp I use `tfio.image.decode_webp`, and for jpeg I use `tf.image.decode_jpeg(img, channels=3)`. Here's the code:

import tensorflow as tf
import tensorflow_io as tfio

def load(file_path):
    img = tf.io.read_file(file_path)
    extension = tf.strings.split(file_path, sep=".")
    if extension[-1] == "webp":
        img = tfio.image.decode_webp(img)
    else:
        img = tf.image.decode_jpeg(img, channels=3)
    # img preprocess here
    return img

def create_dataset(df, batch_size):
    image = df["image_path"]
    # I'm working on MultiTaskLearning so I have multiple targets
    target1 = df["target1"].to_numpy()
    target2 = df["target2"].to_numpy()

    ds = tf.data.Dataset.from_tensor_slices((image, target1, target2))
    ds = ds.map(lambda image, target1, target2: (load(image), {"target1": target1, "target2": target2}),
                num_parallel_calls=tf.data.experimental.AUTOTUNE)

    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

dataset = create_dataset(df, 100)
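
For clarity, `df` is a pandas DataFrame whose relevant columns look roughly like the sketch below. This layout is only assumed: the column names come from the code above, the paths mirror the directory listing, and the target values are made up for illustration.

import pandas as pd

# Assumed layout of df, based on the columns used in create_dataset above.
df = pd.DataFrame({
    "image_path": ["IMAGES/img1.jpeg", "IMAGES/img2.webp"],
    "target1": [0, 1],  # illustrative values
    "target2": [1, 0],  # illustrative values
})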

The problem is that webp images are decoded to a 4-channel (RGBA) tensor while decode_jpeg returns a 3-channel (RGB) tensor. This creates inconsistencies within my dataset, since the model only accepts 3-channel images.

One solution I can think of is converting all my webp files to jpeg beforehand. But is there a better solution, like converting the 4-channel tensor to 3 channels in TensorFlow, or reading webp as 3 channels in TensorFlow, or anything else I can just put inside my Python script?

Vinson Ciawandy
  • If you want to train a single model with both `jpeg` and `webp` then you need to create the same input layer layout for both. There's no need to convert your images; you can simply convert the `img` tensor after `load`. There already is a nice answer explaining how to do the RGBA>RGB conversion for numpy arrays (hint: `img.numpy()`) on SO: https://stackoverflow.com/a/58748986/1622937 – j-i-l May 18 '21 at 11:00
  • Thanks for the suggestion. The solution you are proposing seems very nice, but when I try to apply it, it raises `AttributeError: 'Tensor' object has no attribute 'numpy'`. I think this has something to do with TensorFlow eager execution not working properly here. Even without numpy, printing `tensor.shape` after the image decoding step yields `(None, None, None)`. – Vinson Ciawandy May 18 '21 at 11:17
  • I've updated the question with more code that I use – Vinson Ciawandy May 18 '21 at 11:21
  • You could use [`tfio.experimental.color.rgba_to_rgb`](https://www.tensorflow.org/io/api_docs/python/tfio/experimental/color/rgba_to_rgb). It should work in graph mode. Note that this method just takes the RGB part of the RGBA image, so if your images have no transparency, it should be enough (see the sketch after this comment thread). – Lescurel May 18 '21 at 11:48
  • @Lescurel it should be fairly simple to adapt this to a proper conversion from rgba to rgb. I'll have a go... – j-i-l May 18 '21 at 12:11
  • @VinsonCiawandy any luck with the suggested approach? – j-i-l May 18 '21 at 21:08
  • @jojo thanks for asking, your solution yields an error; I have put a comment on your answer. @Lescurel I'm putting `rgba_to_rgb` right after the if/else in my `load` function. It raises `ValueError: Cannot infer num from shape (None, None, None)` at the dataset creation step. The error comes specifically from `rgba = tf.unstack(input, axis=-1)` inside the source code. – Vinson Ciawandy May 19 '21 at 01:27
  • The issue you are having is not related to the rgba to rgb conversion but to the argument you pass to `load`. What does the column `"image_path"` in your `df` look like? – j-i-l May 19 '21 at 09:57
  • `df["image_path"]` is a series of strings; each entry points to the image location. This is an example value of `image_path`: "../datasets/test_data/1/M-61Grenade.jpg" – Vinson Ciawandy May 20 '21 at 04:41
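
For reference, here is a minimal sketch of what @Lescurel's suggestion could look like inside `load`. This is only a sketch: the `tf.ensure_shape` line is an assumption on my part, added because `rgba_to_rgb` unstacks the channel axis and therefore needs a statically known channel count inside `Dataset.map` (the `ValueError` above shows the shape arriving as `(None, None, None)`).

import tensorflow as tf
import tensorflow_io as tfio

def load(file_path):
    img = tf.io.read_file(file_path)
    extension = tf.strings.split(file_path, sep=".")
    if extension[-1] == "webp":
        img = tfio.image.decode_webp(img)
        # Assumption: pin the channel count so rgba_to_rgb can unstack the channel axis.
        img = tf.ensure_shape(img, [None, None, 4])
        # Keeps only the RGB part; fine if the webp images have no transparency.
        img = tfio.experimental.color.rgba_to_rgb(img)
    else:
        img = tf.image.decode_jpeg(img, channels=3)
    # img preprocess here
    return img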

1 Answer


If you want to train a single model with both *.jpeg and *.webp images then you should create the same input layer layout for both.

To do so, you basically need to convert either RGB to RGBA or (what I would do) RGBA to RGB. If you want to simply drop the alpha channel, you can use `tfio.experimental.color.rgba_to_rgb` (as @Lescurel pointed out in the comments).

But an RGBA to RGB conversion with proper alpha compositing is not very complex either, and you can do the operation directly on the tensor you get from calling `load`.

Here is a TensorFlow adaptation of the numpy implementation proposed in the SO answer linked in the comments above:

def rgba2rgb(rgba, background=(255, 255, 255)):
    row, col, ch = tf.shape(rgba)
    if ch == 3:
        return rgba
    assert ch == 4, 'RGBA image has 4 channels.'
    # alpha-composite the image onto the given background colour
    r, g, b, a = tf.unstack(tf.cast(rgba, tf.float32), axis=-1)
    a = tf.cast(a, tf.float32) / 255.0
    R, G, B = background
    r = r * a + (1.0 - a) * R
    g = g * a + (1.0 - a) * G
    b = b * a + (1.0 - a) * B
    rgb = tf.stack([r, g, b], axis=-1)
    return tf.cast(rgb, tf.uint8)

Using this avoids having to call .numpy() on the tensor and the potential issues related to that.

So basically `rgba2rgb(load(image))` should then do the trick.
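
As a quick eager-mode sanity check (using the example webp file from the question's directory listing) before wiring it into the `tf.data` pipeline:

# Eager-mode check; "IMAGES/img2.webp" is the example file from the question.
img = rgba2rgb(load("IMAGES/img2.webp"))
print(img.shape)  # expected: (height, width, 3)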

j-i-l
  • Thank you, this solution works when I read all the images in a single pass without the dataset class. But when I apply it in the `create_dataset` function like this: `ds = ds.map(lambda image, target1, target2: (rgba2rgb(load(image)), {"target1": target1, "target2": target2}), num_parallel_calls=tf.data.experimental.AUTOTUNE)`, it raises `AssertionError: RGBA image has 4 channels.` when I call the predict function from my model (no error when creating the dataset). `print(ch)` before the assertion shows `None`. – Vinson Ciawandy May 19 '21 at 01:14
  • Hmm, this issue is not related to the RGBA>RGB conversion. To confirm this, simply add a `print(img)` just before the return statement in your `load` function. My guess is that it will print something like `(None, None, None)`. I'm not on a system with tf installed so I'm just guessing here, but the issue is likely in: `ds = tf.data.Dataset.from_tensor_slices((image, target1, target2))`. – j-i-l May 19 '21 at 09:26
  • `print(img)` shows `Tensor("resize/Squeeze:0", shape=(224, 224, None), dtype=float32)`. Likely `ch` is None since it can be 3 or 4. But the weird thing is that it still raises the same error even if I put `rgba2rgb` inside the `load` function. – Vinson Ciawandy May 20 '21 at 04:37
  • I see. I've updated my answer (mind the decorator `@tf.function`) in order to convert `rgba2rgb` into a callable tf graph and access the shape at runtime (`tf.shape(..`). – j-i-l May 20 '21 at 11:18
  • I used your new `rgba2rgb` and put `@tf.function` on top of it; it raises this error: `OperatorNotAllowedInGraphError: iterating over 'tf.Tensor' is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.` – Vinson Ciawandy May 20 '21 at 11:42
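
That last `OperatorNotAllowedInGraphError` most likely comes from unpacking `tf.shape(rgba)` (iterating over a symbolic tensor) inside the traced function. A graph-friendly variant would index `tf.shape` instead of unpacking it and branch with `tf.cond` rather than a Python `if`/`assert`. The following is only a sketch adapted from the answer above; the name `rgba2rgb_graph` and the `tf.ensure_shape` calls are assumptions of mine, not code confirmed in the thread:

def rgba2rgb_graph(img, background=(255, 255, 255)):
    # Graph-safe variant: index tf.shape instead of unpacking it, and use
    # tf.cond instead of a Python if/assert so it also works inside Dataset.map.
    def composite():
        # Assumption: pin the channel axis to 4 so tf.unstack can infer num.
        rgba = tf.cast(tf.ensure_shape(img, [None, None, 4]), tf.float32)
        r, g, b, a = tf.unstack(rgba, axis=-1)
        a = a / 255.0
        R, G, B = background
        rgb = tf.stack([r * a + (1.0 - a) * R,
                        g * a + (1.0 - a) * G,
                        b * a + (1.0 - a) * B], axis=-1)
        return tf.cast(rgb, img.dtype)

    def passthrough():
        return tf.ensure_shape(img, [None, None, 3])

    return tf.cond(tf.equal(tf.shape(img)[-1], 4), composite, passthrough)

Used as `rgba2rgb_graph(load(image))` inside the `ds.map` call, both branches should end up with a static shape of `(None, None, 3)`, which matches a 3-channel input layer.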