
I have a trained EfficientNetB2 neural network that I'm using for image classification. When I load the images with PIL like this:

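import numpy as np
from PIL import Image

predictions = []
for item in image_paths:  # image_paths: the test image files (name assumed here)
    image = Image.open(item)
    image = image.convert('RGB').resize((120, 120))
    image = np.array(image)

    # Add a batch dimension so the model receives shape (1, 120, 120, 3)
    if image.ndim == 3:
        image = np.expand_dims(image, axis=0)

    predictions.append(model.predict(image))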

I get around 90% accuracy. This is, however, extremely slow, so I tried loading my dataset with tf.data instead. The pipeline looks something like this:

import os
import tensorflow as tf

ds = tf.data.Dataset.list_files(str(test_dir / '*' / '*'))
ds = (ds
      .map(load_image_and_label, num_parallel_calls=tf.data.experimental.AUTOTUNE)
      .cache()
      .batch(32)
      .prefetch(tf.data.experimental.AUTOTUNE)
    )

And this is the load_image_and_label function used in the map call:

def load_image_and_label(file_path):
    # The parent directory name is the class label
    label = tf.strings.split(file_path, os.sep)[-2]
    image = tf.io.read_file(file_path)
    image = tf.io.decode_jpeg(image, channels=3, dct_method='INTEGER_ACCURATE')
    # target_size: (height, width) tuple, defined elsewhere
    image = tf.image.resize(image, target_size)

    return image, label

This is, as expected, much faster, but the accuracy drops to around 70%. I've tried moving things around in the pipeline, but I can't figure out why this happens. Any suggestions would be much appreciated.

P.S. I'm aware that an almost identical question has already been asked on Stack Overflow, but the answer to that question doesn't change anything in my situation, which is why I'm posting this as a separate question. Thank you.

Edit: I tried not using tf.data.Dataset while still using my load_image_and_label function, and the accuracy was again around 90%. That suggests the problem is somewhere in the tf.data.Dataset pipeline itself. Does anyone have experience with this kind of problem?
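
One check that might help narrow this down, sketched under assumptions rather than as a confirmed cause: tf.data.Dataset.list_files shuffles file names by default, so labels gathered in a separate pass over the dataset may not line up with the predictions. The helper below scores each batch against the labels that arrive in that same batch, which rules out an ordering mismatch. The names batched_accuracy, predict_fn and class_names are hypothetical and not part of the code above; class_names is assumed to list the class directory names in the index order the model was trained with.

```python
import numpy as np


def batched_accuracy(batches, predict_fn, class_names):
    """Score predictions against labels taken from the same batch.

    batches: iterable of (images, labels) pairs, labels being directory
             names as bytes or str.
    predict_fn: callable mapping an image batch to per-class scores.
    class_names: hypothetical list of class directory names, in the
                 index order the model was trained with.
    """
    correct = 0
    total = 0
    for images, labels in batches:
        scores = np.asarray(predict_fn(images))
        predicted = np.argmax(scores, axis=-1)
        # tf.data yields string tensors as bytes; plain str is accepted too
        names = [l.decode() if isinstance(l, bytes) else str(l) for l in labels]
        expected = np.array([class_names.index(n) for n in names])
        correct += int(np.sum(predicted == expected))
        total += len(expected)
    # Avoid dividing by zero when no batches were supplied
    return correct / total if total else 0.0
```

With the pipeline above, this could be called as batched_accuracy(ds.as_numpy_iterator(), model.predict_on_batch, class_names). If the result is back near 90%, the drop was probably an alignment issue; if it stays near 70%, the difference more likely lies in how the images are decoded or resized.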

WholesomeGhost