I'm using the VGG19 model from Keras applications. I was expecting the image to be scaled to [-1, 1], but instead it seems that preprocess_input is doing something else.

To preprocess the input, I use the following 2 lines to first load the image and then scale it:

import numpy as np

from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input

img = image.load_img("./img.jpg", target_size=(256, 256))
img = preprocess_input(np.array(img))

print(img)
>>> array([[[151.061  , 138.22101, 131.32   ],
    ... ]]]

The output seems to be in the [0, 255] interval, yet the original 255s were mapped to values around 151 (likely mean centering). What input does VGG actually require? I thought it should be in [-1, 1], judging from the source code (for mode='tf'). Or is it flexible enough that I can use any scaling I want? (I'm using VGG to extract mid-level features from the Conv4 block.)

When looking at the source code of preprocess_input I see:

...
    if mode == 'tf':
        x /= 127.5
        x -= 1.
        return x
...

which suggests that for tensorflow backend (which is what keras is using), it should be scaled to [-1,1].
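That arithmetic can be checked with a couple of lines of plain numpy, independent of Keras (a sketch replicating only the mode='tf' branch quoted above):

```python
import numpy as np

# Replicate the mode='tf' branch of preprocess_input shown above
x = np.array([0.0, 127.5, 255.0])
x /= 127.5
x -= 1.
print(x)  # -> [-1.  0.  1.]: a linear map from [0, 255] to [-1, 1]
```

So the [151.061, 138.221, 131.32] output above cannot be coming from the mode='tf' branch.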

What I need to do is to create a function restore_original_image_from_array() which will take the img and reconstruct the original image that I fed in. The problem is I'm not sure how the scaling happens for VGG19.

So in short I would like to do:

img = image.load_img("./img.jpg", target_size=(256, 256))
scaled_img = preprocess_input(np.array(img))
restore_original_image_from_array(scaled_img) == np.array(img)
>>> True
GRS
  • If you are using the `preprocess_input()` function, isn't the scaling pretty clearly [-1, 1]? If you are seeing an output in the range [0,255], then you must be using a different function than what you posted. – Gabriel Ibagon May 04 '19 at 22:53
  • @GabrielIbagon That's the problem. That's what I expected, but instead it's scaling in some other way. `array([[[255, 255, 255], ... ` is mapped to `array([[[151.061 , 138.22101, 131.32 ], ... ` – GRS May 04 '19 at 22:57

2 Answers


The "mode" of the preprocess_input function depends on the framework the pretrained network's weights were trained with. The VGG19 network in Keras uses the weights of the original Caffe VGG19 model, which is why the mode argument of preprocess_input should be left at its default (mode='caffe'). See this question: Keras VGG16 preprocess_input modes

For your purposes, use the preprocess_input function that is found in keras.applications.vgg19 and reverse engineer it from there.

The original preprocessing is found here: https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py#L21

This involves:

1. converting the image(s) from RGB to BGR, and
2. subtracting the dataset mean from each channel of the image(s).
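These two steps can be sketched in plain numpy (a hand-rolled stand-in for Keras's preprocess_input(mode='caffe'), assuming a channels-last RGB input). Note how a white pixel reproduces the [151.061, 138.221, 131.32] values seen in the question:

```python
import numpy as np

MEAN_BGR = np.array([103.939, 116.779, 123.68])  # ImageNet mean, BGR order

def caffe_preprocess(x):
    """Sketch of preprocess_input(mode='caffe') for channels-last input."""
    x = x[..., ::-1].astype('float64')  # 1) RGB -> BGR
    return x - MEAN_BGR                 # 2) subtract per-channel mean

white = np.array([[[255.0, 255.0, 255.0]]])  # a single white RGB pixel
print(caffe_preprocess(white))
# each channel is 255 minus the BGR mean: 151.061, 138.221, 131.32
```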

Here is the code to restore the original image:

def restore_original_image_from_array(x, data_format='channels_first'):
    # Inverse of preprocess_input (mode='caffe'): add the per-channel
    # ImageNet mean (BGR order) back, then flip BGR -> RGB.
    mean = [103.939, 116.779, 123.68]
    x = x.copy()  # don't mutate the caller's array

    if data_format == 'channels_first':
        if x.ndim == 3:
            x[0, :, :] += mean[0]
            x[1, :, :] += mean[1]
            x[2, :, :] += mean[2]
            x = x[::-1, ...]          # 'BGR' -> 'RGB'
        else:
            x[:, 0, :, :] += mean[0]
            x[:, 1, :, :] += mean[1]
            x[:, 2, :, :] += mean[2]
            x = x[:, ::-1, ...]       # 'BGR' -> 'RGB'
    else:
        x[..., 0] += mean[0]
        x[..., 1] += mean[1]
        x[..., 2] += mean[2]
        x = x[..., ::-1]              # 'BGR' -> 'RGB'

    return x
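A quick round-trip sanity check, using a hand-rolled stand-in for preprocess_input(mode='caffe') so the snippet runs without Keras (channels-last data assumed; restore here is a trimmed channels-last inverse):

```python
import numpy as np

MEAN_BGR = np.array([103.939, 116.779, 123.68])

def caffe_preprocess(x):
    # stand-in for keras preprocess_input(mode='caffe'), channels-last
    return x[..., ::-1].astype('float64') - MEAN_BGR

def restore(x):
    # inverse: add the BGR mean back, then flip BGR -> RGB
    return (x + MEAN_BGR)[..., ::-1]

rgb = np.random.randint(0, 256, size=(4, 4, 3)).astype('float64')
print(np.allclose(restore(caffe_preprocess(rgb)), rgb))  # True
```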
Gabriel Ibagon
  • Thanks, I see, so it's using `caffe`, which makes things difficult. Btw, thanks for the pseudo-inverse function, but it's incorrect after a quick comparison. – GRS May 04 '19 at 23:43
  • the conversion seems correct to me - did you compare the original and the output using `np.allclose(a, b)`? There may be small numerical imprecision depending on your input data type. Did you also correctly set the `data_format` parameter of the inverse function? – Gabriel Ibagon May 04 '19 at 23:47
  • Apologies, you are correct :). I blame it on the time of night. This is exactly what's needed. The only thing is why isn't there a tf version of vgg19 to simplify this whole process to standard scaling. – GRS May 04 '19 at 23:58
  • 1
    True, weights being in the caffe format makes preprocessing complicated. The Caffe weights are from the official release of the VGG19 model from Oxford - so it's useful for those trying to recreate the model exactly as it was published. Good luck with your work! – Gabriel Ibagon May 05 '19 at 00:08

VGG networks are trained on images in which each channel is zero-centered by the mean [103.939, 116.779, 123.68], with the channels in BGR order. Furthermore, since an optimized image may take values anywhere between −∞ and ∞, we must clip them to stay within the 0-255 range.
Here is the code to 'deprocess', or inverse-process, the processed image:

import numpy as np

def deprocess_img(processed_img):
  x = processed_img.copy()
  if x.ndim == 4:
    x = np.squeeze(x, 0)
  if x.ndim != 3:
    raise ValueError("Input to deprocess_img must have shape "
                     "[1, height, width, channel] or [height, width, channel]")

  # perform the inverse of the preprocessing step: add the BGR mean back
  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  x = x[:, :, ::-1]  # BGR -> RGB

  # clip to the valid pixel range and convert back to uint8
  x = np.clip(x, 0, 255).astype('uint8')
  return x
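The clipping step is what distinguishes this version: when the "image" being deprocessed is an optimized tensor rather than a real photo, values can land outside [0, 255] after adding the mean back, and np.clip forces them into the valid byte range before the uint8 cast (a minimal sketch with made-up values):

```python
import numpy as np

# out-of-range values such as these can arise after optimization
x = np.array([-40.2, 0.0, 128.9, 300.7])
clipped = np.clip(x, 0, 255).astype('uint8')
print(clipped)  # 0, 0, 128, 255 (negatives -> 0, overshoots -> 255)
```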