I'm trying to retrain (read: fine-tune) a MobileNet image classifier.

The retraining script provided by TensorFlow here (from the tutorial) updates only the weights of the newly added fully connected layer. I modified this script to update the weights of all layers of the pre-trained model. I'm using the MobileNet architecture with a depth multiplier of 0.25 and an input size of 128.

However, while retraining I observed something strange: if I give a particular image as input for inference in a batch with some other images, the activation values after some layers are different from those obtained when the image is passed alone. Also, the activation values for the same image differ across batches. Example: for two batches, batch_1: [img1, img2, img3] and batch_2: [img1, img4, img5], the activations for img1 are different between the two batches.

Here is the code I use for inference:

import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

# `graph`, `decoded_image_tensor` and `jpeg_data_tensor` are created earlier
# (as in the retrain script) and are assumed to already be in scope here.
with tf.Session(graph=tf.get_default_graph()) as sess:
    image_path = '/tmp/images/10dsf00003.jpg'
    id_ = gfile.FastGFile(image_path, 'rb').read()

    # The line below decodes the JPEG (tf.image.decode_jpeg) and does some preprocessing.
    id = sess.run(decoded_image_tensor, {jpeg_data_tensor: id_})

    input_image_tensor = graph.get_tensor_by_name('input:0')

    layerXname = 'MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu:0'  # Name of the layer whose activations to inspect.
    layerX = graph.get_tensor_by_name(layerXname)
    layerXactivations = sess.run(layerX, {input_image_tensor: id})

The above code is executed once as-is, and once with the following change to the last line:

layerXactivations_batch=sess.run(layerX, {input_image_tensor: np.asarray([np.squeeze(id), np.squeeze(id), np.squeeze(id)])})

Following are some nodes in the graph:

[u'input',
 u'MobilenetV1/Conv2d_0/weights',
 u'MobilenetV1/Conv2d_0/weights/read',
 u'MobilenetV1/MobilenetV1/Conv2d_0/convolution',
 u'MobilenetV1/Conv2d_0/BatchNorm/beta',
 u'MobilenetV1/Conv2d_0/BatchNorm/beta/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/gamma',
 u'MobilenetV1/Conv2d_0/BatchNorm/gamma/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean/read',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance',
 u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance/read',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add/y',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/Rsqrt',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_2',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/sub',
 u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add_1',
 u'MobilenetV1/MobilenetV1/Conv2d_0/Relu6',
 u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights',
 u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read',
 ...
 ...]

Now, when layerXname = 'MobilenetV1/MobilenetV1/Conv2d_0/convolution', the activations are the same in both of the cases above (i.e. layerXactivations and layerXactivations_batch[0] are the same). But after this layer, all layers have different activation values. My guess is that the batchNorm operations after the 'MobilenetV1/MobilenetV1/Conv2d_0/convolution' layer behave differently for batch inputs than for a single image. Or is the issue caused by something else?
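
For concreteness, the comparison I'm describing is roughly the following (the np.allclose check is just illustrative, not part of the retraining script):

import numpy as np

# Compare the single-image activations with the first image of the batched run.
same = np.allclose(np.squeeze(layerXactivations), layerXactivations_batch[0])
print(same)
# True  for 'MobilenetV1/MobilenetV1/Conv2d_0/convolution'
# False for every layer after the first BatchNorm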

Any help/pointers would be appreciated.

Krist

2 Answers

0

When you build the MobileNet there is a parameter called `is_training`. If you don't set it to `False`, the dropout layer and the batch normalization layer will give you different results on different runs. Batch normalization will probably change the values only slightly, but dropout will change them a lot, since it drops some of the input values.

Take a look at the signature of mobilenet_v1:

def mobilenet_v1(inputs,
                 num_classes=1000,
                 dropout_keep_prob=0.999,
                 is_training=True,
                 min_depth=8,
                 depth_multiplier=1.0,
                 conv_defs=None,
                 prediction_fn=tf.contrib.layers.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='MobilenetV1'):
  """Mobilenet v1 model for classification.

  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    dropout_keep_prob: the percentage of activation values that are retained.
    is_training: whether is training or not.
    min_depth: Minimum depth value (number of channels) for all convolution ops.
      Enforced when depth_multiplier < 1, and not an active constraint when
      depth_multiplier >= 1.
    depth_multiplier: Float multiplier for the depth (number of channels)
      for all convolution ops. The value must be greater than zero. Typical
      usage will be to set this value in (0, 1) to reduce the number of
      parameters or computation cost of the model.
    conv_defs: A list of ConvDef namedtuples specifying the net architecture.
    prediction_fn: a function to get predictions out of logits.
    spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
        of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    logits: the pre-softmax activations, a tensor of size
      [batch_size, num_classes]
    end_points: a dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: Input rank is invalid.
  """
jorgemf
  • Thanks @jorgemf! As I suspected in my question, the problem was with batchNorm, and setting `is_training` to `False` works. But this is not the correct way: ideally the graph should be loaded with `is_training=True` while training and then with `is_training=False` for inference. Since I have not written the batchnorm myself and the graph is instead loaded from the MobileNet code, I'm yet to figure out how to do that. You can refer here - https://stackoverflow.com/questions/39353503/tensorflow-tf-slim-model-with-is-training-true-and-false or https://ruishu.io/2016/12/27/batchnorm/ – Krist Sep 27 '17 at 06:26
  • @Krist don't forget to mark the answer as valid if it helped you. – jorgemf Sep 27 '17 at 06:44
-1

This is due to Batch Normalisation.

How are you running inference? Are you loading the model from the checkpoint files, or are you using a frozen protobuf model? If you use a frozen model, you can expect consistent results across different forms of input.
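
For example, loading a frozen protobuf for inference looks roughly like this (the .pb path below is just a placeholder):

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('/tmp/frozen_mobilenet.pb', 'rb') as f:  # placeholder path
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    # All variables, including the BatchNorm moving mean/variance,
    # are baked into the imported graph as constants.
    tf.import_graph_def(graph_def, name='')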

Check this out. A similar issue for a different application is raised here.

Anand C U
  • I don't think he has frozen the graph, or that the issue you linked is due to a frozen graph. – jorgemf Sep 26 '17 at 17:53
  • I said it's due to Batch Normalisation. When you freeze the graph, the moving mean/average operation is changed and it gives predictable results. – Anand C U Sep 27 '17 at 04:30
  • Now I see it could be either batch_norm or the dropout layer. Both take the is_training parameter. – jorgemf Sep 27 '17 at 04:52
  • Which is why I suggested freezing the graph, which is the correct way to run inference. – Anand C U Sep 27 '17 at 05:54
  • Freezing the graph is overcomplicating things for someone who is just starting. It is better to set the `is_training` parameter to `False` so that batch_norm and dropout will not change the outputs for the same inputs. – jorgemf Sep 27 '17 at 06:14
  • @jorgemf is right. Setting `is_training` to `False` works here. Also, I've already tried freezing the graph and the problem persists: even after freezing, the batchNorm layer behaves the same way. Hence freezing the graph doesn't really solve the problem. – Krist Sep 27 '17 at 06:28
  • @Krist Did you use the [export_inference_graph](https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py) tool to freeze the model? Setting is_training to False is done by default by this tool, and it also takes care of other parameters like input nodes, output nodes, etc. Setting is_training to False only solves part of the problem. – Anand C U Sep 27 '17 at 06:59