
Environment:

I'm using tf.keras (TensorFlow 1.14) on Google Colab, and my model architecture is MobileNet V2 1.00 224.

Problem:

I am trying (and failing) to attach a new layer, with a new output, to an existing layer that is not the normal output of my model, i.e. to make a branch earlier in MobileNet V2.

I want this new branch to be a regression output, but I don't want that output serially connected to the final embedding layer of MobileNet. Instead it should branch off a much earlier stage (which one I'm not sure yet; I'm experimenting). Basically, I want a branch with its own output alongside the normal, pre-trained ImageNet embedding output.

Grab MobileNet V2 as base_model:

  base_model = tf.keras.applications.MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                                include_top=False,
                                                weights='imagenet')

  base_model.trainable = False

Get the layers I need from base_model and build my new outputs:

  # get layers from the MobileNet base model
  mobilenet_input = base_model.get_layer('input_1')
  mobilenet_output = base_model.get_layer('out_relu')

  # add our average pooling layer to our MobileNetV2 output like all of our other classifiers so we split our graph on the same nodes
  out_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name='embedding_pooling')(mobilenet_output.output)
  # note: out_global_pooling is a tensor, not a layer, so this assignment has no effect on training
  out_global_pooling.trainable = False

  # our new branch and the outputs for the branch
  expanded_conv_depthwise_BN = base_model.get_layer('expanded_conv_depthwise_BN')
  regression_dropout = tf.keras.layers.Dropout(0.5)(expanded_conv_depthwise_BN.output)
  regression_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(regression_dropout)
  new_regression_output = tf.keras.layers.Dense(num_labels, activation='sigmoid', name="cinemanet_output")(regression_global_pooling)

This appears to be fine, and I can even make my model via the functional API:

  model = tf.keras.Model(inputs=mobilenet_input.input, outputs=[out_global_pooling, new_regression_output])

My Training Code

My data set is a set of 30 floats (10 RGB triplets) that I want to predict from an input image. The data set works when I train a Sequential model, but fails when I try to train this model.

  ops.reset_default_graph()
  tf.keras.backend.set_learning_phase(1) # 0 testing, 1 training mode

  # preview contents of CSV to verify things are sane
  import csv
  import math

  def lenopenreadlines(filename):
      with open(filename) as f:
          return len(f.readlines())

  def csvheaderrow(filename):
      with open(filename) as f:
          reader = csv.reader(f)
          return next(reader, None)

  # !head {label_file}

  NUM_IMAGES = ( lenopenreadlines(label_file) - 1) # remove header

  COLUMN_NAMES = csvheaderrow(label_file)
  
  LABEL_NAMES = COLUMN_NAMES[:]
  LABEL_NAMES.remove("filepath")

  ALL_LABELS.extend(LABEL_NAMES)

  # make our data set
  BATCH_SIZE = 256
  NUM_EPOCHS = 50
  FILE_PATH = ["filepath"]
  
  LABELS_TO_PRINT = ' '.join(LABEL_NAMES)
  print("Label contains: " + str(NUM_IMAGES) + " images")
  print("Label Are: " + LABELS_TO_PRINT)
  print("Creating Data Set From " + label_file)

  csv_dataset = get_dataset(label_file, BATCH_SIZE, NUM_EPOCHS, COLUMN_NAMES)

  #make a new data set from our csv by mapping every value to the above function
  split_dataset = csv_dataset.map(split_csv_to_path_and_labels)  

  # make a new data set that loads our images from the first path
  image_and_labels_ds = split_dataset.map(load_and_preprocess_image_batch, num_parallel_calls=AUTOTUNE)

  # update our image floating point range to match -1, 1
  ds = image_and_labels_ds.map(change_range)
  
  print(image_and_labels_ds)

  model = build_model(LABEL_NAMES, use_masked_loss)

  #split the final data set into train / validation splits to use for our model.
  DATASET_SIZE = NUM_IMAGES

  ds = ds.repeat()


  steps_per_epoch =  int(math.floor(DATASET_SIZE/BATCH_SIZE))
  history = model.fit(ds, epochs=NUM_EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[TensorBoardColabCallback(tbc)])


  print(history)

  # results = model.evaluate(test_dataset)
  # print('test loss, test acc:', results)
  export_model(model, model_name, LABEL_NAMES, date)

ValueError: Error when checking model target: 
the list of Numpy arrays that you are passing to your model is not the size the model expected.

Expected to see 2 array(s), but instead got the following list of 1 arrays:
[<tf.Tensor 'IteratorGetNext:1' shape=(?, 30) dtype=float32>]

If I instead use a Sequential model and naively train my regression task against the final output of MobileNet (rather than the branch), training works fine (although I get poor results).

My model summary appears to tell me things are wired as I expect: my dropout is connected to expanded_conv_depthwise_BN, my regression pooling is connected to my dropout, and my output layer appears in the summary connected to my regression pooling.


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 225, 225, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32) 864         Conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 112, 112, 32) 128         Conv1[0][0]                      
__________________________________________________________________________________________________
Conv1_relu (ReLU)               (None, 112, 112, 32) 0           bn_Conv1[0][0]                   
__________________________________________________________________________________________________
expanded_conv_depthwise (Depthw (None, 112, 112, 32) 288         Conv1_relu[0][0]                 
__________________________________________________________________________________________________
expanded_conv_depthwise_BN (Bat (None, 112, 112, 32) 128         expanded_conv_depthwise[0][0]    
__________________________________________________________________________________________________
expanded_conv_depthwise_relu (R (None, 112, 112, 32) 0           expanded_conv_depthwise_BN[0][0] 
__________________________________________________________________________________________________
expanded_conv_project (Conv2D)  (None, 112, 112, 16) 512         expanded_conv_depthwise_relu[0][0
__________________________________________________________________________________________


< snip for brevity >

________
block_16_project (Conv2D)       (None, 7, 7, 320)    307200      block_16_depthwise_relu[0][0]    
__________________________________________________________________________________________________
block_16_project_BN (BatchNorma (None, 7, 7, 320)    1280        block_16_project[0][0]           
__________________________________________________________________________________________________
Conv_1 (Conv2D)                 (None, 7, 7, 1280)   409600      block_16_project_BN[0][0]        
__________________________________________________________________________________________________
Conv_1_bn (BatchNormalization)  (None, 7, 7, 1280)   5120        Conv_1[0][0]                     
__________________________________________________________________________________________________
dropout (Dropout)               (None, 112, 112, 32) 0           expanded_conv_depthwise_BN[0][0] 
__________________________________________________________________________________________________
out_relu (ReLU)                 (None, 7, 7, 1280)   0           Conv_1_bn[0][0]                  
__________________________________________________________________________________________________
regression_pooling (GlobalAvera (None, 32)           0           dropout[0][0]                    
__________________________________________________________________________________________________
embedding_pooling (GlobalAverag (None, 1280)         0           out_relu[0][0]                   
__________________________________________________________________________________________________
cinemanet_output (Dense)        (None, 30)           990         regression_pooling[0][0]         
==================================================================================================
Total params: 2,258,974
Trainable params: 990
Non-trainable params: 2,257,984
Community
vade
  • Can you post the code for training? – thushv89 Dec 04 '19 at 22:29
  • Sure! I'll edit the main question. – vade Dec 04 '19 at 22:37
  • So this is interesting: if I remove my first output (the output I don't want to train), everything works. It appears my data set needs to supply a target tensor for each output, even though I only want to train one. – vade Dec 04 '19 at 23:23
  • https://stackoverflow.com/questions/42785433/keras-training-only-specific-outputs seems like the thing to do, since I want the entire network (if I specify only the regression network, the later MobileNet layers, which I want, aren't included). – vade Dec 04 '19 at 23:33

1 Answer

It looks like you are setting things up correctly, but your training dataset doesn't include tensors for both outputs. If you only want to train the new output, you can provide dummy tensors (or even real training data) for the other one while using a loss weight of 0 to prevent the parameters from updating. That should also prevent any parameters that are not directly "upstream" of the new output layer from updating during training.

When compiling your model, use the loss_weights argument to pass the weights as either a list in output order (e.g., loss_weights=[0, 1]) or a dictionary keyed by output layer name (e.g., loss_weights={'embedding_pooling': 0, 'cinemanet_output': 1}).
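Here is a minimal sketch of that idea. It assumes a TF 2.x-style tf.keras with eager execution rather than the 1.14 setup in the question, and it uses a tiny stand-in network instead of MobileNet V2. The layer names early_conv and late_conv, the input shape, and the helper add_dummy_target are made up for illustration; only the output layer names match the question. The dataset is mapped so that each element carries one target per model output, with zeros standing in for the embedding target.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the two-output model in the question (shapes are made up).
inputs = tf.keras.Input(shape=(8, 8, 3), name="image")
early = tf.keras.layers.Conv2D(4, 3, padding="same", name="early_conv")(inputs)
late = tf.keras.layers.Conv2D(16, 3, padding="same", name="late_conv")(early)
embedding = tf.keras.layers.GlobalAveragePooling2D(name="embedding_pooling")(late)
pooled = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(early)
regression = tf.keras.layers.Dense(30, activation="sigmoid", name="cinemanet_output")(pooled)
model = tf.keras.Model(inputs=inputs, outputs=[embedding, regression])

# Weight 0 on the embedding output, weight 1 on the regression output.
model.compile(optimizer="sgd", loss=["mse", "mse"], loss_weights=[0.0, 1.0])

# Random stand-in data: 32 images, each with 30 regression targets.
images = np.random.rand(32, 8, 8, 3).astype("float32")
labels = np.random.rand(32, 30).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(16)

def add_dummy_target(image, label):
    # Hypothetical helper: supply zeros as the (ignored) embedding target.
    dummy = tf.zeros((tf.shape(image)[0], 16), dtype=tf.float32)
    return image, (dummy, label)

ds = ds.map(add_dummy_target)

late_before = model.get_layer("late_conv").get_weights()[0].copy()
dense_before = model.get_layer("cinemanet_output").get_weights()[0].copy()
model.fit(ds, epochs=1, verbose=0)
late_after = model.get_layer("late_conv").get_weights()[0]
dense_after = model.get_layer("cinemanet_output").get_weights()[0]

print("late_conv unchanged:", np.array_equal(late_before, late_after))
print("cinemanet_output changed:", not np.array_equal(dense_before, dense_after))
```

In this sketch late_conv only feeds the zero-weighted output, so its gradients are zero and plain SGD leaves it untouched, while the Dense layer on the regression branch is updated. With the frozen base_model from the question the later layers would not update in any case; the dummy target is only there so that fit receives one target per output.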

Nate
  • Oh, that's an awesome way of handling it. I ended up with a slightly different approach that feels more hacky, which is outlined in this SO question: https://stackoverflow.com/questions/42785433/keras-training-only-specific-outputs. I make two models, one with only the output I want to train and the other with both outputs, and end up saving the latter since it shares the same layers. Your method above looks way better, though. I'll mark it as the answer and give it a shot, much obliged! – vade Dec 05 '19 at 04:36
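For comparison, the two-model approach described in the comment above might look roughly like the following. This is a sketch under the same assumptions as before: a TF 2.x-style tf.keras, a tiny stand-in network in place of MobileNet V2, and made-up names (early_conv, late_conv, full_model, train_model). Both models are built from the same layer objects, so training the single-output model also updates the weights seen by the two-output model.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in network; the shapes and conv layer names are illustrative only.
inputs = tf.keras.Input(shape=(8, 8, 3), name="image")
early = tf.keras.layers.Conv2D(4, 3, padding="same", name="early_conv")(inputs)
late = tf.keras.layers.Conv2D(16, 3, padding="same", name="late_conv")(early)
embedding = tf.keras.layers.GlobalAveragePooling2D(name="embedding_pooling")(late)
pooled = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(early)
regression = tf.keras.layers.Dense(30, activation="sigmoid", name="cinemanet_output")(pooled)

# Two models over the same graph: one to save, one to train.
full_model = tf.keras.Model(inputs=inputs, outputs=[embedding, regression])
train_model = tf.keras.Model(inputs=inputs, outputs=regression)

# Only the regression output exists here, so a single target per image suffices.
train_model.compile(optimizer="sgd", loss="mse")
images = np.random.rand(32, 8, 8, 3).astype("float32")
labels = np.random.rand(32, 30).astype("float32")
train_model.fit(images, labels, epochs=1, batch_size=16, verbose=0)

# The Dense layer is the same Python object in both models.
shared = full_model.get_layer("cinemanet_output") is train_model.get_layer("cinemanet_output")
_, full_regression = full_model.predict(images, verbose=0)
train_regression = train_model.predict(images, verbose=0)

print("layer shared:", shared)
print("predictions match:", np.allclose(full_regression, train_regression, atol=1e-6))
```

The trade-off is that the training model needs no dummy targets, but there are two model objects to keep track of, and the one that gets exported is not the one that was compiled and fitted.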