
I am building a model for human face segmentation into skin and non-skin areas. As a starting point I am using the model/method shown here, with a dense layer with sigmoid activation added at the end. The model works very well for my purpose, giving a good dice metric score. It uses two pre-trained layers from ResNet50 as a backbone for feature detection. I have read several articles, books, and code bases, but couldn't find any information on how to determine which layers to choose for feature extraction. I compared the ResNet50 architecture with Xception, picked two similar layers, replaced the layers in the original network (here), and ran the training. I got similar results, neither better nor worse. I have the following questions:

  1. How do I determine which layers are responsible for low-level/high-level features?
  2. Is using only a few pre-trained layers any better than using the full pre-trained network in terms of training time and the number of trainable parameters?
  3. Where can I find more information about using only selected layers from pre-trained networks?
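
(For reference, by "dice metric score" above I mean the standard soft dice coefficient. A minimal sketch of the version I use, assuming predictions and ground-truth masks are float tensors in [0, 1]; the smoothing constant is just a common default:)

import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    # Flatten both masks and measure their overlap:
    # dice = 2*|A ∩ B| / (|A| + |B|); smooth avoids division by zero.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)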

Here is the code for a quick overview:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Note: convolution_block and DilatedSpatialPyramidPooling are the helper
# functions from the linked Keras DeepLabV3+ example.

def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))
    # Pre-trained ResNet50 backbone (ImageNet weights, no classifier head).
    resnet50 = keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_tensor=model_input)
    # High-level (stride-16) features feed the ASPP module.
    x = resnet50.get_layer("conv4_block6_2_relu").output
    x = DilatedSpatialPyramidPooling(x)

    # Upsample the ASPP output to match the spatial size of the low-level features.
    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    # Low-level (stride-4) features from an earlier ResNet block.
    input_b = resnet50.get_layer("conv2_block3_2_relu").output
    input_b = convolution_block(input_b, num_filters=48, kernel_size=1)

    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    # Upsample to the full input resolution and project to num_classes channels.
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    model_output = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    return keras.Model(inputs=model_input, outputs=model_output)
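
For what it's worth, I sanity-check the intermediate shapes by building the model and printing a summary (image_size=512 and num_classes=1 are just the values I use):

model = DeeplabV3Plus(image_size=512, num_classes=1)
# conv2_block3_2_relu comes out at 128x128 (stride 4) and
# conv4_block6_2_relu at 32x32 (stride 16) for a 512x512 input.
model.summary()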

And here is my modified code using Xception layers as the backbone:

def DeeplabV3Plus(image_size, num_classes):
    model_input = keras.Input(shape=(image_size, image_size, 3))

    # Pre-trained Xception backbone (ImageNet weights, no classifier head).
    xception_model = keras.applications.Xception(
        weights="imagenet", include_top=False, input_tensor=model_input)
    # High-level features from Xception's middle flow feed the ASPP module.
    xception_x1 = xception_model.get_layer("block9_sepconv3_act").output
    x = DilatedSpatialPyramidPooling(xception_x1)

    input_a = layers.UpSampling2D(size=(image_size // 4 // x.shape[1], image_size // 4 // x.shape[2]), interpolation="bilinear")(x)
    # The extra pooling matches input_a's spatial size to the entry-flow features below.
    input_a = layers.AveragePooling2D(pool_size=(2, 2))(input_a)
    # Lower-level features from Xception's entry flow.
    xception_x2 = xception_model.get_layer("block4_sepconv1_act").output
    input_b = convolution_block(xception_x2, num_filters=256, kernel_size=1)

    x = layers.Concatenate(axis=-1)([input_a, input_b])
    x = convolution_block(x)
    x = convolution_block(x)
    x = layers.UpSampling2D(size=(image_size // x.shape[1], image_size // x.shape[2]), interpolation="bilinear")(x)
    x = layers.Conv2D(num_classes, kernel_size=(1, 1), padding="same")(x)
    # Dense acts on the last (channel) axis, applied independently at each pixel.
    model_output = layers.Dense(x.shape[2], activation="sigmoid")(x)
    return keras.Model(inputs=model_input, outputs=model_output)

Thanks in advance!

FARAZ SHAIKH

1 Answer

  1. In general, the first layers (the ones closer to the input) are the ones responsible for learning low-level, generic features such as edges and textures, whereas the last layers learn high-level features that are more dataset/task-specific. This is the reason why, when transfer learning, you usually want to delete only the last few layers and replace them with others that can deal with your specific problem (see the first sketch below for a quick way to inspect this).
  2. It depends. Transferring the whole network, without deleting or adding any layers, basically means that the network won't learn anything new (unless you leave the layers unfrozen, in which case you are fine-tuning). On the other hand, if you delete some layers, freeze the rest, and add a few more, then the number of trainable parameters depends only on the new layers you just added (the second sketch below shows how to check this count).
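
To see which layers carry which kind of features, you can simply list every layer's name and output shape: layers that keep a large spatial resolution carry low-level features, while deeper layers with a small spatial resolution and many channels carry high-level ones. A minimal sketch, assuming the same TensorFlow/Keras setup as in the question:

from tensorflow import keras

backbone = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(512, 512, 3))

# Print each layer's name and output shape; the spatial size reveals the
# stride: 512 -> 128 is stride 4 (low-level), 512 -> 32 is stride 16 (high-level).
for layer in backbone.layers:
    print(layer.name, layer.output.shape)

This is why the DeepLabV3+ example taps conv2_block3_2_relu (stride 4) for the low-level branch and conv4_block6_2_relu (stride 16) for the ASPP branch.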

What I suggest you do is:

  1. Delete the last few layers from a pre-trained network, freeze the remaining layers, and add a few new layers (even just one)
  2. Train the new network with a certain learning rate (usually this learning rate is not very low)
  3. Fine-tune: unfreeze all the layers, lower the learning rate, and re-train the whole network (a sketch of this recipe follows the list)
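
A minimal sketch of that recipe in Keras; the backbone, head, learning rates, and epochs are placeholders, not prescriptions:

from tensorflow import keras
from tensorflow.keras import layers

# Step 1: pre-trained backbone without its head, all layers frozen,
# plus one new task-specific layer.
backbone = keras.applications.ResNet50(
    weights="imagenet", include_top=False,
    input_shape=(512, 512, 3), pooling="avg")
backbone.trainable = False

inputs = keras.Input(shape=(512, 512, 3))
x = backbone(inputs, training=False)  # keep BatchNorm statistics frozen
outputs = layers.Dense(1, activation="sigmoid")(x)  # the new layer
model = keras.Model(inputs, outputs)

# Step 2: train only the new head with a moderate learning rate.
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="binary_crossentropy")
model.summary()  # trainable parameters = only the new Dense layer
# model.fit(train_ds, epochs=10)

# Step 3: fine-tune - unfreeze everything, lower the learning rate, re-train.
backbone.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5), loss="binary_crossentropy")
model.summary()  # now every parameter is trainable
# model.fit(train_ds, epochs=10)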
  • "Delete a few layers from a pre-trained network, freeze these layers and add a few more layers (even just one)"- I am building a different network that uses a pre-trained network as the backbone(link to explain backbone is put in question) and not build on top of existing pre-trained model. The new model is altogether differently functional, it just uses layers of a pre-trained network for feature extraction. Again, my question is how do I know which layer/layers of the pre-trained network to use? – FARAZ SHAIKH Apr 15 '22 at 10:48