
I am trying to implement the paper Sarcasm Detection Using Deep Learning With Contextual Features.

This is the CNN architecture I'm trying to implement here:

CNN Model

This text is from the Paper itself that describes the layers:

The CNN architecture in Figure 5 is shown in a top-down manner starting from the start (top) to the finish (bottom) node. ‘‘NL’’ stands for N-gram Length. The breakdown is:

  1. An input layer of size 1 × 100 × N where N is the number of instances from the dataset. Vectors of embedded-words are used as the initial input.
  2. Then the layers between the input and the concatenation are introduced:
  3. One convolutional layer with 200 neurons to receive and filter size 1 × 100 × N where N is the number of instances from the dataset. The stride is [1 1].
  4. Two convolutional layers with 200 neurons to receive and filter size 1 × 100 × 200. The stride is [1 1].
  5. Three batch normalization layers with 200 channels.
  6. Three ReLU activation layers.
  7. Three dropout layers with 20 percent dropout.
  8. A max pooling layer with stride [1 1].
  9. A depth concatenation layer to concatenate all the last max pooling layers.
  10. A fully connected layer with ten neurons.

Here is the code I have tried so far:

model1 = Input((train_vector1.shape[1:]))
#1_1
model1 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)
#1_2
model1 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)
#1_3
model1 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model1)
model1 = BatchNormalization(200)(model1)
model1 = Dropout(0.2)(model1)

model1 = MaxPooling1D(strides=1)(model1)
model1 = Flatten()(model1)

## Second Part

model2 = Input((train_vector1.shape[1:]))
#2_1
model2 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)
#2_2
model2 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)
#2_3
model2 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model2)
model2 = BatchNormalization(200)(model2)
model2 = Dropout(0.2)(model2)

model2 = MaxPooling1D(strides=1)(model2)
model2 = Flatten()(model2)

## Third Part

model3 = Input((train_vector1.shape[1:]))
#3_1
model3 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)
#3_2
model3 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)
#3_3
model3 = Conv1D(200, filters = 200, kernel_size=(1, 100), stride = 1, activation = "relu")(model3)
model3 = BatchNormalization(200)(model3)
model3 = Dropout(0.2)(model3)

model3 = MaxPooling1D(strides=1)(model3)
model3 = Flatten()(model3)

concat_model = Concatenate()([model1, model2, model3])
output = Dense(10, activation='sigmoid')

I just want to know if my implementation is correct here, or am I misinterpreting something? Am I understanding what the author is trying to do here?

  • Everything seems right. But just define one input layer and use it for all 3, instead of defining 3 separate input layers. There might be differences in backprop if you do it this way. In the diagram, all 3 sides branch from the same input – Vishal Balaji Jul 28 '22 at 08:08
  • Yes. They split 1-Gram, 2-Gram, 3-Gram. I don't know how to split the vector based on ngrams, I can give ngram_range = (1, 3) in TFIDF, but I don't know how I can split this into 3 inputs to 3 Layers – Fatin Faiaz Isty Jul 28 '22 at 10:35

1 Answer

From that image I think that the input should be shared among the three branches. In that case you would have:

input = Input((train_vector1.shape[1:]))

model1 = Conv1D(...)(input)
# ...
model1 = Flatten()(model1)

model2 = Conv1D(...)(input)
# ...
model2 = Flatten()(model2)

model3 = Conv1D(...)(input)
# ...
model3 = Flatten()(model3)

concat_model = Concatenate()([model1, model2, model3])
output = Dense(10, activation='sigmoid')(concat_model)

Also, most probably the convolutions are not 1D but 2D. You can get confirmation of this from the fact that the paper says:

The stride is [1 1]

So we are in two dimensions. The same goes for the max pooling layer (MaxPooling2D instead of MaxPooling1D).
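Putting the pieces together with 2D layers, one possible sketch looks like this. Note the assumptions, since the paper leaves them open: the embedded input is shaped (1, 100, N) channels-last with a hypothetical N = 50 word vectors per instance, `padding="same"` keeps the spatial dimensions through the stack, and the pool size (1, 2) is a guess because the paper only gives the pooling stride [1 1]:

```python
import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, ReLU,
                                     Dropout, MaxPooling2D, Flatten,
                                     Concatenate, Dense)

N = 50  # hypothetical number of word vectors per instance

inputs = Input(shape=(1, 100, N))  # one shared input for all three branches

def branch(x):
    # three Conv2D -> BatchNorm -> ReLU -> Dropout stacks, then max pooling,
    # mirroring items 3-8 of the paper's layer breakdown
    for _ in range(3):
        x = Conv2D(200, kernel_size=(1, 100), strides=(1, 1), padding="same")(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Dropout(0.2)(x)
    x = MaxPooling2D(pool_size=(1, 2), strides=(1, 1), padding="same")(x)
    return Flatten()(x)

# depth concatenation of the three branches, then the 10-neuron dense layer
merged = Concatenate()([branch(inputs), branch(inputs), branch(inputs)])
outputs = Dense(10, activation="sigmoid")(merged)
model = tf.keras.Model(inputs, outputs)
model.summary()
```

Printing the summary while you vary the kernel and pool sizes lets you compare the layer shapes against Figure 5 of the paper.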

Also you said:

when I run this code, it says too many arguments for "filters". Am I doing anything wrong here?

Let's take:

model1 = Conv1D(200, filters=train_vector1.shape[0], kernel_size=(1, 100), strides = 1, activation = "relu")(model1)

The Conv1D layer accepts these arguments (full documentation):

tf.keras.layers.Conv1D(
    filters,
    kernel_size,
    strides=1,
    ...
)

It says too many arguments because `filters` is already the first positional parameter: writing `Conv1D(200, filters=..., ...)` supplies it twice. There is no separate "number of neurons" argument; the number of neurons of a convolutional layer is exactly its number of filters, so pass it only once.
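A quick sketch of the error and the fix (the `filters=300` value is arbitrary, just to trigger the clash):

```python
from tensorflow.keras.layers import Conv1D

# `filters` is the FIRST positional parameter of Conv1D, so writing
# Conv1D(200, filters=..., ...) supplies it twice and raises a TypeError
try:
    Conv1D(200, filters=300, kernel_size=3)
except TypeError as e:
    print(e)  # mentions multiple values for argument 'filters'

# Pass it once: the 200 "neurons" ARE the filters
layer = Conv1D(filters=200, kernel_size=3, strides=1, activation="relu")
```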

Same thing also for BatchNormalization. From the docs:

tf.keras.layers.BatchNormalization(
    axis=-1,
    momentum=0.99,
    ...
)

There is no "number of neurons" argument.
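The first positional argument of BatchNormalization is `axis`, so `BatchNormalization(200)` asks it to normalize over a non-existent axis 200. A minimal sketch with a dummy tensor (shapes are arbitrary, for illustration only):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization

x = tf.constant(np.ones((2, 5, 100), dtype="float32"))

# Default axis=-1: the 100 channels are inferred from the input shape,
# so no "number of neurons" needs to be (or can be) specified
bn = BatchNormalization()
y = bn(x)
print(y.shape)  # (2, 5, 100)

# BatchNormalization(200) would instead set axis=200 and fail when the
# layer is built, because this input has no axis 200
```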

ClaudiaR
  • I think they are using Stride = 1 but here mentioned [1 1]. I'll have a look. Also, when I run this code, it says too many arguments for "filters". Am I doing anything wrong here? – Fatin Faiaz Isty Jul 28 '22 at 10:33
  • yes, I hadn't noticed. I've updated the answer. @FatinFaiazIsty – ClaudiaR Jul 28 '22 at 13:08
  • Thanks for the update. By the way, the paper says "Convolutional Layer with 200 Neurons to receive and filter size 1x100xN". How would you code this in Conv1D? My Idea so far is Conv1D(200, kernel_size=(1, 100), activation="relu"). Will this be the right assumption? – Fatin Faiaz Isty Jul 28 '22 at 13:42
  • Also I think they ARE using Conv2D. Otherwise why would the Filter Size and Stride look like this? – Fatin Faiaz Isty Jul 28 '22 at 13:43
  • Yes, I think it should be `Conv2D`. But I'm pretty sure having filters=200 is not right, that is the number of neurons. Try printing the architecture of your model, while changing the shapes of the filters and kernel size, to find something similar to what described in the article. To see the architecture you only have to write `model.summary()`. Also I noticed another mistake, I've updated the answer. @FatinFaiazIsty – ClaudiaR Jul 28 '22 at 14:47
  • Okay I'll have a look. Thank you for the answers. Upvoted and accepted – Fatin Faiaz Isty Jul 28 '22 at 18:42