I have written the following multi-input Keras TensorFlow model:
CHARPROTLEN = 25 #size of vocab
CHARCANSMILEN = 62 #size of vocab
protein_input = Input(shape=(train_protein.shape[1:]))
compound_input = Input(shape=(train_smile.shape[1:]))
#protein layers
x = Embedding(input_dim=CHARPROTLEN+1,output_dim=128, input_length=maximum_amino_acid_sequence_length) (protein_input)
x = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(x)
x = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=8)(x)
x = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=12)(x)
final_protein = GlobalMaxPooling1D()(x)
#compound layers
y = Embedding(input_dim=CHARCANSMISET+1,output_dim=128, input_length=maximum_SMILES_length) (compound_input)
y = Conv1D(filters=32, padding="valid", activation="relu", strides=1, kernel_size=4)(y)
y = Conv1D(filters=64, padding="valid", activation="relu", strides=1, kernel_size=6)(y)
y = Conv1D(filters=96, padding="valid", activation="relu", strides=1, kernel_size=8)(y)
final_compound = GlobalMaxPooling1D()(y)
join = tf.keras.layers.concatenate([final_protein, final_compound], axis=-1)
x = Dense(1024, activation="relu")(join)
x = Dropout(0.1)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(512, activation='relu')(x)
predictions = Dense(1,kernel_initializer='normal')(x)
model = Model(inputs=[protein_input, compound_input], outputs=[predictions])
The inputs have the following shapes:
train_protein.shape
TensorShape([5411, 1500, 1])
train_smile.shape
TensorShape([5411, 100, 1])
I get the following error message:
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv1d. Consider increasing the input size. Received input shape [None, 1500, 1, 128] which would produce output shape with a zero or negative value in a dimension.
Is this due to the Embedding
layer having the incorrect output_dim
? How do I correct this? Thanks.