
I am working on a deep learning model where I am trying to combine the outputs of two different models:

The overall structure is like this:

[image: overall model structure]

So the first model takes one matrix, for example [10 x 30]:

#input 1
input_text          = layers.Input(shape=(1,), dtype="string")
embedding           = ElmoEmbeddingLayer()(input_text)
model_a             = Model(inputs = [input_text] , outputs=embedding)
                      # shape : [10,50]

Now the second model takes two input matrices:

X_in               = layers.Input(tensor=K.variable(np.random.uniform(0, 9, [10, 32])))
M_in               = layers.Input(tensor=K.variable(np.random.uniform(-1, 1, [10, 10])))

md_1               = New_model()([X_in, M_in]) #New_model defined somewhere else
model_s            = Model(inputs = [X_in, M_in], outputs = md_1)
                     # shape : [10,50]

I want to make these two matrices trainable, like in TensorFlow, where I was able to do this with:

#matrix_a on the right-hand side is a pre-existing (e.g. pre-trained) numpy array
matrix_a = tf.get_variable(name='matrix_a',
                           shape=[10, 10],
                           dtype=tf.float32,
                           initializer=tf.constant_initializer(np.array(matrix_a)),
                           trainable=True)

I have no clue how to make those matrix_a and matrix_b trainable in Keras, or how to merge the outputs of both networks and then feed them as input.

I went through this question, but couldn't find an answer because their problem statement is different from mine.

What I have tried so far is:

#input 1
input_text          = layers.Input(shape=(1,), dtype="string")
embedding           = ElmoEmbeddingLayer()(input_text)
model_a             = Model(inputs = [input_text] , outputs=embedding)
                      # shape : [10,50]

X_in               = layers.Input(tensor=K.variable(np.random.uniform(0, 9, [10, 10])))
M_in               = layers.Input(tensor=K.variable(np.random.uniform(-1, 1, [10, 100])))

md_1               = New_model()([X_in, M_in]) #New_model defined somewhere else
model_s            = Model(inputs = [X_in, M_in], outputs = md_1)
                    # [10,50]


#transpose second model output

transpose         = Lambda(lambda x: K.transpose(x))
agglayer          = transpose(md_1)

# dot first and second model output
dott              = Lambda(lambda x: K.dot(x[0], x[1]))
kmean_layer       = dott([embedding, agglayer])


# input 
final_model = Model(inputs=[input_text, X_in, M_in], outputs=kmean_layer,name='Final_output')
final_model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
final_model.summary() 

Overview of the model:

[image: overview of the attempted model]

Update:

Model b

X = np.random.uniform(0, 9, [10, 32])
M = np.random.uniform(-1, 1, [10, 10])


X_in = layers.Input(tensor=K.variable(X))
M_in = layers.Input(tensor=K.variable(M))


layer_one      = Model_b()([M_in, X_in])
dropout2       = Dropout(dropout_rate)(layer_one)   #dropout_rate defined elsewhere
layer_two      = Model_b()([dropout2, X_in])

model_b_ = Model([X_in, M_in], layer_two, name='model_b')

Model a

length = 150


dic_size = 100
embed_size = 12

input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)

embedding = LSTM(5)(embedding) 
embedding = Dense(10)(embedding)

model_a = Model(input_text, embedding, name = 'model_a')

I am merging like this:

mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, model_b_.output])



final_model = Model(inputs=[model_b_.input[0],model_b_.input[1],model_a.input], outputs=mult)

Is this the right way to matmul two Keras models?

I don't know whether I am merging the outputs correctly or whether the model itself is correct.

I would greatly appreciate it if anyone could kindly give me some advice on how I should make those matrices trainable and how to merge the models' outputs correctly and then feed the input.

Thanks in advance!

Aaditya Ura
  • I didn't understand what matrix you want to be trainable. What is it? How does `matrix_a` participate in the code? It's not being used anywhere. I also don't understand the Elmo layer (lack of knowledge of mine), how is it expected to transform an input of shape `(1,)` into a `(10,30)`? What is that input and what format is it in? Finally, did you consider the batch size in any of the given dimensions or are they just the dimensions of "one sample" of the batch? – Daniel Möller Nov 17 '19 at 17:33
  • @DanielMöller The Elmo dim is just for demo purposes, the real dim is 1024. matrix_a and matrix_b are inputs to the second model, please see the figure. – Aaditya Ura Nov 17 '19 at 17:43
  • So, do you want the inputs to the second model to be trainable matrices? This means they're not input data, is that correct? – Daniel Möller Nov 17 '19 at 17:45
  • @DanielMöller yes they are not input data but I want to initialize with pre-trained weights. – Aaditya Ura Nov 17 '19 at 19:29
  • What are you feeding as ground truth to the loss function? – Daniel Möller Nov 17 '19 at 20:19
  • Besides what you are feeding, do you really expect a categorical crossentropy (classification loss) for a 10x10 matrix? You should have one-dimensional outputs for categorical crossentropy – Daniel Möller Nov 17 '19 at 21:04
  • (I know all these questions sound like being a pain, but I'm really working on an answer) – Daniel Möller Nov 17 '19 at 21:05
  • For the least confusion, I think you should include the batch size in the dimensions of your picture. – Daniel Möller Nov 17 '19 at 21:19
  • @DanielMöller I've also been asked to have a look at this - if you're working on it, I'll leave it to you; let me know if there's trouble (doubt it). Also, @ Aaditya, perfect [location](https://puu.sh/EFIA0/903fee1c41.png). – OverLordGoldDragon Nov 17 '19 at 21:50
  • Well, I got the general code ready, but I must get the answer to my last comments, otherwise I can't wrap it up. – Daniel Möller Nov 17 '19 at 21:55
  • @DanielMöller Sorry for the late reply. The second model is a custom model where I am feeding two matrices, with shapes 10x10 and 10x100; I just want the second network to learn those matrices. The loss target is a one-hot encoded vector. Sorry, I just used cross entropy; I should use binary cross entropy. – Aaditya Ura Nov 18 '19 at 04:02
  • @DanielMöller For the second network there is no batch size, the input will be the same every time, but I want it to learn both matrices. For model_a the batch size will be, let's say, 128, so the input for model_a will be 128x150 [batch, max_sequence_length]. – Aaditya Ura Nov 18 '19 at 04:04
  • So, just to finalize my questions, is `model_a` going to collapse the length 150 into 10? What batch shape is expected after model a? Is it `(128, 10, 50)`? – Daniel Möller Nov 18 '19 at 12:04
  • @DanielMöller Adding one more image for clarification. For simplicity, let's say the first model is an LSTM model and the second is just a dense layer, but both are defined separately until the matmul happens. In the future I will replace the LSTM with ELMo, so I want to make this architecture such that I can swap the first model between LSTM and ELMo. – Aaditya Ura Nov 18 '19 at 12:32

1 Answer


Trainable weights

Ok. Since you are going to have custom trainable weights, the way to do this in Keras is to create a custom layer.

Now, since your custom layer has no inputs, we will need a hack that will be explained later.

So, this is the layer definition for the custom weights:

from keras.layers import *
from keras.models import Model
from keras.initializers import get as get_init, serialize as serial_init
import keras.backend as K
import tensorflow as tf


class TrainableWeights(Layer):

    #you can pass keras initializers when creating this layer
    #kwargs will take base layer arguments, such as name and others if you want
    def __init__(self, shape, initializer='uniform', **kwargs):
        super(TrainableWeights, self).__init__(**kwargs)
        self.shape = shape
        self.initializer = get_init(initializer)
        

    #build is where you define the weights of the layer
    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel', 
                                      shape=self.shape, 
                                      initializer=self.initializer, 
                                      trainable=True)
        self.built = True
        

    #call is the layer operation - due to keras limitation, we need an input
    #warning, I'm supposing the input is a tensor with value 1 and no shape or shape (1,)
    def call(self, x):
        return x * self.kernel
    

    #for keras to build the summary properly
    def compute_output_shape(self, input_shape):
        return self.shape
    

    #only needed for saving/loading this layer in model.save()
    def get_config(self):
        config = {'shape': self.shape, 'initializer': serial_init(self.initializer)}
        base_config = super(TrainableWeights, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Now, this layer should be used like this:

dummyInputs = Input(tensor=K.constant([1]))
trainableWeights = TrainableWeights(shape)(dummyInputs)
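
As a side note, since get_config() is there for saving and loading, a minimal sketch of that round trip (assuming the full model using this layer is already built and compiled, and that the class is importable as TrainableWeights) would be:

#hypothetical save/load round trip for a model containing TrainableWeights
from keras.models import load_model

model.save('model_with_trainable_weights.h5')
restored = load_model('model_with_trainable_weights.h5',
                      custom_objects={'TrainableWeights': TrainableWeights})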

Model A

Having the layer defined, we can start modeling.
First, let's see the model_a side:

#general vars
length = 150
dic_size = 100
embed_size = 12

#for the model_a segment
input_text = Input(shape=(length,))
embedding = Embedding(dic_size, embed_size)(input_text)

#the following two lines are just a resource to reach the desired shape
embedding = LSTM(5)(embedding) 
embedding = Dense(50)(embedding)

#creating model_a here is optional, only if you want to use model_a independently later
model_a = Model(input_text, embedding, name = 'model_a')

Model B

For this, we are going to use our TrainableWeights layer.
But first, let's simulate a New_model() as mentioned.

#simulates New_model() #notice the explicit batch_shape for the matrices
newIn1 = Input(batch_shape = (10,10))
newIn2 = Input(batch_shape = (10,30))
newOut1 = Dense(50)(newIn1)
newOut2 = Dense(50)(newIn2)
newOut = Add()([newOut1, newOut2])
new_model = Model([newIn1, newIn2], newOut, name='new_model')   

Now the entire branch:

#the matrices    
dummyInput = Input(tensor = K.constant([1]))
X_in = TrainableWeights((10,10), initializer='uniform')(dummyInput)
M_in = TrainableWeights((10,30), initializer='uniform')(dummyInput)

#the output of the branch   
md_1 = new_model([X_in, M_in])

#optional, only if you want to use model_s independently later
model_s = Model(dummyInput, md_1, name='model_s')

The whole model

Finally, we can join the branches in a whole model.
Notice how I didn't have to use model_a or model_s here. You can if you want, but those submodels are not needed unless you later want to use them individually for other purposes. (Even if you created them, you don't need to change the code below to use them; they're already part of the same graph.)

#I prefer tf.matmul because it's clear and understandable while K.dot has weird behaviors
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))([embedding, md_1])

#final model
model = Model([input_text, dummyInput], mult, name='full_model')

Now train it:

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(np.random.randint(0,dic_size, size=(128,length)),
          np.ones((128, 10)))
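
For completeness, a quick sketch of inference (an assumption about usage on my part, not part of the training code above). Only the text input needs to be fed, because the dummy input was created with tensor=K.constant([1]) and already carries its value:

#hypothetical inference call; the "tensor"-backed dummy input feeds itself
preds = model.predict(np.random.randint(0, dic_size, size=(4, length)))
print(preds.shape)   #expected (4, 10): (4, 50) from the text branch, matmul'd with (10, 50) transposed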

Since the output is 2D now, there is no problem with 'categorical_crossentropy'; my earlier comment was because of doubts about the output shape.

Daniel Möller
  • Thank you very much, it's really helpful. But I have one confusion: suppose I don't want to use that custom layer and want to directly matmul both models, how do I do that? Just the output from the LSTM and the output from model b (don't pass it to the trainable layer)? – Aaditya Ura Nov 18 '19 at 14:02
  • Use `model_b.output` (or the tensor you used when you created `model_b = Model(...)`) in the matmul lambda. – Daniel Möller Nov 18 '19 at 14:07
  • You can't have trainable weights outside custom layers, though. – Daniel Möller Nov 18 '19 at 14:08
  • It's OK... but put the "tensor" inputs at the end of the list, since you are not passing them to fit. I think you need `model_b.inputs`, not `input`. – Daniel Möller Nov 18 '19 at 17:07
  • `Model(inputs=[model_a.input, model_b_.input[0],model_b_.input[1]], ...)` – Daniel Möller Nov 18 '19 at 18:27
  • Daniel, Check this question, https://stackoverflow.com/questions/59134865/tensorflow-how-to-optimize-trained-model-size – Aaditya Ura Dec 02 '19 at 08:16
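
Putting the last few comments together, a minimal sketch of the direct matmul without the custom layer (assuming model_a and model_b_ are built as in the question's update, and that their last dimensions line up for the matmul) could look like:

#hedged sketch based on the comments above: matmul the two submodel outputs
#directly, with model_b_'s "tensor"-backed inputs listed after model_a's input
mult = Lambda(lambda x: tf.matmul(x[0], x[1], transpose_b=True))(
           [model_a.output, model_b_.output])

final_model = Model(inputs=[model_a.input] + model_b_.inputs, outputs=mult)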