I want to create a TensorFlow neural network model using the Functional API, but I'm not sure how to separate the input into two. I wanted to do something like: given an input, its first half goes to the first part of the neural network and its second half goes to the second part; each half is passed through its own layers until they are concatenated, go through another layer, and finally reach the output. I thought of something like the snippet of code below, along with a quick sketch.
import tensorflow as tf
from tensorflow.keras.layers import Dense

def define_model(self):
    # input is a 1D vector containing 7 elements, split as 4 and 3
    input1 = tf.keras.Input(shape=(4,))
    input2 = tf.keras.Input(shape=(3,))
    # first branch
    layer1_1 = Dense(4, activation=tf.nn.leaky_relu)(input1)
    layer2_1 = Dense(4, activation=tf.nn.leaky_relu)(layer1_1)
    # second branch
    layer1_2 = Dense(4, activation=tf.nn.leaky_relu)(input2)
    layer2_2 = Dense(3, activation=tf.nn.leaky_relu)(layer1_2)
    # concatenate along the feature axis (axis=-1), not the batch axis
    concat_layer = tf.keras.layers.concatenate([layer2_1, layer2_2], axis=-1)
    layer3 = Dense(6, activation=tf.nn.leaky_relu)(concat_layer)
    output = Dense(4)(layer3)  # no activation
    self.model = tf.keras.Model(inputs=[input1, input2], outputs=output)
    self.model.compile(loss='mean_squared_error', optimizer='rmsprop')
    return self.model
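For completeness, here's a minimal sketch of the alternative I considered: keeping a single 7-element Input and slicing it into the two halves inside the graph. I'm assuming that slicing a Keras tensor directly works in the Functional API (I believe older versions require wrapping the slice in a Lambda layer):

import tensorflow as tf
from tensorflow.keras.layers import Dense

# sketch: one 7-element input, split into 4 + 3 inside the graph
full_input = tf.keras.Input(shape=(7,))
first_half = full_input[:, :4]   # binary part
second_half = full_input[:, 4:]  # real-valued part
branch1 = Dense(4, activation=tf.nn.leaky_relu)(first_half)
branch2 = Dense(4, activation=tf.nn.leaky_relu)(second_half)
merged = tf.keras.layers.concatenate([branch1, branch2], axis=-1)
out = Dense(4)(merged)
single_input_model = tf.keras.Model(inputs=full_input, outputs=out)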
First of all, should I add any Dropout or BatchNormalization layers in this model?
Also, the first 4 elements of the input array are binary (like [1,0,0,1] or [0,1,1,1]), while the other 3 can be any real number. Should I treat the 1st "column" of the neural network differently from the 2nd, given that the first operates on inputs in the 0 ≤ x ≤ 1 range while the 2nd doesn't?
It sounds right, but I can't really test whether it works, as I would have to rework a lot of the code to generate enough data to train it. Am I going in the right direction, or should I be doing something different? Would this code work at all?
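In case it helps clarify what I mean by treating the columns differently, here's the kind of thing I had in mind for the real-valued branch. The Normalization layer and its adapt() call are an assumption on my part, not something I've validated for this model:

import numpy as np
import tensorflow as tf

# hypothetical: standardize only the real-valued branch before its Dense layers
real_inputs = tf.keras.Input(shape=(3,))
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(np.random.randn(1000, 3))  # placeholder data; the real training inputs would go here
normalized = norm(real_inputs)
# the Dense layers of the second branch would then be applied to `normalized`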
EDIT: I'm having issues during training. Suppose I want to train the model like this (the exact values don't matter much; what's important is the data type):
import numpy as np

# This snippet generates training data - nothing real, just test examples.
# Also, I changed the output layer from 4 elements to just 1 to test it.
# `model` is the two-input model built by define_model() above.
A1 = [np.array([[1., 0, 0, 1]]), np.array([[0, 1., 0]])]
B1 = np.array([7])
c = np.array([[5, -4, 1, -1], [2, 3, -1]], dtype=object)
A2 = [[np.random.randint(2, size=[1, 4]), np.random.randint(2, size=[1, 3])] for i in range(1000)]
B2 = np.array([np.sum(A2[i][0]*c[0]) + np.sum(A2[i][1]*c[1]) for i in range(1000)])
model.fit(A1, B1, epochs=50, verbose=False)  # this works!
model.fit(A2, B2, epochs=50, verbose=False)  # but this doesn't.
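For comparison, this is the layout I believe fit() expects for a two-input model: a list with one array per input, each with a leading batch dimension, rather than a list of per-sample pairs. I haven't confirmed that this is what's going wrong, so it's just a sketch:

# hypothetical restructuring: one (1000, 4) array and one (1000, 3) array
X1 = np.random.randint(2, size=(1000, 4))
X2 = np.random.randint(2, size=(1000, 3))
B2 = np.array([np.sum(X1[i]*c[0]) + np.sum(X2[i]*c[1]) for i in range(1000)])
model.fit([X1, X2], B2, epochs=50, verbose=False)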
FINAL EDIT: here are the predict() and predict_on_batch() functions.
def predict(a, b):
    pred = m.predict([a, b])
    return pred

def predict_b(c, d):
    preds = m.predict_on_batch([c, d])
    return preds
#a, b, c and d must look like this:
a = [np.array([0,1,0,1])]
b = [np.array([0,0,1])]
c = [np.array([1, 0, 0, 1]),
np.array([0, 1, 1, 1]),
np.array([0, 1, 0, 0]),
np.array([1, 0, 0, 0]),
np.array([0, 0, 1, 0])]
d = [np.array([1, 0, 1]),
np.array([0, 0, 1]),
np.array([0, 1, 1]),
np.array([1, 1, 1]),
np.array([0, 0, 0])]
#notice that all of those should follow the same pattern, which is a list of arrays.
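Equivalently (if I understand predict() correctly), the lists could be stacked into single 2-D arrays first; a sketch of what I mean:

import numpy as np
# stack the per-sample arrays into (batch, features) arrays before predicting
c_batch = np.stack(c)  # shape (5, 4)
d_batch = np.stack(d)  # shape (5, 3)
preds = m.predict([c_batch, d_batch])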
The rest of the code is under M. Innat's answer.