
As described in figure 1, I have 3 models which each apply to a particular domain.

The 3 models are trained separately on different datasets.

[Figure 1]

Inference is sequential:

[Figure: sequential inference through the 3 models]

I tried to parallelize the calls to these 3 models with Python's multiprocessing library, but it is very unstable and not advised.

Here is my idea for doing it all at once:

As the 3 models share a common pretrained model, I want to make a single model that has multiple inputs and multiple outputs.

As the following drawing shows:

[Figure: a single model with multiple inputs and multiple outputs]

That way, during inference, I will call a single model that performs all 3 operations at the same time.

[Figure: inference with the single combined model]

I saw that this is possible with the Keras Functional API, but I have no idea how to do it. The inputs of the datasets have the same dimensions: they are images of shape (200, 200, 3).

If anyone has an example of a multi-input, multi-output model that shares a common structure, I would appreciate it.

UPDATE

Here is my code, but it returns an error because of the layers.concatenate(...) line, which propagates a shape that the EfficientNet model does not accept.

age_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="age_inputs")
gender_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="gender_inputs")
emotion_inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="emotion_inputs")


inputs = layers.concatenate([age_inputs, gender_inputs, emotion_inputs])
inputs = layers.Conv2D(3, (3, 3), activation="relu")(inputs)    
model = EfficientNetB0(include_top=False, 
                   input_tensor=inputs, weights="imagenet")
    

model.trainable = False

inputs = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
inputs = layers.BatchNormalization()(inputs)

top_dropout_rate = 0.2
inputs = layers.Dropout(top_dropout_rate, name="top_dropout")(inputs)

age_outputs = layers.Dense(1, activation="linear", 
                          name="age_pred")(inputs)
gender_outputs = layers.Dense(GENDER_NUM_CLASSES, 
                              activation="softmax", 
                              name="gender_pred")(inputs)
emotion_outputs = layers.Dense(EMOTION_NUM_CLASSES, activation="softmax", 
                             name="emotion_pred")(inputs)

model = keras.Model(inputs=[age_inputs, gender_inputs, emotion_inputs], 
              outputs =[age_outputs, gender_outputs, emotion_outputs], 
              name="EfficientNet")

optimizer = keras.optimizers.Adam(learning_rate=1e-2)
model.compile(loss={"age_pred" : "mse", 
                   "gender_pred":"categorical_crossentropy", 
                    "emotion_pred":"categorical_crossentropy"}, 
                   optimizer=optimizer, metrics=["accuracy"])

(age_train_images, age_train_labels), (age_test_images, age_test_labels) = reg_data_loader.load_data(...)
(gender_train_images, gender_train_labels), (gender_test_images, gender_test_labels) = cat_data_loader.load_data(...)
(emotion_train_images, emotion_train_labels), (emotion_test_images, emotion_test_labels) = cat_data_loader.load_data(...)

model.fit(
    {'age_inputs': age_train_images, 'gender_inputs': gender_train_images, 'emotion_inputs': emotion_train_images},
    {'age_pred': age_train_labels, 'gender_pred': gender_train_labels, 'emotion_pred': emotion_train_labels},
    validation_split=0.2,
    epochs=5,
    batch_size=16)
Innat
Kibs J.

1 Answer


We can do that easily in tf.keras using its Functional API. Here we will walk through how to build a multi-output model with outputs of different types (classification and regression) using the Functional API.

According to your last diagram, you need a model with one input and three outputs of different types. To demonstrate, we will use MNIST, a dataset of handwritten digits. It is normally a 10-class classification problem. From it, we will additionally create a 2-class classifier (whether a digit is even or odd) and a regression target (the square of the digit, i.e. for an image of a 9, the model should output approximately 81).


Data Set

import numpy as np 
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

(xtrain, ytrain), (_, _) = keras.datasets.mnist.load_data()

# 10 class classifier 
y_out_a = keras.utils.to_categorical(ytrain, num_classes=10) 

# 2 class classifier, even or odd 
y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2) 

# regression, predict square of an input digit image
y_out_c = tf.square(tf.cast(ytrain, tf.float32))

So, our training pairs will be xtrain and [y_out_a, y_out_b, y_out_c], the same as your last diagram.
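To make the three label encodings concrete, here is a minimal NumPy-only sketch on a handful of hand-picked labels. The `digits` array is made up for illustration, and `np.eye` indexing is used as a stand-in for `keras.utils.to_categorical`; neither comes from the original code.

```python
import numpy as np

# Hypothetical labels standing in for a few entries of ytrain.
digits = np.array([5, 0, 4, 1, 9])

# 10-class target: one one-hot row per label.
y_digit = np.eye(10, dtype=np.float32)[digits]

# 2-class target: class 1 means "even", class 0 means "odd".
is_even = (digits % 2 == 0).astype(int)
y_parity = np.eye(2, dtype=np.float32)[is_even]

# Regression target: the square of each digit.
y_square = np.square(digits.astype(np.float32))

print(y_digit.shape, y_parity.shape, y_square.shape)  # (5, 10) (5, 2) (5,)
print(is_even)  # [0 1 1 0 0]
```

Note that with this encoding an odd digit maps to class 0 of the even/odd head, which matters when reading the predictions later.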


Model Building

Let's build the model accordingly using the Functional API of tf.keras. See the model definition below. The MNIST samples are 28 x 28 grayscale images, so the input is shaped accordingly. I'm guessing your data set is RGB, so change the input dimensions to match.

input = keras.Input(shape=(28, 28, 1), name="original_img")
x = layers.Conv2D(16, 3, activation="relu")(input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
x = layers.GlobalMaxPooling2D()(x)

out_a = keras.layers.Dense(10, activation='softmax', name='10cls')(x)
out_b = keras.layers.Dense(2, activation='softmax', name='2cls')(x)
out_c = keras.layers.Dense(1, activation='linear', name='1rg')(x)

encoder = keras.Model( inputs = input, outputs = [out_a, out_b, out_c], name="encoder")
# Let's plot 
keras.utils.plot_model(
    encoder
)

[Figure: plot of the encoder model]

One thing to note: when defining out_a, out_b, and out_c, we set their name argument, which is very important. The names are '10cls', '2cls', and '1rg' respectively. You can also see them in the diagram above (the last 3 tails).


Compile and Run

Now we can see why the name argument is important. To run the model, we first need to compile it with a proper loss function, metrics, and optimizer. For classification and regression the optimizer can be the same, but the loss function and metrics should differ. Since our model has outputs of mixed types (2 classifications and 1 regression), we need to set a proper loss and metric for each of them. See below how it's done.

encoder.compile(
    loss = {
        "10cls": tf.keras.losses.CategoricalCrossentropy(),
        "2cls": tf.keras.losses.CategoricalCrossentropy(),
        "1rg": tf.keras.losses.MeanSquaredError()
    },

    metrics = {
        "10cls": 'accuracy',
        "2cls": 'accuracy',
        "1rg": 'mse'
    },

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
)

Each output of the model is referred to by its name, and we assign a loss and a metric to it under that name. Now it's time to train the model.

encoder.fit(xtrain, [y_out_a, y_out_b, y_out_c], epochs=30, verbose=2)

Epoch 1/30
1875/1875 - 6s - loss: 117.7318 - 10cls_loss: 3.2642 - 2cls_loss: 0.9040 - 1rg_loss: 113.5637 - 10cls_accuracy: 0.6057 - 2cls_accuracy: 0.8671 - 1rg_mse: 113.5637
Epoch 2/30
1875/1875 - 5s - loss: 62.1696 - 10cls_loss: 0.5151 - 2cls_loss: 0.2437 - 1rg_loss: 61.4109 - 10cls_accuracy: 0.8845 - 2cls_accuracy: 0.9480 - 1rg_mse: 61.4109
Epoch 3/30
1875/1875 - 5s - loss: 50.3159 - 10cls_loss: 0.2804 - 2cls_loss: 0.1371 - 1rg_loss: 49.8985 - 10cls_accuracy: 0.9295 - 2cls_accuracy: 0.9641 - 1rg_mse: 49.8985
...
...
Epoch 28/30
1875/1875 - 5s - loss: 15.5841 - 10cls_loss: 0.1066 - 2cls_loss: 0.0891 - 1rg_loss: 15.3884 - 10cls_accuracy: 0.9726 - 2cls_accuracy: 0.9715 - 1rg_mse: 15.3884
Epoch 29/30
1875/1875 - 5s - loss: 15.2199 - 10cls_loss: 0.1058 - 2cls_loss: 0.0859 - 1rg_loss: 15.0281 - 10cls_accuracy: 0.9736 - 2cls_accuracy: 0.9727 - 1rg_mse: 15.0281
Epoch 30/30
1875/1875 - 5s - loss: 15.2178 - 10cls_loss: 0.1136 - 2cls_loss: 0.0854 - 1rg_loss: 15.0188 - 10cls_accuracy: 0.9722 - 2cls_accuracy: 0.9736 - 1rg_mse: 15.0188
<tensorflow.python.keras.callbacks.History at 0x7ff42c18e110>

That's how each output of the last layer is optimized by its own loss function. One more thing to mention: .compile has an essential parameter that you might need, loss_weights, which weights the loss contributions of the different model outputs. See my other answer here on this.
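To see why loss_weights can matter here, the plain-Python sketch below recomputes the total loss of epoch 1 as a weighted sum of the three per-output losses from the log. The helper `total_loss` and the weight values are hypothetical, chosen only for illustration; they are not part of Keras.

```python
# Per-output losses taken from epoch 1 of the training log,
# keyed by the output layer names used in the model definition.
losses = {"10cls": 3.2642, "2cls": 0.9040, "1rg": 113.5637}

def total_loss(losses, loss_weights=None):
    """Weighted sum of per-output losses; a missing weight counts as 1.0."""
    loss_weights = loss_weights or {}
    return sum(loss_weights.get(name, 1.0) * value
               for name, value in losses.items())

# Without weights, the regression head dominates the total.
print(round(total_loss(losses), 4))  # 117.7319, matching the log up to rounding

# Hypothetical weights that shrink the regression contribution.
weights = {"10cls": 1.0, "2cls": 1.0, "1rg": 0.01}
print(round(total_loss(losses, weights), 4))  # 5.3038
```

In Keras, a dictionary like `weights` would be passed as the `loss_weights` argument of `compile`, keyed by the same output names. Suitable values depend on the task and usually need tuning.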


Prediction / Inference

Let's see some output. We expect this model to predict 3 things: (1) what the digit is, (2) whether it is even or odd, and (3) its square.

import matplotlib.pyplot as plt
plt.imshow(xtrain[0])

[Figure: the input digit image]

To quickly check the output layers of our model:

encoder.output

[<KerasTensor: shape=(None, 10) dtype=float32 (created by layer '10cls')>,
 <KerasTensor: shape=(None, 2) dtype=float32 (created by layer '2cls')>,
 <KerasTensor: shape=(None, 1) dtype=float32 (created by layer '1rg')>]

Now we pass xtrain[0] (which we know is a 5) to the model for prediction.

# we expand for a batch dimension: (1, 28, 28, 1)
pred10, pred2, pred1 = encoder.predict(tf.expand_dims(xtrain[0], 0))

# regression: square of the input digit image
pred1 
array([[22.098022]], dtype=float32)

# even or odd, surely odd 
pred2.argmax()
0

# which number, surely 5
pred10.argmax()
5
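The three raw arrays can be turned into readable values with a small helper. This is only a sketch: `decode_predictions` is a hypothetical function, and the arrays below are made up to mimic the output shapes for a single image rather than taken from a real run. The parity mapping follows the label encoding above, where class 0 is odd and class 1 is even.

```python
import numpy as np

def decode_predictions(pred10, pred2, pred1):
    """Convert the three output arrays for a batch of one into readable values."""
    parity_names = ["odd", "even"]  # class 0 = odd, class 1 = even
    return {
        "digit": int(np.argmax(pred10, axis=-1)[0]),
        "parity": parity_names[int(np.argmax(pred2, axis=-1)[0])],
        "square": float(pred1[0, 0]),
    }

# Made-up arrays shaped like the model outputs: (1, 10), (1, 2), (1, 1).
pred10 = np.zeros((1, 10), dtype=np.float32)
pred10[0, 5] = 0.9
pred2 = np.array([[0.8, 0.2]], dtype=np.float32)
pred1 = np.array([[22.098022]], dtype=np.float32)

result = decode_predictions(pred10, pred2, pred1)
print(result["digit"], result["parity"])  # 5 odd
```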

Update

Based on your comment, we can extend the above model to take multiple inputs too. A few things need to change. To demonstrate, we will feed the train and test samples of the MNIST data set to the model as two inputs.

(xtrain, ytrain), (xtest, _) = keras.datasets.mnist.load_data()

xtrain = xtrain[:10000] # both inputs need the same number of samples
ytrain = ytrain[:10000] # labels are trimmed to match

y_out_a = keras.utils.to_categorical(ytrain, num_classes=10)
y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2)
y_out_c = tf.square(tf.cast(ytrain, tf.float32))

print(xtrain.shape, xtest.shape) 
print(y_out_a.shape, y_out_b.shape, y_out_c.shape)
# (10000, 28, 28) (10000, 28, 28)
# (10000, 10) (10000, 2) (10000,)
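The slicing to 10000 samples is needed because every input and target passed to fit must have the same number of samples. A minimal sketch of a general helper for that is shown here; `truncate_to_common_length` is a hypothetical name, and the dummy arrays only stand in for real image sets.

```python
import numpy as np

def truncate_to_common_length(*arrays):
    """Trim every array along axis 0 to the length of the shortest one."""
    n = min(len(a) for a in arrays)
    return tuple(a[:n] for a in arrays)

# Dummy arrays standing in for two image sets of different sizes.
images_a = np.zeros((12, 28, 28), dtype=np.uint8)
images_b = np.zeros((10, 28, 28), dtype=np.uint8)
labels_a = np.arange(12)

images_a, images_b, labels_a = truncate_to_common_length(images_a, images_b, labels_a)
print(images_a.shape, images_b.shape, labels_a.shape)  # (10, 28, 28) (10, 28, 28) (10,)
```

Truncation throws away the extra samples of the larger sets, so it is the simplest option rather than necessarily the best one.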

Next, we need to modify some parts of the above model to take multiple inputs. If you plot the model now, you will see the new graph.

input0 = keras.Input(shape=(28, 28, 1), name="img2")
input1 = keras.Input(shape=(28, 28, 1), name="img1")
concate_input = layers.Concatenate()([input0, input1])

x = layers.Conv2D(16, 3, activation="relu")(concate_input)
...
...
...
# multi-input , multi-output
encoder = keras.Model( inputs = [input0, input1], 
                       outputs = [out_a, out_b, out_c], name="encoder")
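The effect of the concatenation on the tensor shape can be checked without building a model. The NumPy sketch below joins two dummy batches on the last axis, which is also the default axis of layers.Concatenate(); the batch size of 4 is an arbitrary choice for illustration.

```python
import numpy as np

# Two dummy batches shaped like the inputs: (batch, height, width, channels).
img_a = np.zeros((4, 28, 28, 1), dtype=np.float32)
img_b = np.ones((4, 28, 28, 1), dtype=np.float32)

# Joining on the last axis stacks the images as channels.
stacked = np.concatenate([img_a, img_b], axis=-1)
print(stacked.shape)  # (4, 28, 28, 2)
```

The first convolution therefore sees 2 channels instead of 1. With RGB inputs the channel count grows the same way, for example from 3 to 6 for two inputs.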

[Figure: plot of the multi-input, multi-output model]

Now, we can train the model as follows:

# multi-input, multi-output
encoder.fit([xtrain, xtest], [y_out_a, y_out_b, y_out_c], 
             epochs=30, batch_size = 256, verbose=2)

Epoch 1/30
40/40 - 1s - loss: 66.9731 - 10cls_loss: 0.9619 - 2cls_loss: 0.4412 - 1rg_loss: 65.5699 - 10cls_accuracy: 0.7627 - 2cls_accuracy: 0.8815 - 1rg_mse: 65.5699
Epoch 2/30
40/40 - 0s - loss: 60.5408 - 10cls_loss: 0.8959 - 2cls_loss: 0.3850 - 1rg_loss: 59.2598 - 10cls_accuracy: 0.7794 - 2cls_accuracy: 0.8928 - 1rg_mse: 59.2598
Epoch 3/30
40/40 - 0s - loss: 57.3067 - 10cls_loss: 0.8586 - 2cls_loss: 0.3669 - 1rg_loss: 56.0813 - 10cls_accuracy: 0.7856 - 2cls_accuracy: 0.8951 - 1rg_mse: 56.0813
...
...
Epoch 28/30
40/40 - 0s - loss: 29.1198 - 10cls_loss: 0.4775 - 2cls_loss: 0.2573 - 1rg_loss: 28.3849 - 10cls_accuracy: 0.8616 - 2cls_accuracy: 0.9131 - 1rg_mse: 28.3849
Epoch 29/30
40/40 - 0s - loss: 27.5318 - 10cls_loss: 0.4696 - 2cls_loss: 0.2518 - 1rg_loss: 26.8104 - 10cls_accuracy: 0.8645 - 2cls_accuracy: 0.9142 - 1rg_mse: 26.8104
Epoch 30/30
40/40 - 0s - loss: 27.1581 - 10cls_loss: 0.4620 - 2cls_loss: 0.2446 - 1rg_loss: 26.4515 - 10cls_accuracy: 0.8664 - 2cls_accuracy: 0.9158 - 1rg_mse: 26.4515

Now, we can test the multi-input model and get multiple outputs from it.

pred10, pred2, pred1 = encoder.predict(
    [
         tf.expand_dims(xtrain[0], 0),
         tf.expand_dims(xtrain[0], 0)
    ]
)

# regression part 
pred1
array([[25.13295]], dtype=float32)

# even or odd 
pred2.argmax()
0

# what digit 
pred10.argmax()
5
Innat
  • Thank you for your reply. In my situation, the data (x, y) comes from different datasets. I should rather have something like this in the model.fit (..) part, I think: model.fit ({'input1': x_input1, 'input2': x_input2, 'input3': x_input3}, {'output1': y_output1, 'output2': y_output2, 'output3': y_output3}, validation_split = 0.2, epochs = 5, batch_size = 16) My big problem is how to mix the 3 inputs and introduce them in the same model. – Kibs J. Mar 29 '21 at 17:47
  • I updated my post adding my code to give you an idea of what I want to do. – Kibs J. Mar 29 '21 at 18:04
  • I use a pre-trained model (EfficientNet) just after Concatenate()... and I get this error: ValueError: Cannot assign to variable conv2d_196/kernel:0 due to variable shape (3, 3, 6, 32) and value shape (32, 3, 3, 3) are incompatible – Kibs J. Mar 30 '21 at 01:23
  • In your case, after the `concate` of two inputs, the shape becomes `height` x `width` x 6. You then pass this to the image net models, which take an input shape of `height` x `width` x 3. To tackle this, just add this layer `Conv2D(3, (3, 3), padding='same')` right after the concate and before the image net model. – Innat Mar 30 '21 at 01:37
  • It will create feature maps with the same spatial dimensions as the input but with 3 channels, which the image net model can then accept. – Innat Mar 30 '21 at 01:39
  • I did what you suggested but I have a new error on the pre-trained weight: ValueError: You are trying to load a weight file containing 130 layers into a model with 131 layers. – Kibs J. Mar 30 '21 at 01:45
  • Please update your question with the last modified code. Just include the model definition part. – Innat Mar 30 '21 at 01:46
  • I updated my code but you can test it on this colab : https://colab.research.google.com/drive/1gjxgYjt3jWKZNSCJqRcyoDPDhTUYlGoj#scrollTo=VH446MEfqWb2 – Kibs J. Mar 30 '21 at 02:03
  • (1), you should use `layers.Conv2D(3, (3, 3), padding='same', activation="relu")` - padding same is important. (2), use this `model = keras.applications.EfficientNetB0(include_top=False, weights="imagenet")(inputs)`. You can't use `input_tensor` now. – Innat Mar 30 '21 at 02:16
  • Thanks. it worked :) but I have a new error. You can see it on the colab. It is on the line layers.GlobalAveragePooling2D... AttributeError: 'KerasTensor' object has no attribute 'output' – Kibs J. Mar 30 '21 at 02:37
  • My finetuning comes from the following example from keras: https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/ – Kibs J. Mar 30 '21 at 02:38
  • Try [this](https://colab.research.google.com/drive/1cnt_VDh3TRp4GR3ZKn2FLGTVlEbk2U3m?usp=sharing). The link you shared uses `layer.Input`, which is passed as `input_tensor` to the image net model. But in your case the input comes from `layer.concate`. – Innat Mar 30 '21 at 02:58
  • Thanks for your help. it worked well. The last difficulty I had was with the sample size. It had to be the same for all inputs. I fixed that in the preprocessing. – Kibs J. Mar 31 '21 at 03:41
  • If it solves your question, please mark it as the right answer. Thanks. – Innat Apr 02 '21 at 21:40
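
Putting the suggestions from the comments together, the model from the question might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the linked Colab notebooks: the class counts are invented, `build_model` is a hypothetical helper, the metric choices are my own, and `weights=None` is used so that nothing is downloaded (pass `weights="imagenet"` for the pretrained backbone).

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = 200            # image side length given in the question
GENDER_NUM_CLASSES = 2    # assumed value, not stated in the question
EMOTION_NUM_CLASSES = 7   # assumed value, not stated in the question

def build_model(weights=None):
    """Three image inputs -> shared EfficientNetB0 -> three task heads."""
    shape = (IMG_SIZE, IMG_SIZE, 3)
    age_in = layers.Input(shape=shape, name="age_inputs")
    gender_in = layers.Input(shape=shape, name="gender_inputs")
    emotion_in = layers.Input(shape=shape, name="emotion_inputs")

    # Concatenation yields 9 channels; the conv maps them back to 3.
    # padding="same" keeps the spatial size unchanged.
    x = layers.Concatenate()([age_in, gender_in, emotion_in])
    x = layers.Conv2D(3, (3, 3), padding="same", activation="relu")(x)

    # Call the backbone like a layer instead of passing input_tensor.
    backbone = keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=shape)
    backbone.trainable = False
    x = backbone(x)  # x is already a tensor, so there is no .output to read

    x = layers.GlobalAveragePooling2D(name="avg_pool")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2, name="top_dropout")(x)

    age_out = layers.Dense(1, activation="linear", name="age_pred")(x)
    gender_out = layers.Dense(GENDER_NUM_CLASSES, activation="softmax",
                              name="gender_pred")(x)
    emotion_out = layers.Dense(EMOTION_NUM_CLASSES, activation="softmax",
                               name="emotion_pred")(x)

    return keras.Model(inputs=[age_in, gender_in, emotion_in],
                       outputs=[age_out, gender_out, emotion_out],
                       name="multi_task_efficientnet")

model = build_model()
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-2),
    loss={"age_pred": "mse",
          "gender_pred": "categorical_crossentropy",
          "emotion_pred": "categorical_crossentropy"},
    # Per-output metrics: accuracy is not meaningful for the regression head.
    metrics={"age_pred": "mae",
             "gender_pred": "accuracy",
             "emotion_pred": "accuracy"},
)
print(len(model.inputs), len(model.outputs))  # 3 3
```

As noted in the last comments, all three inputs must contain the same number of samples when calling fit on a model like this.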