tf.data with multiple inputs / outputs in Keras

Question

For applications such as pairwise text similarity, the input data comes in pairs: pair_1, pair_2. Problems like this usually have multiple inputs. Previously, I trained my models successfully with:

model.fit([pair_1, pair_2], labels, epochs=50)

I decided to replace my input pipeline with the tf.data API. To this end, I create a Dataset like this:

dataset = tf.data.Dataset.from_tensor_slices((pair_1, pair_2, labels))

The model compiles successfully, but when training starts it throws the following exception:

AttributeError: 'tuple' object has no attribute 'ndim'

My Keras and TensorFlow versions are 2.1.6 and 1.11.0, respectively. I found a similar issue in the TensorFlow repository: tf.keras multi-input models don't work when using tf.data.Dataset.

Does anyone know how to fix the issue?

Here is the main part of the code:

(q1_test, q2_test, label_test) = test
(q1_train, q2_train, label_train) = train

def tfdata_generator(sent1, sent2, labels, is_training, batch_size):
    '''Construct a data generator using tf.data.Dataset'''

    dataset = tf.data.Dataset.from_tensor_slices((sent1, sent2, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # batch_size is passed by both callers below, so batch before repeating
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

    return dataset

train_dataset = tfdata_generator(q1_train, q2_train, label_train, is_training=True, batch_size=_BATCH_SIZE)
test_dataset = tfdata_generator(q1_test, q2_test, label_test, is_training=False, batch_size=_BATCH_SIZE)


inps1 = keras.layers.Input(shape=(50,))
inps2 = keras.layers.Input(shape=(50,))

embed = keras.layers.Embedding(input_dim=nb_vocab, output_dim=300, weights=[embedding], trainable=False)
embed1 = embed(inps1)
embed2 = embed(inps2)

gru = keras.layers.CuDNNGRU(256)
gru1 = gru(embed1)
gru2 = gru(embed2)

concat = keras.layers.concatenate([gru1, gru2])

preds = keras.layers.Dense(1, 'sigmoid')(concat)

model = keras.models.Model(inputs=[inps1, inps2], outputs=preds)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()  # summary() prints by itself and returns None

model.fit(
    train_dataset.make_one_shot_iterator(),
    steps_per_epoch=len(q1_train) // _BATCH_SIZE,
    epochs=50,
    validation_data=test_dataset.make_one_shot_iterator(),
    validation_steps=len(q1_test) // _BATCH_SIZE,
    verbose=1)

Maybe the error is related to nesting tuple inside another tuple? It does not recognize the inner tuple as a Tensor object? Can you try feeding it something like (pair1, pair2, labels) and then feed the pairs yourself to the fit to see if that works? — kvish, Oct 02 '18 at 02:32
I modified my example code, which should work now. Instead of tuples, you can pass a dictionary with the keys "input_1" and "input_2". — lhlmgr, Oct 10 '18 at 08:01
Try dataset = tf.data.Dataset.from_tensor_slices(((pair_1, pair2), labels)) — Kerem T, May 11 '20 at 19:14

lhlmgr · Accepted Answer · 2019-07-01T13:05:04.647

I'm not using Keras, but I would go with tf.data.Dataset.from_generator(), like this:

import numpy as np
import tensorflow as tf

def _input_fn():
  # Toy data: 8 examples per input, already converted to vocabulary indices
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1, 1))
  sent2 = np.reshape(sent2, (8, 1, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8, 1))

  def generator():
    # Yield one (features, label) example per step; features are keyed by input name
    for s1, s2, l in zip(sent1, sent2, labels):
      yield {"input_1": s1, "input_2": s2}, l

  # output_types must mirror the nested structure the generator yields
  dataset = tf.data.Dataset.from_generator(
      generator,
      output_types=({"input_1": tf.int64, "input_2": tf.int64}, tf.int64))
  dataset = dataset.batch(2)
  return dataset

...

model.fit(_input_fn(), epochs=10, steps_per_epoch=4)

This generator can iterate over, e.g., your text files or NumPy arrays and yield one example on every call. In this example, I assume that the words of the sentences have already been converted to their indices in the vocabulary.
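On newer TensorFlow versions, the same generator can also declare shapes along with dtypes. The following is only a sketch and assumes TF 2.4 or later, where from_generator() accepts an output_signature argument; the toy arrays and their shapes are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy data (hypothetical): 8 examples, one token index per sentence.
sent1 = np.arange(1, 9, dtype=np.int64).reshape(8, 1)
sent2 = np.arange(11, 19, dtype=np.int64).reshape(8, 1)
labels = np.arange(8, dtype=np.int64)

def generator():
    # One (features, label) example per step, features keyed by input name.
    for s1, s2, l in zip(sent1, sent2, labels):
        yield {"input_1": s1, "input_2": s2}, l

# The signature mirrors the yielded structure and fixes per-example shapes.
signature = (
    {
        "input_1": tf.TensorSpec(shape=(1,), dtype=tf.int64),
        "input_2": tf.TensorSpec(shape=(1,), dtype=tf.int64),
    },
    tf.TensorSpec(shape=(), dtype=tf.int64),
)

dataset = tf.data.Dataset.from_generator(
    generator, output_signature=signature
).batch(2)

# Pull one batch to inspect what the model would receive.
features, label = next(iter(dataset))
```

With known shapes, each batch arrives as a dict of (2, 1) tensors plus a (2,) label tensor, which tends to make shape-related errors in model.fit() easier to track down.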

Edit: Since the OP asked, this should also be possible with Dataset.from_tensor_slices():

def _input_fn():
  sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
  sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
  sent1 = np.reshape(sent1, (8, 1))
  sent2 = np.reshape(sent2, (8, 1))

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
  labels = np.reshape(labels, (8))

  dataset = tf.data.Dataset.from_tensor_slices(({"input_1": sent1, "input_2": sent2}, labels))
  dataset = dataset.batch(2, drop_remainder=True)
  return dataset
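For a version that can be tested by copy/paste, a self-contained sketch of the dict-keyed variant together with a small model might look like the following. It assumes TF 2.x with tf.keras. The vocabulary size, layer sizes, and random data are made up, and a plain GRU stands in for CuDNNGRU so that it runs on CPU. The part that matters is that the dict keys match the name of each Input layer.

```python
import numpy as np
import tensorflow as tf

# Toy data (hypothetical): 8 sentence pairs of 50 token indices each.
rng = np.random.default_rng(0)
sent1 = rng.integers(0, 100, size=(8, 50)).astype(np.int64)
sent2 = rng.integers(0, 100, size=(8, 50)).astype(np.int64)
labels = rng.integers(0, 2, size=(8, 1)).astype(np.float32)

# Each element is (features_dict, label); keys match the Input names below.
dataset = tf.data.Dataset.from_tensor_slices(
    ({"input_1": sent1, "input_2": sent2}, labels)
).batch(2)

inp1 = tf.keras.layers.Input(shape=(50,), dtype="int64", name="input_1")
inp2 = tf.keras.layers.Input(shape=(50,), dtype="int64", name="input_2")

# Shared embedding and GRU, as in the question's two-tower model.
embed = tf.keras.layers.Embedding(input_dim=100, output_dim=8)
gru = tf.keras.layers.GRU(4)
merged = tf.keras.layers.concatenate([gru(embed(inp1)), gru(embed(inp2))])
preds = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[inp1, inp2], outputs=preds)
model.compile(optimizer="adam", loss="binary_crossentropy")

# The dataset is finite and not repeated, so steps_per_epoch is not needed.
history = model.fit(dataset, epochs=1, verbose=0)
```

Naming the inputs explicitly avoids relying on auto-generated layer names such as "input_1", which can change if more than one model is built in the same session.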

Thanks for your response. My dataset is relatively small and I prefer to keep all of it in memory. Do you have any suggestion for fixing the issue with from_tensor_slices? — Amir, Oct 05 '18 at 20:13
Hi Amir. Two questions, and sorry if they are kind of stupid. First, one of the people in the GitHub issue mentioned: 'So the new features of feeding the iterator directly to model.fit() is valid only when you are using tf.Keras not the standalone Keras.' (He had the same error as you and fixed it by importing the "correct" Keras.) Second, you posted from_tensor_slices() twice, once with a tuple and once with a triplet; which one is the line you use? — lhlmgr, Oct 05 '18 at 21:32
I used the tf.keras API. You are right, but in both situations, tuple or triplet, it did not work. — Amir, Oct 07 '18 at 14:50
Thanks for your response but something is not working for me on tensorflow 2.3.2. If it is not too much to ask, could you please update your answer to include the model so that it is copy/paste testing? thanks again — Pablo Jadzinsky, Jan 25 '21 at 17:04

pfm · Answer 2 · 2018-10-07T18:47:35.817


One way to solve your issue could be to use the zip dataset to combine your various inputs:

sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.float32)
sent1 = np.reshape(sent1, (8, 1, 1))
sent2 = np.reshape(sent2, (8, 1, 1))

labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.float32)
labels = np.reshape(labels, (8, 1))

dataset_12 = tf.data.Dataset.from_tensor_slices((sent1, sent2))
dataset_label = tf.data.Dataset.from_tensor_slices(labels)

dataset = tf.data.Dataset.zip((dataset_12, dataset_label)).batch(2).repeat()
model.fit(dataset, epochs=10, steps_per_epoch=4)

will print: Epoch 1/10 4/4 [==============================] - 2s 503ms/step...
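To see why zipping helps, it can be useful to compare the element structure of the two datasets. The sketch below uses toy arrays and assumes TF 2.x, where datasets expose element_spec. In TF 2.x, tf.keras reads a 2-tuple element as (inputs, targets) and a 3-tuple as (inputs, targets, sample_weights), so a flat triple does not mean "two inputs and one label".

```python
import numpy as np
import tensorflow as tf

# Toy arrays (hypothetical values), 8 examples each.
sent1 = np.arange(8, dtype=np.float32).reshape(8, 1)
sent2 = np.arange(8, 16, dtype=np.float32).reshape(8, 1)
labels = np.arange(8, dtype=np.float32).reshape(8, 1)

# Flat triple: every element is (s1, s2, label), i.e. three components.
flat = tf.data.Dataset.from_tensor_slices((sent1, sent2, labels))

# Zipped: every element is ((s1, s2), label), i.e. two components,
# the first of which is itself a pair of inputs.
inputs = tf.data.Dataset.from_tensor_slices((sent1, sent2))
targets = tf.data.Dataset.from_tensor_slices(labels)
nested = tf.data.Dataset.zip((inputs, targets))

flat_spec = flat.element_spec
nested_spec = nested.element_spec
print(len(flat_spec))                         # components per flat element
print(len(nested_spec), len(nested_spec[0]))  # components, then inputs
```

The same nesting can be produced directly with from_tensor_slices(((sent1, sent2), labels)); the zip form is mainly convenient when the inputs and labels already live in separate datasets.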

edited Oct 07 '18 at 18:47

answered Oct 07 '18 at 12:29


Thank you @pfm. It sounds like a good idea. I'll accept it if nobody offers a more elegant way to solve the issue. – Amir Oct 07 '18 at 14:56
@pfm I have a similar issue, could you help me [here](https://stackoverflow.com/questions/63636427/how-to-load-images-by-their-paths-in-dataframe-columns-for-dual-input-using-data) – bit_scientist Sep 01 '20 at 04:36
Can you please create the same without labels? I am trying this for the test data and couldn't figure out how to do it. – tikendraw Nov 18 '22 at 00:58
