
My problem is that `x_train` in `tf.data.Dataset.from_tensor_slices((x_train, y_train))` needs to be a list. When I use the following line to pass `[x1_train, x2_train]` to `tensorflow.data.Dataset.from_tensor_slices`, I get an error (`x1_train`, `x2_train` and `y_train` are numpy arrays):

Train=tensorflow.data.Dataset.from_tensor_slices(([x1_train,x2_train], y_train)).batch(batch_size)

Error:

Train=tensorflow.data.Dataset.from_tensor_slices(([x1_train,x2_train], y_train)).batch(batch_size)
return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Can't convert non-rectangular Python sequence to Tensor.

What should I do?

Mehdi
  • Which line do you get this error in? – Susmit Agrawal Jul 28 '20 at 11:54
  • I have updated the question – Mehdi Jul 28 '20 at 17:16
  • Actually I came across the exact same error today and my problem was that `x_train` in `tf.data.Dataset.from_tensor_slices(x_train, y_train)` needed to be a list, but with the information you provided I am not sure if this also applies to you. Could you provide some more information about your actual goal and what kind of data is included in `x1_train`, `x2_train` and `y_train`? – Kevin Südmersen Jul 29 '20 at 16:31
  • Yes. I also have the same problem. x1_train, x2_train and y_train are numpy arrays, and I need to pass [x1_train,x2_train] because my model has two sets of inputs (I pass one of them through some layers and then aggregate it with the second set of inputs). How did you solve it? – Mehdi Jul 29 '20 at 17:37

2 Answers


If the main goal is to feed data to a model having multiple input layers then the following might be helpful:

import tensorflow as tf
from tensorflow import keras
import numpy as np

def _input_fn(n):
  # Reshape to (n_samples, 1) so each slice matches Input(shape=(1,))
  x1_train = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64).reshape(-1, 1)
  x2_train = np.array([15, 25, 35, 45, 55, 65, 75, 85], dtype=np.int64).reshape(-1, 1)

  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)

  # The dict keys must match the `name` given to the Input layers below
  dataset = tf.data.Dataset.from_tensor_slices(({"input_1": x1_train, "input_2": x2_train}, labels))
  dataset = dataset.batch(2, drop_remainder=True)
  dataset = dataset.repeat(n)
  return dataset

input1 = keras.layers.Input(shape=(1,), name='input_1')
input2 = keras.layers.Input(shape=(1,), name='input_2')

# Combine the two inputs and add a head so the model is complete
concat = keras.layers.concatenate([input1, input2])
output = keras.layers.Dense(1)(concat)

model = keras.models.Model(inputs=[input1, input2], outputs=output)

Basically, instead of passing a Python list, pass a dictionary whose keys are the names of the input layers to which the arrays will be fed.

In the code above, for example, `x1_train` will be fed to the tensor `input1`, whose name is `input_1`. Referenced from here.
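
As a quick check, the model above can then be trained directly on this dataset. A minimal sketch (the cast to float32, the optimizer, the loss and the epoch count are assumptions added here, not part of the original answer):

# Cast the int64 features and labels to float32 so they can flow into the Dense layer
train_ds = _input_fn(1).map(
    lambda x, y: ({k: tf.cast(v, tf.float32) for k, v in x.items()},
                  tf.cast(y, tf.float32)))

model.compile(optimizer='adam', loss='mse')
model.fit(train_ds, epochs=1)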

Pratik Kumar
  • Thanks for your answer. How can I use `.prefetch`, `.cache` and `.make_one_shot_iterator` in your code? (I want to train on multiple GPUs) – Mehdi Aug 03 '20 at 00:54
  • @hsn15051, yes you can use them – Pratik Kumar Aug 03 '20 at 05:04
  • Could you please tell me in what order I should use them? (before `dataset.repeat(n)` or after it?) – Mehdi Aug 03 '20 at 05:14
  • @hsn15051, putting the whole code for training on multiple GPUs here would be out of scope for this question. Please ask a separate question for it, and also mention your TensorFlow version. Also refer to this video presentation: https://www.youtube.com/watch?v=ZnukSLKEw34&list=PLSSIxfENhg_0AWjXuH_wbqv25nTS_Wcow&index=4&t=1274s – Pratik Kumar Aug 03 '20 at 05:35
  • I used tf.data with tfrecords. The sequence which I use for this is `list_files->shuffle->repeat->interleave->batch->map->cache->prefetch` (sketched below) – Pratik Kumar Aug 03 '20 at 05:42
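
As a concrete illustration of that ordering (minus the `list_files`, `interleave` and `map` steps, which only apply when reading files such as tfrecords), the `_input_fn` from the answer could be extended like this; the shuffle buffer size and the `AUTOTUNE` prefetch are assumptions:

def _input_fn_pipelined(n):
  x1_train = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64).reshape(-1, 1)
  x2_train = np.array([15, 25, 35, 45, 55, 65, 75, 85], dtype=np.int64).reshape(-1, 1)
  labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)

  dataset = tf.data.Dataset.from_tensor_slices(({"input_1": x1_train, "input_2": x2_train}, labels))
  dataset = dataset.shuffle(buffer_size=8)                   # shuffle first
  dataset = dataset.repeat(n)                                # then repeat
  dataset = dataset.batch(2, drop_remainder=True)            # then batch
  dataset = dataset.cache()                                  # cache the prepared batches
  dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)  # overlap input with training
  return dataset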

If you have a dataframe with columns of different types (float32, int and str), you have to build the input dictionary manually.

Following Pratik's syntax:

tf.data.Dataset.from_tensor_slices(({"input_1": np.asarray(var_float).astype(np.float32), "input_2": np.asarray(var_int).astype(np.int64), ...}, labels))
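
As a fuller sketch of that idea, assume a hypothetical pandas DataFrame `df` with one float column, one int column and a label column (all names and values below are illustrative, not from the answer):

import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical mixed-type dataframe
df = pd.DataFrame({
    "price": [1.5, 2.5, 3.5, 4.5],  # float feature
    "count": [1, 2, 3, 4],          # int feature
    "label": [0, 1, 0, 1],
})

# Build the feature dict column by column, casting each to an explicit dtype
features = {
    "input_1": df["price"].to_numpy().astype(np.float32),
    "input_2": df["count"].to_numpy().astype(np.int64),
}
labels = df["label"].to_numpy().astype(np.int64)

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)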