I am using a TensorFlow Dataset to consume data from my hard drive. The data is stored in NumPy arrays, and the paths for the NumPy arrays are stored in a text file. When creating the dataset, I am using the dataset.map() function to map each path to a NumPy array.

Here are the relevant parts of my code:

import numpy as np
import tensorflow as tf

def parser(path):
    x = np.load(path)
    return x

paths = ['data1.npy', 'data2.npy', 'data3.npy', 'data4.npy', ... ]

dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(map_func=parser)

However, this gives the following error:

AttributeError: 'Tensor' object has no attribute 'read'

The error refers to the line x = np.load(path). So it seems that I cannot load a NumPy array in this way in my parser function, because path is not actually a string, but a Tensor.

What is the correct way to do this? I want to avoid using TFRecords if possible.


I have also tried wrapping the load function as follows:

x = tf.py_func(np.load(path))

But this gives me the same error on that line:

AttributeError: 'Tensor' object has no attribute 'read'

Karnivaurus
    Possible duplicate of [Feeding .npy (numpy files) into tensorflow data pipeline](https://stackoverflow.com/questions/48889482/feeding-npy-numpy-files-into-tensorflow-data-pipeline) – jdehesa Aug 15 '18 at 10:02

1 Answer

You get this error because `np.load` expects a file path as a string, but inside `dataset.map()` it receives a Tensor. You can use `tf.py_func` to wrap the load function so that it runs as ordinary Python code. Note that `tf.py_func(np.load(path))` still fails in the same way, because `np.load(path)` is evaluated immediately, before `tf.py_func` is ever involved; pass the function itself and its inputs separately instead.
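
A minimal sketch of that approach (assuming the saved arrays are float32; `Tout` must match the dtype that `np.load` actually returns):

import numpy as np
import tensorflow as tf

def parser(path):
    # Inside tf.py_func this runs as ordinary Python:
    # `path` arrives as a bytes object here, not a Tensor.
    return np.load(path.decode())

dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(
    lambda path: tf.py_func(parser, [path], tf.float32))

Note that tensors returned by `tf.py_func` carry no static shape information, so you may need to restore it (for example with `tf.reshape` or `Tensor.set_shape`) before feeding the data to a model.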

Kaihong Zhang
  • Thanks for the suggestion. I have tried this, however, I still get the same error message (see above). – Karnivaurus Aug 15 '18 at 11:29
  • use `tf.py_func` like this: `x = tf.py_func(np.load, [path], tf.float32)` (the inputs must be passed as a list, and `Tout` should match the dtype of the saved arrays). You can see the link @jdehesa provided; there is a very good example. – Kaihong Zhang Aug 16 '18 at 01:31
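
Expanding that comment into a complete, runnable TF 1.x sketch (the file names and the (128,) array shape are illustrative assumptions; adjust them to your data):

import numpy as np
import tensorflow as tf

paths = ['data1.npy', 'data2.npy', 'data3.npy', 'data4.npy']

def load_array(path):
    # `path` arrives as a bytes object; decode it for np.load.
    # Cast so the result matches the Tout declared below.
    return np.load(path.decode()).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(
    lambda path: tf.py_func(load_array, [path], tf.float32))
# py_func drops static shape information, so restore it if known.
dataset = dataset.map(lambda x: tf.reshape(x, [128]))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    print(sess.run(next_element))  # one loaded array per run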