I want to create a data generator in Python, based on the "fit_generator" function at https://keras.io/models/sequential/. The code for the function is:

def generate_arrays_from_file():
    while True:
        with open('data.npz') as f:
            for line in f:
                # TODO
                yield ({'input': x}, {'output': y})

In the line TODO, I need to assign some data from f to x and y.

Now, the file 'data.npz' is actually a zipped NumPy file. This was created by:

x = random_numpy_array()  # Create a NumPy array (details not important)
y = random_numpy_array()
np.savez('data.npz', x=x, y=y)

Usually, you would read x and y by using:

data = np.load('data.npz')
x = data['x']
y = data['y']

However, in my example (the first block of code above), I have not loaded the data using np.load(). Instead, I have loaded it using with open('data.npz') as f.

To read x and y from f, I have tried:

x = f['x']
y = f['y']

But this gives me the error:

TypeError: '_io.TextIOWrapper' object is not subscriptable

So how can I read f and extract x and y?

Karnivaurus
  • Is there any specific reason why you cannot read with `np.load()`? Also, `f` is only the _file handle_, not the _data_. You would have to use something like `data = f.readlines()`, but because the npz file is a binary archive, that will only give you gibberish. – dennlinger Jun 22 '18 at 12:52
  • What is the purpose of the `while` loop when you are already iterating through the file with a `for` loop? – BcK Jun 22 '18 at 12:56
  • A `.npy` file cannot be read line by line. It isn't a text file. An `npz` is an archive of `npy` files, so it makes even less sense to try reading an archive line by line. – hpaulj Jun 22 '18 at 14:13

1 Answer

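I'm pretty sure you need to use NumPy's `np.load()` function for `.npz` files rather than `open()`. It returns a dictionary-like object from which you can read each array by the name it was saved under, and you can then store those arrays in a dictionary if that's what you want.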

See: Loading arrays from npz files in Python
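As a rough sketch of how that could look inside the generator from the question: load the archive once with `np.load()`, then yield slices of `x` and `y` forever. The `path` and `batch_size` parameters are my own additions for illustration, not something the question specifies. I'm also assuming both arrays fit in memory and that their first axis indexes the samples.

```python
import numpy as np


def generate_arrays_from_file(path='data.npz', batch_size=32):
    # np.load understands the .npz archive format; a plain open() only
    # gives a file handle, which is why f['x'] raised a TypeError.
    with np.load(path) as data:
        x = data['x']  # arrays are read into memory here,
        y = data['y']  # so they stay valid after the archive is closed

    # Loop forever, since the training loop keeps requesting batches.
    while True:
        # batch_size is a hypothetical parameter, not from the question.
        for start in range(0, len(x), batch_size):
            stop = start + batch_size
            yield ({'input': x[start:stop]}, {'output': y[start:stop]})
```

The outer `while True` is there so the generator never runs out, and the inner `for` walks through the data one batch at a time. The last batch of each pass may be smaller than `batch_size` if the number of samples is not a multiple of it.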

Zx4161