
My code is doing some math and saving output in multiple NumPy arrays.

At the end, I write the output to disk, and I would like to use each array's name as the filename of the file that array is written to.

For instance, if I have the following multidimensional arrays

time = [...]
force = [...]
pressure = [...]
energy = [...] 

etc, and I do

for array in [time, force, pressure, energy, ....]:
    with open(**filename**, 'wb') as file:
         pickle.dump(array, file)

But how do I set the filename so that it takes on the array's name?

I have gone through many similar questions (although asked with other motives in mind). The answers suggest that array (or any variable) names are merely tags and are not meant to be retrieved like this. But my motive for naming files here seems like a genuine need (to me at least), hence this question. If it is possible, I can perhaps go fancier, write in HDF5 format, and use the array names as different datasets. All of this could be done manually, but then why do we code?

Stephen Rauch
nsk
  • In which format do you want to save the file? Did you mean the array name should be the file name and the array elements the file content? – Vikas Periyadath Jan 30 '18 at 07:28
  • File format is immaterial. I am pickling right now, but could move onto others later. But yes, I need the array name to be the file name, and the array elements to go into the file as data. Thanks. – nsk Jan 30 '18 at 07:36

4 Answers


You can use numpy.dtype.names. Here is an example.

# inputs
In [196]: A
Out[196]: 
array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44]])

In [197]: B
Out[197]: 
array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4]])

# their dtype
In [198]: A.dtype, B.dtype
Out[198]: (dtype('int64'), dtype('int64'))

# their size
In [199]: A.size, B.size
Out[199]: (16, 16)

# store it as a list of tuples
In [200]: dt = np.dtype([('A', A.dtype, A.size), ('B', B.dtype, B.size)])

# get all arrays
In [201]: dt.names
Out[201]: ('A', 'B')


In [202]: dt['A']
Out[202]: dtype(('<i8', (16,)))

You can also skip the `variable.size` entry, since the size will be inferred.

In [233]: dt = np.dtype([('A', A.dtype), ('B', B.dtype)])

# size inferred automatically
In [234]: dt.itemsize
Out[234]: 16

In [235]: dt.names
Out[235]: ('A', 'B')
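To connect this back to the filenames in the question, `dt.names` can drive the save loop directly. A sketch with made-up toy arrays; the `.npy` filenames and `np.save` calls are my choice, not part of the original answer:

```python
import numpy as np

# toy stand-ins for the answer's A and B (my own values, not the original data)
A = np.arange(16).reshape(4, 4)
B = np.ones((4, 4), dtype=np.int64)

# record the names once in a structured dtype, keep the arrays alongside
dt = np.dtype([('A', A.dtype), ('B', B.dtype)])
arrays = (A, B)

# dt.names supplies the filenames, one file per array
for name, arr in zip(dt.names, arrays):
    np.save(name + '.npy', arr)
```

The dtype only carries the names and types here; the array data itself still lives in the plain arrays, which must be kept in the same order as the dtype fields.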
kmario23
  • This seems like it can work. But don't you think typing the line `dt = np.dtype([('A', A.dtype, A.size), ('B', B.dtype, B.size)])` for all my arrays will take more space than if I just type their names manually into a list like `['time', 'force', 'pressure', 'energy', ...]`? – nsk Jan 30 '18 at 08:03
  • @nsk I see your point :) Actually, you can ignore the sizes; they'll be inferred. But I feel it's a little cleaner than having all the names as strings in a list. – kmario23 Jan 30 '18 at 08:10
  • Yeah, but if I skip the *.dtype and the np.dtype, I will be left with the list of strings. :P Thanks for the answer though. I learnt a new thing. – nsk Jan 30 '18 at 08:20

If I make a list from a set of variables, I cannot retrieve the names of those variables. I can only retrieve the objects referenced by the variables.

In [324]: x = np.arange(3)
In [325]: y = np.ones((3,3))
In [326]: alist = [x,y]
In [327]: alist
Out[327]: 
[array([0, 1, 2]), array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])]
In [328]: id(x)
Out[328]: 2851921416
In [329]: id(alist[0])
Out[329]: 2851921416

alist[0] does not in any way reference the variable name 'x'.

A dictionary is a better way of associating a name, or string, with an object:

In [331]: adict = {'x':x, 'y':y}
In [332]: adict['x']
Out[332]: array([0, 1, 2])

With such a dictionary, I can save these arrays with savez:

In [334]: np.savez('temp', **adict)
In [336]: d = np.load('temp.npz')
In [337]: list(d.keys())
Out[337]: ['y', 'x']

That npz archive contains two files named:

In [340]: !unzip -l temp.npz
Archive:  temp.npz
  Length      Date    Time    Name
---------  ---------- -----   ----
      200  2018-01-29 23:58   y.npy
      140  2018-01-29 23:58   x.npy
---------                     -------
      340                     2 files
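Reading the archive back into a name → array dictionary is the mirror image. A small sketch; the variable names are placeholders:

```python
import numpy as np

x = np.arange(3)
y = np.ones((3, 3))

# save a name -> array dict as an npz archive, one member file per key
np.savez('temp', x=x, y=y)

# d.files lists the member names; rebuild the dict from them
with np.load('temp.npz') as d:
    restored = {name: d[name] for name in d.files}
```

The `NpzFile` returned by `np.load` reads members lazily, so the dict comprehension is what actually pulls each array into memory.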

That dictionary would also be useful when creating HDF5 datasets.

Some examples of saving/loading variables (and dictionary) with pickle:

How to load/view structure of pickled object in Ipython console ? (Windows 7, Spyder, Ipython console)

Here's an attempt to save and load a workspace (or part of it) as commonly done with MATLAB:

IPython loading variables to workspace: can you think of a better solution than this?

IPython: how to automagically load npz file and assign values to variables?

hpaulj
  • This seems like something I can use. I will have to modify my main math program and write the NumPy arrays into a dictionary instead, rather than individual arrays. Do you think this will have an (additional) effect on memory usage and speed, because my data is big? These arrays are multi-dimensional and file sizes are easily touching 1.5 GB already. – nsk Jan 30 '18 at 08:14

I would not do it at all.

Rather, I would do

time = [...]
force = [...]
pressure = [...]
energy = [...] 

file_data = {'time': time, 'force': force, 'pressure': pressure, 'energy': energy}
for filename, array in file_data.items():
    with open(filename, 'wb') as file:
         pickle.dump(array, file)

Before Python 3.7, that does not guarantee the insertion order, but I don't think order matters in this case.

If order matters, I'd do

file_data = [('time', time), ('force', force), ('pressure', pressure), ('energy', energy)]
for filename, array in file_data:
    with open(filename, 'wb') as file:
         pickle.dump(array, file)
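For reference, dict insertion order was an implementation detail in CPython 3.6 and became a language guarantee in Python 3.7, so on current interpreters the plain-dict version already iterates in the order the keys were written. A quick check, with the array contents shortened to stand-ins:

```python
import sys

# shortened stand-ins for the real arrays
file_data = {'time': [0.0], 'force': [1.0], 'pressure': [2.0], 'energy': [3.0]}

# since Python 3.7, a plain dict preserves insertion order by language guarantee
if sys.version_info >= (3, 7):
    assert list(file_data) == ['time', 'force', 'pressure', 'energy']
```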
glglgl
  • This is what @hpaulj suggested, and what I am thinking of doing. But there should be a straightforward way to retrieve the name as well, because when the _for loop_ reads it, it reads the arrays by name first and then looks up their values. – nsk Jan 30 '18 at 16:23

It is possible to get local variables using their names, although it is generally not the best idea. But if you need to:

Code:

locals()[var_name]

Test Code:

x = 1
y = 2
z = 3
for var_name in ('x', 'y', 'z'):
    print(locals()[var_name])

Results:

1
2
3

Local Example:

So to put this example into your example:

for array_name in ['time', 'force', 'pressure', 'energy', ....]:
    with open(array_name, 'wb') as file:
        pickle.dump(locals()[array_name], file)
Stephen Rauch
  • I did find mention of using locals() when I was searching for the answer, but I am looking for a more elegant solution if possible. Anyhow, I would be looking to retrieve x, y, z in your test code, and not 1, 2, 3. A little tweak needed, I guess? – nsk Jan 30 '18 at 07:42
  • Maybe I am missing something in your code, but I need the 'names' of the variables for my filenames, not their 'values'. That's what the question is. Your code gives the same result as this would: for i in (x, y, z): print(i) – nsk Jan 30 '18 at 07:50
  • I cast the example into something closer to your code. – Stephen Rauch Jan 30 '18 at 07:56