
I am trying to combine many NumPy files into one big NumPy file. I tried to follow these two links, "Append multiple numpy files to one big numpy file in python" and "Python append multiple files in given order to one big file". This is what I did:

import matplotlib.pyplot as plt 
import numpy as np
import glob
import os, sys
fpath ="/home/user/Desktop/OutFileTraces.npy"
npyfilespath ="/home/user/Desktop/test"   
os.chdir(npyfilespath)
with open(fpath,'wb') as f_handle:
    for npfile in glob.glob("*.npy"):
        # Find the path of the file
        filepath = os.path.join(npyfilespath, npfile)
        print filepath
        # Load file
        dataArray= np.load(filepath)
        print dataArray
        np.save(f_handle,dataArray)
        dataArray= np.load(fpath)
        print dataArray

An example of the result that I have:

/home/user/Desktop/Trace=96
[[ 0.01518007  0.01499514  0.01479736 ..., -0.00392216 -0.0039761
  -0.00402747]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=97
[[ 0.00614908  0.00581004  0.00549154 ..., -0.00814741 -0.00813457
  -0.00809347]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=98
[[-0.00291786 -0.00309509 -0.00329287 ..., -0.00809861 -0.00797789
  -0.00784175]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]]
/home/user/Desktop/Trace=99
[[-0.00379887 -0.00410453 -0.00438963 ..., -0.03497837 -0.0353842
  -0.03575151]]
[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
  -0.00762086]

this line represents the first trace:

[[-0.00824758 -0.0081808  -0.00811402 ..., -0.0077236  -0.00765425
      -0.00762086]]

It is repeated all the time.

I asked the second question two days ago. At first I thought I had the best answer, but after trying to print and plot the final file 'OutFileTraces.npy' I found that my code:

1/ doesn't print the numpy files from the folder 'test' in their order (trace0, trace1, trace2, ...)

2/ saves only one trace in the file: when I print or plot OutFileTraces.npy, I find just one trace, and it is the first one.

So I need to correct my code, because I am really stuck. I would be very grateful if you could help me.

Thanks in advance.

nass9801
  • @http://stackoverflow.com/users/6626530/shijo, this is my code. – nass9801 Feb 13 '17 at 14:00
  • In a link cited in your 1st link, I explore reading a file with multiple `save`, http://stackoverflow.com/a/35752728/901925 – hpaulj Feb 13 '17 at 15:58
  • @hpaulj, in fact I am able to read all my data by using my code, the problem is just when saving in the file, it saves only the first file, this is my updated code: http://stackoverflow.com/questions/42204368/how-to-append-many-numpy-files-into-one-numpy-file-in-python – nass9801 Feb 13 '17 at 16:05
  • Why is the last `load` indented? – hpaulj Feb 13 '17 at 16:28
  • @hpaulj, could you please take a look at the modified code? – nass9801 Feb 13 '17 at 16:43
  • So now you indent it even more! What's the purpose of that load? To see everything in the file, the last thing in the file, or the first thing in the file? – hpaulj Feb 13 '17 at 16:45
  • My goal is to see all the data in the file and plot it. I need to be sure that this file gives me the same plot that different files give me when I plot all of them in the same file. – nass9801 Feb 13 '17 at 16:53

2 Answers

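  1. glob.glob returns file names in arbitrary order, so you need to sort them explicitly. Use an extra line for this, because list.sort() sorts in place and returns None rather than the list. Note that this sort is lexicographic, so trace10 comes before trace2 unless the numbers are zero-padded.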

    npfiles = glob.glob("*.npy")
    npfiles.sort()
    for npfile in npfiles:
        ...
    
  2. A .npy file is meant to hold a single array. If you want to store several arrays in a single file, have a look at .npz files with np.savez: https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez I have not seen this in wide use, so you may want to seriously consider alternatives.

    1. If your arrays are all of the same shape and store related data, you can make a larger array. Say that the current shape is (N_1, N_2) and that you have N_0 such arrays. A loop with

      all_arrays = []
      for npfile in npfiles:
          all_arrays.append(np.load(os.path.join(npyfilespath, npfile)))
      all_arrays = np.array(all_arrays)
      np.save(f_handle, all_arrays)
      

      will produce a file with a single array of shape (N_0, N_1, N_2)

    2. If you need per-name access to the arrays, HDF5 files are a good match. See http://www.h5py.org/ (a full intro is too much for a SO reply, see the quick start guide http://docs.h5py.org/en/latest/quick.html)
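
The sorting and stacking steps above can be combined into one small helper. This is only a sketch under assumptions: the names `natural_key` and `stack_npy_files` are made up for illustration, every file is assumed to hold an array of the same shape, and the output path is assumed to live outside the input folder so it is not picked up on a second run. The sort key compares the digits in a file name as integers, so trace10 sorts after trace2.

```python
import glob
import os
import re

import numpy as np


def natural_key(path):
    """Sort key that compares the digit runs in a file name as integers."""
    name = os.path.basename(path)
    # re.split with a capture group alternates text and digit chunks,
    # so the same positions always hold the same type across names.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def stack_npy_files(folder, out_path):
    """Load every .npy file in `folder` in natural order and save one array."""
    files = sorted(glob.glob(os.path.join(folder, "*.npy")), key=natural_key)
    # np.stack adds a new leading axis: N_0 arrays of shape (N_1, N_2)
    # become one array of shape (N_0, N_1, N_2).
    stacked = np.stack([np.load(f) for f in files])
    np.save(out_path, stacked)
    return files, stacked
```

Calling `stack_npy_files("/home/user/Desktop/test", "/home/user/Desktop/OutFileTraces.npy")` would mirror the paths in the question. np.stack raises an error if the shapes differ or the folder is empty, which is probably what you want here.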
Pierre de Buyl

As discussed in "loading arrays saved using numpy.save in append mode", it is possible to save multiple times to an open file, and it is possible to load multiple times from it. That's not documented, and probably not preferred, but it works. A savez archive is the preferred method for saving multiple arrays.
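
For reference, the savez route could look roughly like this. It is a minimal sketch, not taken from the question: the dictionary `traces`, its keys, and the file name `traces.npz` are placeholders standing in for arrays loaded from the individual trace files.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for arrays loaded from the individual trace files.
traces = {
    "trace0": np.arange(10),
    "trace1": np.arange(20),
    "trace2": np.ones((3, 4)),
}

# Placeholder output location; any writable path would do.
npz_path = os.path.join(tempfile.mkdtemp(), "traces.npz")

# Each keyword argument becomes the name of one array inside the archive.
np.savez(npz_path, **traces)

# np.load on an .npz returns a dict-like NpzFile; arrays are read on access.
with np.load(npz_path) as archive:
    names = sorted(archive.files)
    loaded = {name: archive[name] for name in names}
```

Unlike stacking, this keeps each array under its own name and does not require the shapes to match.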

Here's a toy example of saving to, and loading from, one open file several times:

In [777]: with open('multisave.npy','wb') as f:
     ...:     arr = np.arange(10)
     ...:     np.save(f, arr)
     ...:     arr = np.arange(20)
     ...:     np.save(f, arr)
     ...:     arr = np.ones((3,4))
     ...:     np.save(f, arr)
     ...:     
In [778]: ll multisave.npy
-rw-rw-r-- 1 paul 456 Feb 13 08:38 multisave.npy
In [779]: with open('multisave.npy','rb') as f:
     ...:     arr = np.load(f)
     ...:     print(arr)
     ...:     print(np.load(f))
     ...:     print(np.load(f))
     ...:     
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
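
The same multi-save idea can be written with loops, so the number of arrays does not need to be known when reading. This is a sketch only: `save_many` and `load_many` are made-up helper names, and it relies on the same undocumented behaviour as the session above, namely that each np.load leaves the file position at the start of the next saved array.

```python
import os

import numpy as np


def save_many(path, arrays):
    """Write several arrays back to back into one file with repeated np.save."""
    with open(path, "wb") as f:
        for arr in arrays:
            np.save(f, arr)


def load_many(path):
    """Read arrays back with repeated np.load until the end of the file."""
    arrays = []
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Each np.load consumes exactly one saved array from the handle.
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays
```

Note that a plain `np.load('multisave.npy')` on such a file returns only the first array, which matches the symptom in the question.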

Here's a simple example of saving a list of arrays of the same shape:

In [780]: traces = [np.arange(10),np.arange(10,20),np.arange(100,110)]
In [781]: traces
Out[781]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
In [782]: arr = np.array(traces)
In [783]: arr
Out[783]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])

In [785]: np.save('mult1.npy', arr)

In [786]: data = np.load('mult1.npy')
In [787]: data
Out[787]: 
array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])
In [788]: list(data)
Out[788]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
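
If there are too many same-shape traces to hold in a list first, one possible variant is to write them one at a time into a single .npy file that is created up front. This is a sketch under assumptions: `stack_into_memmap` is a made-up helper name, the number of traces must be known in advance, and it uses `np.lib.format.open_memmap`, which creates a memory-mapped .npy file of a given dtype and shape.

```python
import numpy as np


def stack_into_memmap(trace_iter, n_traces, out_path):
    """Write same-shape traces one at a time into a single .npy file on disk."""
    out = None
    for i, trace in enumerate(trace_iter):
        trace = np.asarray(trace)
        if out is None:
            # Create the .npy file up front with room for every trace;
            # dtype and per-trace shape are taken from the first trace.
            out = np.lib.format.open_memmap(
                out_path, mode="w+", dtype=trace.dtype,
                shape=(n_traces,) + trace.shape)
        out[i] = trace
    if out is not None:
        out.flush()
    return out
```

The result is an ordinary .npy file, so `np.load(out_path)` reads it back as one array, and `np.load(out_path, mmap_mode='r')` lets you plot slices without loading everything into memory.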
hpaulj
  • thank you very much for your answer, but i have one million of traces, and this solution is not practical, do you agree with me?? – nass9801 Feb 13 '17 at 16:55
  • Why not? I could have written that example with a couple of loops, saving from a list of files or arrays, and loading into a list. As a side question - are these 'traces' all the same size? If so they could be concatenated into one large array, allowing you to save/load with just one call. – hpaulj Feb 13 '17 at 17:08
  • @hpauli, they have all the same size which it is about 32,1 kB – nass9801 Feb 13 '17 at 17:10
  • I added an example of saving a list of arrays that are all the same shape. – hpaulj Feb 13 '17 at 17:15
  • Thank you @hpaulj, I will modify my code and tell you the result. – nass9801 Feb 13 '17 at 17:20
  • Dear @hpaulj, you idea worked, I succeed to put all the traces in one file on array, but I can't plot this file. I put the problem in another question, because I can't put two question in the same post: http://stackoverflow.com/questions/42227997/how-to-plot-an-array-in-python , could help me please? – nass9801 Feb 14 '17 at 14:34