
I followed this question, Append multiple numpy files to one big numpy file in python, in order to combine many numpy files into one big file. The result is:

import glob
import os

import numpy as np

fpath = "path_Of_my_final_Big_File"
npyfilespath = "path_of_my_numpy_files"
os.chdir(npyfilespath)
npfiles = glob.glob("*.npy")
npfiles.sort()
all_arrays = np.zeros((166601, 8000))
for i, npfile in enumerate(npfiles):
    all_arrays[i] = np.load(os.path.join(npyfilespath, npfile))
np.save(fpath, all_arrays)
# np.save appends ".npy" when the path has no extension, so load it back
# with the extension added
data = np.load(fpath + ".npy")
print(data)
print(data.shape)

I have thousands of files, and with this code I always get a memory error, so I never obtain my result file. How can I resolve this error? How can I read, write, and append to the final numpy array one file at a time?

  • When do you get the memory error? Your `np.zeros` line is too big for my system, creating a 10G array. If the array is too big to save, or to load back in, it might be too big to manipulate and plot. Why not save the data in chunks? Does it HAVE TO BE in one big big file/array? – hpaulj Feb 22 '17 at 07:51
  • @hpaulj, I get the memory error just after this line: `all_arrays = np.zeros((166601,8000))`. –  Feb 22 '17 at 07:53
  • But of course I have 166601 files, and that's a lot. And for each file I have 8000 points. –  Feb 22 '17 at 07:54
  • @hpaulj, how to do that please? I would be very grateful if you could help me. In fact, when I executed my file yesterday, I spent one hour waiting for the appearance of the new file, but in vain, nothing appeared. –  Feb 22 '17 at 08:04
  • No hope to manage all your files data in your array. Compute your files one by one, and store only useful results. – B. M. Feb 22 '17 at 10:43

1 Answer


Try taking a look at `np.memmap`. You can instantiate `all_arrays` as:

all_arrays = np.memmap("all_arrays.dat", dtype='float64', mode='w+', shape=(166601,8000))

From the documentation:

Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.

You will be able to access the whole array, but the operating system will take care of loading only the parts you actually need into memory. Read the documentation page carefully; note in particular that, from a performance point of view, you can choose (via the `order` parameter) whether the file should be stored column-wise or row-wise.
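Putting the pieces together, the question's loop can be rewritten around `np.memmap` so that each file's data is written straight through to disk instead of accumulating in RAM. This is a minimal sketch: the function name and the `row_len` parameter are illustrative, and the paths are placeholders to be filled in as in the question.

```python
import glob
import os

import numpy as np

def merge_npy_files(npyfilespath, out_path, row_len=8000):
    """Merge many .npy row files into one disk-backed array
    without ever holding the whole result in memory."""
    npfiles = sorted(glob.glob(os.path.join(npyfilespath, "*.npy")))
    # Disk-backed array: the OS pages parts in and out as needed.
    all_arrays = np.memmap(out_path, dtype='float64', mode='w+',
                           shape=(len(npfiles), row_len))
    for i, npfile in enumerate(npfiles):
        all_arrays[i] = np.load(npfile)  # written through to disk
    all_arrays.flush()  # push any pending writes to the file
    return all_arrays
```

The resulting `.dat` file is raw binary (not `.npy` format), so to read it back you reopen it with `np.memmap(out_path, dtype='float64', mode='r', shape=...)` using the same dtype and shape.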

Teudimundo
  • where to add this line in my code? The first problem for me is the all_arrays = np.zeros((166601,8000)), –  Feb 22 '17 at 09:25
  • Substitute that line with the one in the answer. To be on the safe side, use float64 instead of float32 (I'll update that in the answer). – Teudimundo Feb 22 '17 at 09:30
  • I try your solution, I am waiting for the appearance the final file, I try it for 100 file, it works , the size of the final file is 6.251 KB, that means for 166601 i need 10414,166 KB, (0,009931723 GB) or in my disk D I have 367 GB that is free. –  Feb 22 '17 at 10:14
  • I don't know how the padding in the file will work, but being `float64` you will need 8 bytes per number, so `8 * 166601 * 8000 ≈ 10 GB`; it looks like you should have enough space. If the solution works, please accept the answer. Thanks. – Teudimundo Feb 22 '17 at 14:13
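The size estimate in the last comment can be checked directly: a `float64` occupies 8 bytes, so the array's on-disk footprint is just rows × columns × itemsize.

```python
import numpy as np

# Expected on-disk size of the merged (166601, 8000) float64 array.
rows, cols = 166601, 8000
size_bytes = rows * cols * np.dtype('float64').itemsize
print(size_bytes)            # 10662464000 bytes
print(size_bytes / 1024**3)  # roughly 9.9 GiB
```

So the earlier comment estimating ~10 MB from a 100-file sample was off by three orders of magnitude, but the ~367 GB of free disk space is still far more than enough.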