
I am trying to merge 34 matrices each sized 256 x 6000000 and of type numpy.float32 into a single matrix and store it on my system. Each matrix is stored in a separate .npy file.

This is the script I am using:

import numpy as np
import os

# combined matrix variable
amp_data = []
count = 0
for filename in os.listdir(os.getcwd()):
    if filename.endswith('.npy'):
        if count == 0:
            # first file: keep it memory-mapped
            amp_data = np.load(filename, mmap_mode='r')
        else:
            amp_ = np.load(filename, mmap_mode='r')
            # hstack copies both inputs into a new in-memory array
            amp_data = np.hstack((amp_data, amp_))
            del amp_
        count = count + 1

Unsurprisingly, my system runs into a memory error (RAM: 64 GB). Is there a way for me to combine these matrices into one and save the result?

PrakashG
deathracer
  • If the arrays are too large to combine and save, then they are also too large to use in any meaningful way, right? – hpaulj Jul 03 '19 at 05:53
  • This doesn't get around the ultimate memory error, but the recommended way of combining arrays is to collect them in a list (`amp_data.append(amp_)`) and then use `concatenate` (or one of the `stack` functions) to create one array in one step. It's more efficient than repeated `hstack`. – hpaulj Jul 03 '19 at 05:55
  • The reason I am trying to merge these into a single file is that another analysis program I am currently using can load large files without running into a memory overflow error. – deathracer Jul 03 '19 at 05:57
  • Would the other system be happy working with `HDF5` files? I haven't used it much, but with `h5py` you can define a file and dataset that can grow along one dimension. That is mentioned in one of the linked answers. – hpaulj Jul 03 '19 at 07:02
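
The list-then-concatenate pattern suggested in the comments can be sketched as follows. This is only an illustration with tiny stand-in arrays (the shapes and values are made up); for the real data the combined array would still have to fit in RAM, so it does not solve the memory error on its own.

```python
import numpy as np

# Small stand-ins for the real 256 x 6000000 arrays (sizes are illustrative).
parts = [np.full((4, 3), i, dtype=np.float32) for i in range(3)]

# Collect first, then build the result in a single call instead of
# reallocating and recopying on every hstack.
combined = np.concatenate(parts, axis=1)
print(combined.shape)
```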

1 Answer


Yes. The npy format is documented in NEP 1, A Simple File Format for NumPy Arrays. It is intended to be easily reverse engineered or directly processed by other programs.

So you should be able to read one file at a time and write it directly into a larger npy file.
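
For scale, 34 matrices of 256 x 6000000 float32 values come to 34 × 256 × 6,000,000 × 4 bytes ≈ 209 GB, so the merged array cannot be held in 64 GB of RAM. One possible way to write it file by file, rather than hand-coding the header, is `numpy.lib.format.open_memmap`, which creates a `.npy` file of a given shape and returns a writable memory map. The sketch below assumes all inputs are 2-D with the same row count and dtype and are joined along columns, as `hstack` would do; the function name `merge_npy_columns` and its parameters are hypothetical.

```python
import numpy as np
from numpy.lib.format import open_memmap


def merge_npy_columns(paths, out_path):
    """Column-stack 2-D .npy files into out_path without loading them all.

    Hypothetical helper: assumes every input has the same row count and dtype.
    """
    if not paths:
        raise ValueError("no input files given")

    # Memory-mapping only parses the header, so gathering shapes is cheap.
    sources = [np.load(p, mmap_mode='r') for p in paths]
    if any(s.ndim != 2 for s in sources):
        raise ValueError("all inputs must be 2-D")
    rows, dtype = sources[0].shape[0], sources[0].dtype
    if any(s.shape[0] != rows or s.dtype != dtype for s in sources):
        raise ValueError("all inputs must share row count and dtype")
    total_cols = sum(s.shape[1] for s in sources)

    # open_memmap writes a valid .npy header and maps the data region on disk.
    out = open_memmap(out_path, mode='w+', dtype=dtype,
                      shape=(rows, total_cols))
    col = 0
    for src in sources:
        # Copy one input into its column block of the output file.
        out[:, col:col + src.shape[1]] = src
        col += src.shape[1]
    out.flush()
    del out  # release the memory map
    return rows, total_cols
```

It could be called as `merge_npy_columns(sorted(glob.glob('*.npy')), '/some/other/dir/merged.npy')` (after `import glob`; the output path here is a placeholder). Writing the output to a different directory keeps it from being picked up as an input on a later run. The result needs roughly 209 GB of free disk space.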


References:

Requirements
The format MUST be able to:
...
A competent developer should be able to create a solution in his preferred programming language to read most NPY files...

You will find additional examples and references in What is the way data is stored in *.npy?

You could even find reusable code in numpy/format.py.

Serge Ballesta