
I have a large NumPy matrix act with dtype=np.float32 and two vectors, raw_id and raw_label, each with as many entries as act has rows. I want to sort all three objects by the values in raw_id. However, I get a memory error when running this script. I've isolated it to the expression act[sortind,:] in the function below. How can I reduce the memory usage?

The array act is roughly 1400000 x 400, whereas raw_id and raw_label are each 1400000 x 1 with dtype=np.float64. Everything almost fits into my 12 GB of memory along with the remaining variables that I have initialised.
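To make those sizes concrete, here is a rough back-of-the-envelope calculation. It assumes the shapes are exactly 1400000 x 400 and 1400000 (the real ones are only "roughly" that), and the variable names are made up for illustration.

```python
import numpy as np

# Assumed exact shapes; the real arrays are only roughly this size.
n_rows, n_cols = 1_400_000, 400

# Size of act (float32, 4 bytes per element).
act_bytes = n_rows * n_cols * np.dtype(np.float32).itemsize

# Size of one of the vectors raw_id / raw_label (float64, 8 bytes per element).
vec_bytes = n_rows * np.dtype(np.float64).itemsize

print(f"act:        {act_bytes / 1e9:.2f} GB")
print(f"one vector: {vec_bytes / 1e6:.1f} MB")
```

Since indexing with an integer array returns a copy rather than a view, act[sortind,:] needs a second block of about the same size as act while the original is still alive.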

import numpy as np

def sort_by_id(act, raw_id, raw_label):
    # indices that would sort raw_id in ascending order
    sortind = np.argsort(raw_id)
    return act[sortind,:], raw_id[sortind], raw_label[sortind]

# calling function with same variables
act, raw_id, raw_label = sort_by_id(act, raw_id, raw_label)
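
One direction that might lower the peak memory, given only as an untested-at-scale sketch: reorder act a few columns at a time and write the result back into the same array, so the temporary copy is only n_rows x block_cols instead of the full matrix. The function name sort_by_id_blocked and the parameter block_cols are hypothetical, and the small arrays below are stand-ins for the real data.

```python
import numpy as np

def sort_by_id_blocked(act, raw_id, raw_label, block_cols=16):
    """Sort act row-wise by raw_id, overwriting act block by block."""
    sortind = np.argsort(raw_id)
    n_cols = act.shape[1]
    for c0 in range(0, n_cols, block_cols):
        c1 = min(c0 + block_cols, n_cols)
        # The right-hand side is a temporary copy of only this column block;
        # it is fully built before being written back into act.
        act[:, c0:c1] = act[sortind, c0:c1]
    # The vectors are small, so an ordinary indexed copy is cheap here.
    return act, raw_id[sortind], raw_label[sortind]

# Small stand-in data for a quick check against the straightforward version.
rng = np.random.default_rng(0)
demo_act = rng.random((1000, 40), dtype=np.float32)
demo_id = rng.permutation(1000).astype(np.float64)
demo_label = rng.integers(0, 5, size=1000).astype(np.float64)

expected_act = demo_act[np.argsort(demo_id), :]
out_act, out_id, out_label = sort_by_id_blocked(demo_act, demo_id, demo_label)
```

Note that this version modifies act in place, so any other reference to the same array sees the sorted rows as well. A smaller block_cols means a smaller temporary but more loop iterations.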
pir

0 Answers