I have a large NumPy matrix act with dtype=np.float32 and two vectors of the same length, raw_id and raw_label. I want to sort all three objects based on the values in raw_id. However, I get a memory error when running this script. I've isolated it to act[sortind,:] in the function below. How can I optimize the memory usage?
The array act is roughly 1400000 x 400, whereas raw_id and raw_label are each 1400000 x 1 using dtype=np.float64. It almost fits into my 12 GB of memory along with the remaining variables that I have initialised.
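For reference, a rough estimate of the raw array sizes from the shapes and dtypes above. This is only arithmetic on the stated dimensions (the variable names are illustrative) and ignores every other variable in the script:

```python
import numpy as np

# Shapes as stated above; the names here are illustrative only.
n_rows, n_cols = 1_400_000, 400

# Bytes needed for the raw data of each array.
act_bytes = n_rows * n_cols * np.dtype(np.float32).itemsize
vec_bytes = n_rows * np.dtype(np.float64).itemsize

print(f"act: {act_bytes / 1e9:.2f} GB")
print(f"raw_id or raw_label: {vec_bytes / 1e6:.1f} MB each")
```

Any operation that produces a reordered copy of act therefore needs roughly another act-sized block of memory while both arrays are alive.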
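import numpy as np

def sort_by_id(act, raw_id, raw_label):
    # Indices that sort raw_id in ascending order
    sortind = np.argsort(raw_id)
    # Indexing with an index array returns new arrays, so act[sortind, :] is a full copy of act
    return act[sortind, :], raw_id[sortind], raw_label[sortind]

# calling function with same variables
act, raw_id, raw_label = sort_by_id(act, raw_id, raw_label)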
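One possible direction, sketched here rather than benchmarked: reorder act one column at a time so that the temporary array only ever holds a single column instead of a full copy. The function name sort_by_id_inplace is hypothetical, and the sketch assumes raw_id and raw_label are 1-D arrays and that overwriting the inputs is acceptable. Column access on a C-ordered array is strided, so this trades speed for memory.

```python
import numpy as np

def sort_by_id_inplace(act, raw_id, raw_label):
    """Reorder act, raw_id and raw_label in place by ascending raw_id."""
    sortind = np.argsort(raw_id)
    # Permute one column at a time: each right-hand side is a temporary
    # of act.shape[0] elements, not a full copy of act.
    for j in range(act.shape[1]):
        act[:, j] = act[sortind, j]
    # The vectors are small, so a temporary copy of each is cheap.
    raw_id[:] = raw_id[sortind]
    raw_label[:] = raw_label[sortind]

# Small stand-in data (the real act is roughly 1400000 x 400).
rng = np.random.default_rng(0)
act = rng.random((6, 3), dtype=np.float32)
raw_id = np.array([5.0, 2.0, 9.0, 1.0, 7.0, 3.0])
raw_label = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])

# Reference result from the copying version, for comparison only.
expected = act[np.argsort(raw_id), :].copy()

sort_by_id_inplace(act, raw_id, raw_label)
```

After the call, act matches what act[sortind, :] would have produced, but the arrays are modified in place and nothing is returned.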