I am trying to concatenate two large matrices of numbers. The first one, `features`, is an `np.array` of shape `(1238, 72)`; the other is loaded from a `.json` file (see the second line of the code below) and has shape `(1238, 768)`. I need to load, concatenate, re-index, split into folds, and save each fold in its own folder. The problem is that the process gets `Killed` on the very first step, while reading the `.json` content into `bert`:
```python
with open(bert_dir + "/output4layers.json", "r+") as f:
    bert = [json.loads(l)['features'][0]['layers'][0]['values'] for l in f.readlines()]

bert_post_data = np.concatenate((features, bert), axis=1)
del bert
bert_post_data = [bert_post_data[i] for i in index_shuf]
bert_folds = np.array_split(bert_post_data, num_folds)

for i in range(num_folds):
    print("saving bert fold ", str(i), bert_folds[i].shape)
    fold_dir = data_dir + "/folds/" + str(i)
    save_p(fold_dir + "/bert", bert_folds[i])
```
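For context, this is the kind of streaming rewrite I was imagining for the loading step (the function name and shapes are placeholders I made up): iterate the file one line at a time and fill a preallocated `float32` array, instead of `f.readlines()` plus a list of Python lists, which holds both the raw text and the boxed floats in memory at once.

```python
import json
import numpy as np

def load_bert_vectors(path, n_rows, bert_dim):
    """Stream one JSON record per line into a preallocated float32 array.

    Avoids f.readlines() (which reads the whole file into memory) and
    avoids building a list of Python lists of Python floats.
    """
    out = np.empty((n_rows, bert_dim), dtype=np.float32)
    with open(path, "r") as f:
        for i, line in enumerate(f):
            # Same record layout as in my code above.
            out[i] = json.loads(line)['features'][0]['layers'][0]['values']
    return out
```

In my case that would be `load_bert_vectors(bert_dir + "/output4layers.json", 1238, 768)`, but I don't know if this is the idiomatic way to do it.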
Is there a way I can do this more memory-efficiently? I mean, there's got to be a better way... pandas? The json lib?
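To make the question concrete, here is a rough sketch of one direction I was considering for the concatenate/re-index/split steps: keep the combined matrix on disk with `np.memmap` so the full array never has to live in RAM (`build_folds` is a name I invented, and I'm assuming `float32` is precise enough for my data):

```python
import numpy as np

def build_folds(features, bert, index_shuf, num_folds, out_path):
    """Write the shuffled concatenation of features and bert to a disk-backed
    array, then split it into folds.  The splits are views into the memmap,
    not in-memory copies."""
    n = features.shape[0]
    total_dim = features.shape[1] + bert.shape[1]
    mm = np.memmap(out_path, dtype=np.float32, mode="w+", shape=(n, total_dim))
    for dst, src in enumerate(index_shuf):
        # Re-index row by row: row `src` of the inputs becomes row `dst`.
        mm[dst, :features.shape[1]] = features[src]
        mm[dst, features.shape[1]:] = bert[src]
    mm.flush()
    return np.array_split(mm, num_folds)
```

But I'm not sure whether this (or pandas, or a streaming JSON parser) is the right tool here.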
Thanks for your time and attention.