I have a large HDF5 file (50 GB) and need to extract a square submatrix from it. So far my code is:
import h5py
import random
file = h5py.File('numDistances.h5', 'r')
data = file['DS1'] # 120,000 x 120,000 matrix
randomRows = random.sample(range(110000), 40000)
randomRows.sort()  # h5py fancy indexing needs the index list in increasing order
# Get the rows first and then the corresponding columns:
rows = data[randomRows, :]
output = rows[:,randomRows]
Unfortunately, pulling the data out like this is very slow. Are there any slicing techniques or additional libraries that could make this much faster? Thanks.
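For comparison, here is a minimal sketch of a batched variant of the same two-step selection. It reads the selected rows a batch at a time and keeps only the wanted columns from each batch, so the full 40,000 x 120,000 intermediate never has to sit in memory. The function name `extract_submatrix` and the `batch_size` parameter are hypothetical, introduced only for illustration. This mainly bounds memory use; whether it is also faster likely depends on how the dataset is laid out (chunked or contiguous) on disk.

```python
import numpy as np


def extract_submatrix(data, idx, batch_size=1024):
    """Return the square submatrix data[idx][:, idx], reading rows in batches.

    `data` is any 2-D array-like that supports data[rows, :] and has a
    .dtype attribute (an h5py dataset or a NumPy array). `idx` is assumed
    to be sorted and free of duplicates, as h5py fancy indexing requires.
    `batch_size` (hypothetical parameter) caps how many rows are read at once.
    """
    idx = np.asarray(idx)
    n = len(idx)
    out = np.empty((n, n), dtype=data.dtype)
    for start in range(0, n, batch_size):
        rows = idx[start:start + batch_size]
        block = data[rows, :]                          # read one batch of full rows
        out[start:start + len(rows)] = block[:, idx]   # pick columns in memory
    return out
```

With the code above it would be called as `output = extract_submatrix(data, randomRows)`.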