I have 30 million rows of data. Each contains an int array of size 512. Each int can have values from 0 to 50,500.
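If my math is right, the raw data is about 30.7 GB stored as uint16 (which fits, since 50,500 < 65,535: 30,000,000 × 512 × 2 bytes), or roughly 61 GB as int32, so it won't fit comfortably in RAM on most machines.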
I'll need to retrieve about 100 rows at a time in a single request, and I'm wondering which data store will give the fastest retrieval for this.
It seems that the best data stores for this kind of workload are HDF5 and NumPy memmaps.
Is there any analysis of, or a way to predict, which will be faster in my situation? The benchmarks I've found do compare the two, but under conditions quite different from mine.
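For reference, here's the kind of scaled-down micro-benchmark I'm planning to run myself. The file names (`rows.bin`, `rows.h5`), the reduced `N_ROWS`, and the uncompressed, unchunked HDF5 dataset are my own placeholder assumptions, not taken from any existing benchmark:

```python
import time

import h5py
import numpy as np

# Scaled-down placeholders: ~1 GB instead of the full 30M rows.
N_ROWS = 1_000_000
ROW_LEN = 512
N_FETCH = 100

# Values are 0..50,500, so they fit in uint16 (max 65,535),
# halving storage compared to int32.
rng = np.random.default_rng(0)
data = rng.integers(0, 50_501, size=(N_ROWS, ROW_LEN), dtype=np.uint16)

# Write the same data to both stores: raw binary for the memmap,
# and a plain (unchunked, uncompressed) HDF5 dataset.
data.tofile("rows.bin")
with h5py.File("rows.h5", "w") as f:
    f.create_dataset("rows", data=data)

# 100 random, unique row indices; h5py's fancy indexing
# requires them in increasing order.
idx = np.sort(rng.choice(N_ROWS, size=N_FETCH, replace=False))

# Memmap retrieval: fancy indexing copies the selected rows into RAM.
mm = np.memmap("rows.bin", dtype=np.uint16, mode="r",
               shape=(N_ROWS, ROW_LEN))
t0 = time.perf_counter()
rows_mm = mm[idx]
t_mm = time.perf_counter() - t0

# HDF5 retrieval of the same rows.
with h5py.File("rows.h5", "r") as f:
    t0 = time.perf_counter()
    rows_h5 = f["rows"][idx, :]
    t_h5 = time.perf_counter() - t0

assert np.array_equal(rows_mm, rows_h5)
print(f"memmap: {t_mm * 1e3:.2f} ms, hdf5: {t_h5 * 1e3:.2f} ms")
```

One caveat I'm aware of: since the files are written just before they're read, both timings measure warm reads from the OS page cache; cold-read numbers would need the cache dropped between runs, and the real answer presumably depends on disk, page cache, and HDF5 chunking.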