I am using Python 2.7 with scikit-learn to store and read a very large svmlight-format file.
I am reading the file using
from sklearn.datasets import load_svmlight_file
rows, labels = load_svmlight_file(matrixPath, zero_based=True)
The file is too large to fit in memory. I am looking for a way to iterate over it in batches without splitting the file in advance.
For now, the best approach I have found is to split the svmlight file with the terminal command split and then read the partial files it creates.
I found that a good way to read big files is to read them line by line in batches, so as not to exhaust memory.
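To make the idea concrete, here is a minimal pure-Python sketch of what I have in mind: a hypothetical helper (the name iter_svmlight_batches and the dict-per-row representation are my own choices, not from any library) that yields the file batch_size lines at a time, parsing the "label index:value ..." format by hand so only one batch is ever held in memory.

```python
from itertools import islice

def iter_svmlight_batches(path, batch_size):
    """Yield (rows, labels) parsed from `path`, `batch_size` lines at a time.

    Each row is a dict {feature_index: value}; labels is a list of floats.
    Pure-Python sketch of batch-wise svmlight parsing -- nothing beyond
    the current batch is kept in memory.
    """
    with open(path) as f:
        while True:
            lines = list(islice(f, batch_size))
            if not lines:
                break
            rows, labels = [], []
            for line in lines:
                # Drop trailing "# comment" and surrounding whitespace.
                line = line.split('#', 1)[0].strip()
                if not line:
                    continue
                parts = line.split()
                labels.append(float(parts[0]))
                rows.append({int(i): float(v)
                             for i, v in (p.split(':') for p in parts[1:])})
            yield rows, labels
```

Is there a cleaner way to get the same batch-wise behavior while still using sklearn's parser, rather than re-implementing the format like this?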
How can I do this with svmlight-formatted files?
Thanks!