I am using svmlight files to store sparse matrices.
A test on a 31700108 x 54070 matrix with 570601944 entries shows that
import xgboost as xgb
dtrain = xgb.DMatrix(train_file)
took 21 seconds, far faster than
from sklearn.datasets import load_svmlight_file
x_train, y_train = load_svmlight_file(train_file)
which took 7 minutes.
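For anyone who wants to reproduce the comparison, the timings could be measured with a small harness along these lines. This is only a minimal sketch: `time_call` and `compare_loaders` are hypothetical helper names of my own, not part of xgboost or scikit-learn, and `train_file` is assumed to be a path to an svmlight-format file. Depending on the xgboost version, the path may also need a `?format=libsvm` suffix.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_loaders(train_file):
    """Time both loaders on the same svmlight file (hypothetical helper)."""
    # Imported lazily so time_call can be used without either library installed.
    import xgboost as xgb
    from sklearn.datasets import load_svmlight_file

    # Assumption: some xgboost versions expect train_file + "?format=libsvm".
    _, xgb_seconds = time_call(xgb.DMatrix, train_file)
    _, sk_seconds = time_call(load_svmlight_file, train_file)
    return {"xgboost": xgb_seconds, "sklearn": sk_seconds}
```

Calling `compare_loaders(train_file)` returns the two elapsed times in seconds. Each loader is run only once, so the numbers include disk caching effects and are a rough comparison rather than a careful benchmark.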
Before I start hacking on the code, can anybody help me with this?
Do you have any suggestions for speeding up the load_svmlight_file function?
Thank you very much!