I am trying to implement an example of the external-memory version of xgboost in R. Please see paragraph 3 of this post:
https://www.r-bloggers.com/2017/01/parallel-computation-with-r-and-xgboost/
I have downloaded the datafile agaricus.txt.train from the link provided:
https://github.com/dmlc/xgboost/tree/master/demo/data
The example runs fine once I replace the filename with the path to my local copy of the file:
dtrain = xgb.DMatrix('C:/test/agaricus.txt.train.txt#train.cache')
Next I would like to replace the data with my own data (from a dataframe).
Do I understand correctly that I need to convert my own data frame to LIBSVM format? In that case I could try a converter like this one:
R - convert a data frame to a data set formatted as featureName:featureValue
PS: Ideally, as a test, I would like to convert the data below (which is not in LIBSVM format) into LIBSVM format.
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
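For reference, here is a minimal sketch of the kind of conversion I have in mind. It assumes the features are already in a sparse matrix (as `agaricus.train$data` is) and that the labels are a plain numeric vector. `write_libsvm` is a hypothetical helper I made up for illustration; it is not part of xgboost. The output file names are arbitrary. It writes 1-based feature indices, which is the usual LIBSVM convention; I am not sure whether xgboost shifts the indices on load, so the column count may differ by one after reading the file back. A data frame would first have to be turned into a numeric matrix, for example with `Matrix::sparse.model.matrix`.

```r
library(Matrix)

# Hypothetical helper (not an xgboost function): write a sparse matrix `x`
# and a label vector `y` as LIBSVM text, one "label index:value ..." line per row.
write_libsvm <- function(x, y, file) {
  x <- as(x, "CsparseMatrix")
  stopifnot(nrow(x) == length(y))
  trip <- summary(x)                      # non-zero entries as columns i, j, x (1-based)
  trip <- trip[order(trip$i, trip$j), ]   # sort by row, then by feature index
  feats <- paste0(trip$j, ":", trip$x)
  # Collapse the features of each row; rows without non-zero entries yield NA
  rows <- factor(trip$i, levels = seq_len(nrow(x)))
  per_row <- tapply(feats, rows, paste, collapse = " ")
  per_row[is.na(per_row)] <- ""
  writeLines(trimws(paste(y, per_row)), file)
  invisible(file)
}

# Tiny self-contained check on a 3 x 3 matrix with one all-zero row
m <- sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2), x = c(1, 2.5, 1), dims = c(3, 3))
small_file <- file.path(tempdir(), "small.libsvm")
write_libsvm(m, c(1, 0, 1), small_file)
print(readLines(small_file))

# The same helper applied to the agaricus data, if xgboost is installed
if (requireNamespace("xgboost", quietly = TRUE)) {
  data(agaricus.train, package = "xgboost")
  write_libsvm(agaricus.train$data, agaricus.train$label, "agaricus_train.libsvm")
  # Assumption: the file can then be loaded the same way as the downloaded one:
  # dtrain <- xgboost::xgb.DMatrix("agaricus_train.libsvm#train.cache")
}
```

Depending on the xgboost version, the file URI may also need a `?format=libsvm` part before the `#train.cache` suffix; I have not verified this against every release.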