
This is a long shot and more of a code-design question from a rookie like me, but I think it has real value for real-world applications.

The core questions are:

  1. Can I save a trained ML model, such as a Random Forest (RF), in R and call/use it later without needing to reload all the data that was used to train it?

  2. When, in real life, I have a massive folder with hundreds of thousands of data files to be tested, can I load the model I saved somewhere in R and ask it to read the unknown files one by one (so I am not limited by RAM size), perform regression/classification analysis on each file as it is read, and store ALL the output together in a single file?

For example, suppose I have 100,000 CSV files of data in a folder, and I want to use 30% of them as the training set and the rest as the test set for a Random Forest (RF) classification.

I can select the files of interest and call them "control files". Then I use fread(), randomly sample 50% of the rows in those files, call the caret library or the randomForest library, and train my model:

model <- train(x = x, y = y, method = "rf")
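
Putting the whole training step together, this is roughly what I have in mind; the folder path, the seed, and the "label" column name are just placeholders I made up for illustration:

library(data.table)  # for fread() / rbindlist()
library(caret)       # for train()

# All 100,000 CSV files live in one folder (hypothetical path)
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# Use 30% of the files as "control files" for training
set.seed(42)
control_files <- sample(files, floor(0.3 * length(files)))

# Read each control file and randomly keep 50% of its rows
train_data <- rbindlist(lapply(control_files, function(f) {
  dt <- fread(f)
  dt[sample(.N, floor(0.5 * .N))]
}))

# Assuming the class column is called "label" (a made-up name)
train_data[, label := as.factor(label)]
model <- train(label ~ ., data = train_data, method = "rf")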

Now, can I save the model somewhere, so that I don't have to load all the control files each time I want to use it?
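
From what I have found so far, saveRDS()/readRDS() looks like the standard way to do this; a minimal sketch with a made-up file name:

# Save the fitted model to disk once...
saveRDS(model, "rf_model.rds")

# ...then, in any later R session, restore it without the training data
model <- readRDS("rf_model.rds")

Is that the right approach, or is there a caret-specific way to do it?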

Then I want to apply this model to all the remaining CSV files in the folder, reading them one by one as the model is applied, instead of reading them all in at once, to avoid RAM issues.
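
Something like the loop below is what I am picturing. It is only a sketch, again assuming the files/model objects from the training sketch above; it appends each file's predictions to one combined output file as it goes, so only one CSV is ever in memory:

test_files <- setdiff(files, control_files)
out_file <- "all_predictions.csv"
if (file.exists(out_file)) file.remove(out_file)  # start fresh

for (f in test_files) {
  dt <- fread(f)                            # read ONE file at a time
  preds <- predict(model, newdata = dt)     # apply the saved model
  fwrite(data.table(file = basename(f), prediction = as.character(preds)),
         out_file, append = file.exists(out_file))  # header only on first write
}

Is this a sensible pattern, or is there a better-established way to stream files through a saved model?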

  • have you read this? https://stackoverflow.com/questions/14761496/saving-and-loading-a-model-in-r. I believe you can just do `saveRDS(model...)` – StupidWolf Jun 03 '21 at 10:28
  • @StupidWolf Now I have, thanks mate. This is a great beginning. Now that I can save models to a file, is there a particular function I should consider for reading files one by one and applying the model to each file as it is read? fread? – ML33M Jun 03 '21 at 19:59
