I'm working with a very large data set, about 120,000 rows and 34 columns. As you can well imagine, the R package randomForest takes quite a number of hours to run on it, even on a powerful Windows server.
Although I am no expert in randomForest, I have a question about the proper use of the combine() function.
I got conflicting answers when I researched this question online. Some say that you can only use combine() on forests that randomForest grew on the same set of data. Others say that you can use combine() without that restriction.
What I'd like (hope, dream) to do is break up the 120,000 rows of data into 6 data frames, each containing 20,000 rows, and run randomForest on each of the 6 data frames. My hope is that I can then use the combine() function to merge the 6 resulting forests into one. Is that possible?
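To make the plan concrete, here is a minimal sketch of what I have in mind. It runs on a small made-up data frame rather than my real data, so the names `dat`, `x1`, `x2`, `x3`, `y`, the 600 rows and `ntree = 50` are all just placeholders. It only shows the mechanics; whether the combined forest is a sound model is exactly what I am unsure about.

```r
library(randomForest)

set.seed(42)

# Made-up stand-in for my real data (600 rows instead of 120,000)
n <- 600
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n))
dat$y <- factor(ifelse(dat$x1 + dat$x2 > 0, "a", "b"))

# Randomly assign every row to one of 6 equally sized chunks
chunk_id <- sample(rep(1:6, length.out = n))
chunks <- split(dat, chunk_id)

# Grow one forest per chunk; all chunks share the same predictor columns
forests <- lapply(chunks, function(d) {
  randomForest(y ~ ., data = d, ntree = 50)
})

# Merge the 6 forests into a single randomForest object
rf_all <- do.call(randomForest::combine, unname(forests))

# The combined object holds the trees of all 6 forests (6 * 50)
rf_all$ntree

# Predict on the full data with the combined forest
pred <- predict(rf_all, newdata = dat)
```

I assign rows to chunks at random rather than taking consecutive blocks, on the assumption that each chunk should look like the full data. I also call `randomForest::combine` with the namespace spelled out, in case another loaded package exports its own `combine`.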
Any help in this matter would be greatly appreciated.