I ran some experiments using createDataPartition (from the caret package) in R to split my data into train and test sets. I repeated the split 500 times in a loop on one laptop. When I tried to replicate the experiments on another laptop with the same code and data, the results were very different from what I had before. I assume this is a seed issue, and I am trying to figure out how to manage it so that I can reproduce the earlier results, at least approximately. Any suggestions?

Below is a snapshot of how I am splitting the train and test data:

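library(caret)  # provides createDataPartition()

for (i in 1:500) {
  set.seed(i)  # one fixed seed per repetition
  index <- createDataPartition(data$S, p = 0.75, list = FALSE, times = 1)
  train <- data[index, ]
  test <- data[-index, ]
  # ... rest of the loop body is not shown in this snapshot
}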
MugB
  • If the code looks like what you posted then it should be the same. Are you sure there isn't some other code that you forgot about? – user2974951 Feb 12 '20 at 12:52
  • I am pretty sure I did not change anything else in the code itself, but I am using a different workstation now. In the end I am calculating the average RMSE. I repeated it several times on the initial workstation and it gave me exactly the same results each time, but this time it does not. – MugB Feb 12 '20 at 12:56
  • There's a difference in seeds between R 3.5.1 and R 3.6.1, see https://stackoverflow.com/questions/47199415/is-set-seed-consistent-over-different-versions-of-r-and-ubuntu/56381613#56381613 – StupidWolf Feb 12 '20 at 13:04
  • Thanks for highlighting this, but I was using version 3.6 on both. – MugB Feb 12 '20 at 13:35

1 Answer

Based on what I have understood, you can try these things:

I had a similar issue where model results differed between two machines. Have you checked the model parameters (if you are fitting the model with default settings, you might get different parameters on each machine) and the version of the package you are using to build the model?
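One way to run this check is to print the same environment summary on both machines and compare the output. This is a minimal sketch, assuming R 3.6.0 or later (where `RNGkind()` also reports the sampling method). `env_summary` is a hypothetical helper name, and caret is only queried if it is installed:

```r
# Hypothetical helper: collect the settings most likely to differ between machines.
env_summary <- function(pkgs = c("caret")) {
  # Keep only the packages that are actually installed on this machine.
  installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  list(
    r_version = R.version.string,
    rng_kind  = RNGkind(),  # generator, normal method, sample method
    packages  = vapply(installed,
                       function(p) as.character(packageVersion(p)),
                       character(1))
  )
}

info <- env_summary()
str(info)
```

If the sample method in `rng_kind` differs between the machines (for example "Rounding" on one and "Rejection" on the other), the same `set.seed(i)` can produce different splits. In that case, calling `RNGkind(sample.kind = ...)` with the same value on both machines before the loop should align them.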

If you think the problem is in the train/test split (which was not the case for me), create a dummy column numbered 1 to n on both machines, split it with the same code, and take the intersection of the selected dummy values from the two machines. If both machines select the same rows, you can be sure the split is working properly.
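The dummy-column check could look roughly like the sketch below. It uses base R's `sample.int()` as a stand-in for `createDataPartition()` so that it runs without caret. `split_ids`, `n = 100` and the five seeds are hypothetical choices for illustration, and in practice `ids_a` and `ids_b` would be produced on the two machines and saved with `saveRDS()` before being compared:

```r
# Hypothetical helper: record which row ids end up in the training set for each seed.
split_ids <- function(n, seeds, p = 0.75) {
  lapply(seeds, function(s) {
    set.seed(s)
    # Stand-in for createDataPartition(); swap in the real call when testing.
    sort(sample.int(n, size = floor(p * n)))
  })
}

ids_a <- split_ids(n = 100, seeds = 1:5)  # would be run on machine A
ids_b <- split_ids(n = 100, seeds = 1:5)  # would be run on machine B

# TRUE for every seed means both runs picked exactly the same training rows.
same <- mapply(identical, ids_a, ids_b)
print(same)
```

Any `FALSE` in `same` points to the split (and therefore the random number generator setup) as the source of the difference, rather than the model itself.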

Ayrus
  • Thanks. I can't recall the parameter values that were used on the other machine, so I am not sure whether this caused the different results, but thanks for raising this point. It is probably one of the reasons for the different results. – MugB Feb 20 '20 at 10:16