
After looking through the available data.table documentation and the related Stack Overflow answers (which mostly deal with data frames), how do you efficiently split a master data.table (e.g. 'foo') 70%/30% into separate 'foo.train' and 'foo.test' data.tables whose rows do not overlap, for predictive modeling purposes? (Note: caret and dplyr may not be used.)

I need a solution that builds on Gennaro Tedesco's answer:

https://stackoverflow.com/a/33201094/3741230

Thanks.

user3741230

1 Answer


Thanks All.

The code I needed was at the end of this answer: https://stackoverflow.com/a/32511327/3741230

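> inTrain <- foo.dt[, sample(.N, floor(.N * 0.70))]  # row indices for the 70% training set
> Train <- foo.dt[inTrain]    # rows selected for training
> Test <- foo.dt[-inTrain]    # all remaining rows, so Train and Test do not overlap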

> dim(foo.dt)
[1] 100000      6
> dim(Train)
[1] 70000     6
> dim(Test)
[1] 30000     6

(Note that the first line avoids repeating the table's name inside the brackets, and that it passes the single number .N to sample() rather than the long vector 1:.N, which is more efficient.)
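For anyone who wants to check that the two tables really are disjoint, here is a minimal reproducible sketch of the same approach. The toy table, its `id` column, and the seed are my own assumptions for illustration and are not part of the original data.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the split can be reproduced

# Hypothetical stand-in for the master table 'foo'
foo <- data.table(id = 1:1000, x = rnorm(1000))

# Draw 70% of the row indices without replacement
train_idx <- foo[, sample(.N, floor(.N * 0.70))]

foo.train <- foo[train_idx]   # sampled rows
foo.test  <- foo[-train_idx]  # everything that was not sampled

# The two tables share no rows and together cover the master table
stopifnot(length(intersect(foo.train$id, foo.test$id)) == 0L)
stopifnot(nrow(foo.train) + nrow(foo.test) == nrow(foo))
```

One caveat: if the sampled index vector were ever empty (for example with a very small table), `foo[-train_idx]` would return zero rows rather than all rows, so it may be worth guarding against that case.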
