I have a dataset that I need to split into a training set and a test set in R. It has many observations, and each has a value for its quarter (Q32008, Q42008, ..., Q42016).

I want to split the dataset in half by randomly chosen quarters, so that all observations for a given quarter stay together. For example, one dataset would contain all observations from Q2 2009, Q4 2010, and Q1 2008. I tried using split, but I could not unsplit the result randomly into two distinct datasets.

Any ideas?

jacefarm
Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Jaap Nov 15 '16 at 06:40

1 Answer

Not sure if I understood what you meant. Is the code below helpful?

my.df = expand.grid(Quarter=paste0("Q",1:4),Year=2012:2016)
my.df$Period = with(my.df,paste0(Quarter,Year))
my.df$x = rnorm(nrow(my.df))

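# Randomly select half of the distinct periods for the first data frame
# (unique() matters when a period has more than one observation)
periods = unique(my.df$Period)
first.periods = sample(periods,floor(length(periods)/2))
my.df$SplitID = as.numeric(my.df$Period %in% first.periods)+1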

# Split data frame
split.df = split(x = my.df,f = my.df$SplitID)
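
The same idea applied to data with several observations per quarter might look like the sketch below. It samples from the distinct quarter labels, so every row of a quarter ends up in the same set. The data frame `obs`, its columns, and the names `train` and `test` are assumptions made for illustration, not part of the original question.

```r
# Hypothetical data: 20 quarters with 5 observations each
set.seed(42)  # only to make this sketch reproducible
obs <- data.frame(
  Period = rep(paste0("Q", 1:4, rep(2012:2016, each = 4)), each = 5),
  x      = rnorm(100)
)

# Sample from the distinct quarters, not from the rows
periods       <- unique(obs$Period)
train.periods <- sample(periods, floor(length(periods) / 2))

# All rows of a sampled quarter go to train, the rest to test
train <- obs[obs$Period %in% train.periods, ]
test  <- obs[!obs$Period %in% train.periods, ]
```

With an odd number of quarters, `floor()` puts the extra quarter in `test`. The two sets have equal row counts only if every quarter has the same number of observations.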
Gabriel Mota