
I'm building some example data and need to generate data for every combination of three factor variables.

I have three vectors, each with three values:

fruit <- c("pears", "apples", "grapes")
veg <- c("carrots", "cabbages", "broccoli")
pets <- c("cats", "dogs", "fish")

I then have some dummy data:

Date_Range <- seq(as.Date("2017-01-01"), as.Date("2017-01-30"), by = 1)
Sessions <- ceiling(rnorm(90, mean = 3000, sd = 300))

I now want to build a data frame from these. For each of the 30 dates in Date_Range, I would like one row for each distinct combination of fruit, veg and pets.

How can I build my data frame this way?

Doug Fir
  • Can you provide an expected output example? – Icaro Bombonato Feb 15 '17 at 00:54
  • What is the point of the `Sessions` vector? You don't mention it at all. – Gregor Thomas Feb 15 '17 at 00:55
  • @Gregor just a metric to show alongside each of these factor vars. The actual use case is website data, and the actual vectors are landing page, device category (mobile, desktop, tablet) and channel (Google, Facebook, etc). – Doug Fir Feb 15 '17 at 00:55
  • Trying to understand... with 3 each fruits, vegs, and pets, there are 3*3*3 = 27 distinct combinations. You have 30 dates. So one row per combination per date gives 27 * 30 = 810 total rows. And then you have 90 `Session` values that you want "alongside"? Just, like, randomly tacked on and repeated a bunch (maybe 810 / 90 = 9 times)? – Gregor Thomas Feb 15 '17 at 00:58
  • Ah, of course @Gregor, I think you might have nailed something I overlooked. So if I just change Sessions to have 810 values, that might do it. Let me check – Doug Fir Feb 15 '17 at 00:59
  • See the expand.grid function: `expand.grid(Date_Range, fruit, veg, pets)` – Dave2e Feb 15 '17 at 00:59
  • I would suggest deleting `Sessions` from your question as it seems like an afterthought. It was the weird thing that kept me from immediately closing as a dupe. – Gregor Thomas Feb 15 '17 at 01:00
  • @Dave2e you'll want `Date_Range` in there as well. – Gregor Thomas Feb 15 '17 at 01:00
  • @Dave2e what a great, useful function, thanks for sharing! I looked at the documentation hoping there'd be an argument to automatically name each column after its vector instead of the default "Var1", "Var2", etc., but I guess not. Nevertheless this is very helpful, so thanks – Doug Fir Feb 15 '17 at 01:06
  • @DougFir, yes, that would be a nice feature. [The same methods in this question](http://stackoverflow.com/q/16951080/903061) on automatically naming lists could be used for `expand.grid`. – Gregor Thomas Feb 15 '17 at 01:13
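Following the `expand.grid` suggestion in the comments, here is a minimal sketch. It relies on `expand.grid` taking column names from named arguments (so `date = Date_Range` and so on, which partly covers the naming wish), and it sizes `Sessions` to the number of rows (810) rather than 90, as the discussion concluded. `stringsAsFactors = FALSE` keeps the text columns as character. The column names `date`, `fruit`, `veg`, `pets`, `Sessions` and the `set.seed(1)` call are illustrative choices, not part of the original question.

```r
fruit <- c("pears", "apples", "grapes")
veg <- c("carrots", "cabbages", "broccoli")
pets <- c("cats", "dogs", "fish")
Date_Range <- seq(as.Date("2017-01-01"), as.Date("2017-01-30"), by = 1)

# One row per date x fruit x veg x pets (30 * 3 * 3 * 3 = 810 rows).
# Named arguments become the column names; the first argument varies fastest.
df <- expand.grid(date = Date_Range, fruit = fruit, veg = veg, pets = pets,
                  stringsAsFactors = FALSE)

# Illustrative metric: one simulated value per row, so its length matches nrow(df)
set.seed(1)
df$Sessions <- ceiling(rnorm(nrow(df), mean = 3000, sd = 300))

head(df)
```

Because `date` varies fastest, the first 30 rows are pears/carrots/cats across the whole date range; putting `date` last in the call groups all 27 combinations under each date instead.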

1 Answer

I think I got the 27 combinations. I've ignored the Sessions vector.

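# All 3-element combinations of the 9 pooled values (84 rows)
d = t(combn(c(fruit,pets,veg),3))
# Flag the rows that contain one fruit, one pet and one veg
x = rep(0, nrow(d))
for (i in 1:nrow(d)){
    if ( any(d[i,] %in% fruit) && any(d[i,] %in% pets) && any(d[i,] %in% veg) ){
        x[i] = 1
    }
}
# Keep only the 27 valid combinations
d = d[x == 1,]

# Repeat the whole date range once per combination...
DATE = rep(Date_Range,nrow(d))

# ...and repeat each combination once per date, so the rows line up with DATE
D = d[rep(seq_len(nrow(d)), each=NROW(Date_Range)),]
# cbind() with a character matrix turns DATE into its day count (see output)
OUTPUT = cbind(D,DATE)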
head(OUTPUT)
#                              DATE   
#[1,] "pears" "cats" "carrots" "17167"
#[2,] "pears" "cats" "carrots" "17168"
#[3,] "pears" "cats" "carrots" "17169"
#[4,] "pears" "cats" "carrots" "17170"
#[5,] "pears" "cats" "carrots" "17171"
#[6,] "pears" "cats" "carrots" "17172"
d.b
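A variant of the answer's approach, sketched here rather than taken from the answer: it replaces the loop with `apply()` and builds a `data.frame` instead of calling `cbind()`. The `"17167"` values in the output above appear because `cbind()` on a character matrix coerces the `Date` vector to its underlying day count (days since 1970-01-01); a data frame keeps the `Date` class. The column names `fruit`, `pets`, `veg` are an assumption that relies on `combn()` preserving input order, so each valid row is always fruit, pet, veg.

```r
fruit <- c("pears", "apples", "grapes")
veg <- c("carrots", "cabbages", "broccoli")
pets <- c("cats", "dogs", "fish")
Date_Range <- seq(as.Date("2017-01-01"), as.Date("2017-01-30"), by = 1)

# Why the answer's output shows numbers: a Date is stored as days since 1970-01-01
as.numeric(as.Date("2017-01-01"))  # 17167

# All 84 three-element combinations of the pooled values, in input order
d <- t(combn(c(fruit, pets, veg), 3))

# Vectorised replacement for the for-loop: keep rows with one value from each group
keep <- apply(d, 1, function(r) {
  any(r %in% fruit) && any(r %in% pets) && any(r %in% veg)
})
d <- d[keep, ]

# Repeat each combination once per date, then build a data.frame so the
# DATE column keeps its Date class instead of being coerced by cbind()
D <- d[rep(seq_len(nrow(d)), each = length(Date_Range)), ]
OUTPUT <- data.frame(fruit = D[, 1], pets = D[, 2], veg = D[, 3],
                     DATE = rep(Date_Range, nrow(d)),
                     stringsAsFactors = FALSE)

head(OUTPUT)
```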