Example Data

Consider the following two data.frames as an example:

set.seed(12)
d1 = cbind(
   expand.grid(1:4,LETTERS[1:3]),
   rnorm(12,10,1)
)[sample(1:12),]

d2 = as.data.frame(
       cbind(
          sample(c(1,2,4),900, replace=TRUE),
          sample(LETTERS[1:3],900,replace=TRUE)
       )[sample(1:900),]
     )

names(d1) = c("x","y","z")
names(d2) = c("x","y")

d1 is small, contains every possible combination of x and y, and contains the z variable. d2 is much longer, does not necessarily contain every possible combination of x and y, and does not contain the z variable. In both data.frames, the rows are in no particular order with respect to x and y (as in the example).
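
These properties can be checked with a short sketch that rebuilds the example data (d2 is rebuilt here without the extra row shuffle, which does not affect the checks). One detail worth noticing: `cbind()` applied to a numeric vector and a character vector returns a character matrix, so `d2$x` is not numeric (character in R >= 4.0, factor by default in older versions), while `d1$x` is an integer. That may matter for approaches that match on x.

```r
set.seed(12)
d1 <- cbind(expand.grid(1:4, LETTERS[1:3]), rnorm(12, 10, 1))[sample(1:12), ]
d2 <- as.data.frame(cbind(sample(c(1, 2, 4), 900, replace = TRUE),
                          sample(LETTERS[1:3], 900, replace = TRUE)))
names(d1) <- c("x", "y", "z")
names(d2) <- c("x", "y")

# d1 holds each (x, y) combination exactly once
stopifnot(nrow(d1) == 12, !anyDuplicated(d1[c("x", "y")]))

# d2 never has x == 3, so at most 9 of the 12 combinations can appear
nrow(unique(d2))

# Column types differ between the two data.frames
sapply(d1, class)
sapply(d2, class)
```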

Goal

How can I add the values of the z variable from d1 to d2, matching on the combination of x and y?

Bad solution

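d2$z = 0
# Loop over every (x, y) combination found in d1 and copy its z value
# into all rows of d2 that share the same combination
for (x in unique(d1$x))
{
  for (y in unique(d1$y))
  {
    d2$z[d2$x == x & d2$y == y] = d1$z[d1$x == x & d1$y == y]
  }
}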
head(d2)
  x y         z
1 2 B  9.727704
2 1 C  9.893536
3 2 A 11.577169
4 1 A  8.519432
5 4 C  8.706118
6 2 B  9.727704

However, this is terribly slow when the data.frames are a few million rows long. It is also not very flexible.

Is there a better alternative?

Remi.b
  • Hmm, isn't this just `merge(d1, d2, by = c("x", "y"))`? – David Arenburg Nov 11 '15 at 21:10
  • I started to answer this but David closed it. I would look into data.table package for big data (but smaller than RAM) – Dean MacGregor Nov 11 '15 at 21:13
  • Yes it appears to do the same thing (after reordering the rows). A much better solution...eventually the best? I will try on my big data.frames to see if the CPU time is bearable. Thank you – Remi.b Nov 11 '15 at 21:13
  • @DeanMacGregor there are `data.table` examples in the dupe. No need to post additional answers. – David Arenburg Nov 11 '15 at 21:16
  • @DavidArenburg I'm not saying you shouldn't have closed it but to the extent that OP just takes your `merge` comment without looking at other question, data.table would be good. – Dean MacGregor Nov 11 '15 at 21:20
  • I would love to see the data.table solutions indeed! I am using data.table and the `merge` function is still pretty slow. I am hoping that `data.table` could eventually make it a little faster. – Remi.b Nov 11 '15 at 21:22
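
For reference, the `merge` call suggested in the comments, an order-preserving base R lookup via `match`, and a `data.table` update join might look like the sketch below. The data here is a simplified stand-in with explicit, matching column types, not the exact data.frames from the question. The `key` helper and the names `merged`, `idx`, `dt1` and `dt2` are made up for illustration, and the `data.table` part only runs if that package is installed.

```r
set.seed(12)
# Simplified stand-in for the question's data, with matching column types
d1 <- expand.grid(x = 1:4, y = LETTERS[1:3], stringsAsFactors = FALSE)
d1$z <- rnorm(nrow(d1), 10, 1)
d2 <- data.frame(x = sample(c(1L, 2L, 4L), 900, replace = TRUE),
                 y = sample(LETTERS[1:3], 900, replace = TRUE),
                 stringsAsFactors = FALSE)

# 1) merge: a left join keeps every row of d2, but by default the
#    result is sorted by the key columns rather than in d2's order
merged <- merge(d2, d1, by = c("x", "y"), all.x = TRUE)

# 2) match on a combined key: keeps d2's row order and needs no loop
key <- function(d) paste(d$x, d$y, sep = "_")   # hypothetical helper
idx <- match(key(d2), key(d1))
d2$z <- d1$z[idx]

# 3) data.table update join (assumes the data.table package is installed)
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt1 <- as.data.table(d1)
  dt2 <- as.data.table(d2[c("x", "y")])
  # For each row of dt2, look up the matching (x, y) in dt1 and add its z
  dt2[dt1, on = c("x", "y"), z := i.z]
  stopifnot(isTRUE(all.equal(dt2$z, d2$z)))
}
```

The `match` variant leaves d2 in its original row order, whereas `merge` reorders the rows. Whether any of these is fast enough for millions of rows would need to be benchmarked on the real data.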
