0

The previous question was closed, but the easy solution doesn't seem to work. So I further explained my question here.

I have two dataframes, df1 and df2. df1 has a lot of raw data, df2 has pointers based on "value_a" where to look in the raw data.

df1 <- data.frame("id" = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), "value_a" = c(0, 10, 21, 30, 43, 53, 69, 81, 93, 5, 16, 27, 33, 45, 61, 75, 90, 2, 11, 16, 24, 31, 40, 47, 60, 75, 88), "value_b" = c(100, 101, 100, 95, 90, 85, 88, 84, 75, 110, 105, 106, 104, 95, 98, 96, 89, 104, 104, 104, 103, 106, 103, 101, 99, 98, 97), "value_c" = c(0, -1, -2, -2, -2, -2, -1, -1, 0, 0, 0, 0, 1, 1, 2, 2, 1, -1, 0, 0, 1, 1, 2, 2, 1, 1, 0), "value_d" = c(1:27))
df2 <- data.frame("id" = c(1, 2, 3), "value_a" = c(53, 45, 47))

I would like to use the values in df2, to search in df1. So, for every "id", and it's unique "value_a" given in df2, find the corresponding "value_b" and "value_c" in df1, so I can generate a df3 which looks like this:

df3 <- data.frame("id" = c(1, 2, 3), "value_a" = c(53, 45, 47), "value_b" = c(85, 95, 101), "value_c" = c(-2, 1, 2))

Obviously, I have hundreds of "id"s to cover. Since I want to find multiple variables ("value_b", "value_c" but not "value_d") pull() won't work since it only pulls one variable. Based on this page I started thinking of joining. An innerjoin() wont work either, because I have to select on multiple variables (id & value_a). Merging like this

df3 <- merge(x = df1, y = df2, by.x = c(id, value_a), by.y = c(id, value_a)) %>%
  select(id, value_a, value_b, value_c)

is propably what describes what I'm thinking of, but this throws an error: Error in fix.by(by.x, x) : object 'value_a' not found

I was also thinking of using tapply() but I get stuck on using two different data.frames. Does someone have a good idea on how to tackle this?

Best regards,

Johan

Johan Vos
  • 65
  • 5

3 Answers3

1

I believe this can solve your issue. I hope this helps:

merge(df2,df1,by.x=c('id','value_a'),by.y=c('id','value_a'),all.x=T)

  id value_a value_b value_c value_d
1  1      53      85      -2       6
2  2      45      95       1      14
3  3      47     101       2      24
Duck
  • 39,058
  • 13
  • 42
  • 84
1

I believe this will be useful:

df2 %>% 
  inner_join(df1, by = c("id"="id", "value_a"="value_a"))

Output:

  id value_a value_b value_c value_d
1  1      53      85      -2       6
2  2      45      95       1      14
3  3      47     101       2      24
codez0mb1e
  • 706
  • 6
  • 17
0

Both answers of Duck and codez0mb1e work, when adding %>% select(id, value_a, value_b, value_c) they give the output

  id value_a value_b value_c
1  1      53      85      -2
2  2      45      95       1
3  3      47     101       2

Thanks for the feedback in my propable misinterpretation of both innerjoin() and merge!

Johan Vos
  • 65
  • 5