In two data.frames in R, find values in one data.frame, based on multiple variables in the other data.frame

Question

The previous question was closed, but the easy solution doesn't seem to work. So I further explained my question here.

I have two dataframes, df1 and df2. df1 has a lot of raw data, df2 has pointers based on "value_a" where to look in the raw data.

df1 <- data.frame("id" = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), "value_a" = c(0, 10, 21, 30, 43, 53, 69, 81, 93, 5, 16, 27, 33, 45, 61, 75, 90, 2, 11, 16, 24, 31, 40, 47, 60, 75, 88), "value_b" = c(100, 101, 100, 95, 90, 85, 88, 84, 75, 110, 105, 106, 104, 95, 98, 96, 89, 104, 104, 104, 103, 106, 103, 101, 99, 98, 97), "value_c" = c(0, -1, -2, -2, -2, -2, -1, -1, 0, 0, 0, 0, 1, 1, 2, 2, 1, -1, 0, 0, 1, 1, 2, 2, 1, 1, 0), "value_d" = c(1:27))
df2 <- data.frame("id" = c(1, 2, 3), "value_a" = c(53, 45, 47))

I would like to use the values in df2, to search in df1. So, for every "id", and it's unique "value_a" given in df2, find the corresponding "value_b" and "value_c" in df1, so I can generate a df3 which looks like this:

df3 <- data.frame("id" = c(1, 2, 3), "value_a" = c(53, 45, 47), "value_b" = c(85, 95, 101), "value_c" = c(-2, 1, 2))

Obviously, I have hundreds of "id"s to cover. Since I want to find multiple variables ("value_b", "value_c" but not "value_d") pull() won't work since it only pulls one variable. Based on this page I started thinking of joining. An innerjoin() wont work either, because I have to select on multiple variables (id & value_a). Merging like this

df3 <- merge(x = df1, y = df2, by.x = c(id, value_a), by.y = c(id, value_a)) %>%
  select(id, value_a, value_b, value_c)

is propably what describes what I'm thinking of, but this throws an error: Error in fix.by(by.x, x) : object 'value_a' not found

I was also thinking of using tapply() but I get stuck on using two different data.frames. Does someone have a good idea on how to tackle this?

Best regards,

Johan

score 1 · Answer 1 · answered Jul 08 '20 at 15:15

I believe this can solve your issue. I hope this helps:

merge(df2,df1,by.x=c('id','value_a'),by.y=c('id','value_a'),all.x=T)

  id value_a value_b value_c value_d
1  1      53      85      -2       6
2  2      45      95       1      14
3  3      47     101       2      24

score 1 · Accepted Answer · answered Jul 08 '20 at 15:34

I believe this will be useful:

df2 %>% 
  inner_join(df1, by = c("id"="id", "value_a"="value_a"))

Output:

  id value_a value_b value_c value_d
1  1      53      85      -2       6
2  2      45      95       1      14
3  3      47     101       2      24

score 0 · Answer 3 · answered Jul 08 '20 at 16:41

Both answers of Duck and codez0mb1e work, when adding %>% select(id, value_a, value_b, value_c) they give the output

  id value_a value_b value_c
1  1      53      85      -2
2  2      45      95       1
3  3      47     101       2

Thanks for the feedback in my propable misinterpretation of both innerjoin() and merge!

In two data.frames in R, find values in one data.frame, based on multiple variables in the other data.frame

3 Answers3