0

I have a data frame df1 that contains the rows of another data frame df2 (I only have one row in this question for a simpler visualisation of the problem). The data in df1 is organised in pairs according to Cat. What I need to do is use the rows from df2 to make a data frame with both rows of each pair. That is, the end goal is to have a data frame with only the pairs from df1 as shown in dfgoal.

df1 <- data.frame(Source = c("1", "1", "1", "1", "1", "1"),
Order = c("1", "2", "3", "4", "5", "6"),
Name = c("A", "A", "A", "A", "A", "A" ),
Cat = c("A1", "A1", "A2", "A2", "A3", "A3"),
Vol = c("35", "43", "none", "none", "63", "24"))

df2 <- data.frame(Source = c("1"),
Order = c("4"),
Name = c("A" ),
Cat = c("A2"),
Vol = c("none"))

dfgoal <- data.frame(Source = c("1", "1"),
Order = c("3", "4"),
Name = c("A", "A"),
Cat = c("A2", "A2"),
Vol = c("none", "none"))

My original data frames are much larger with more variables so I was hoping that there was a way to use the rows from df2 for filtering purposes in df1 so I don't need to specify column values that I can see from df2. I tried using the group_by function from the dplyr package in the hopes that the paired row (Order == "3" in this example) would stay grouped when joining the identical rows from df1 and df2. Needless to say it didn't work. I ended up with only the row from df2.

df_attempt <- df1 %>% group_by(Source, Name, Cat) #dplyr package

df_attempt <- inner_join(df_attempt, df2) #dplyr package
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Sorry, I'm not sure why the print function isn't working. – Joel Blomqvist Aug 08 '23 at 13:23
  • We don't need the `print(.)` calls or their output. – r2evans Aug 08 '23 at 13:23
  • 1
    using `inner_join` ignores `group_by`, that kind of operation does not have meaning; "groups" should be covered in the `by=` clause of `inner_join` (that you are ignoring). My guess is starting with `inner_join(df2, df1, by = c("Source", "Name", "Cat"))`, though (1) I'm inferring the need to join by source, name, and cat, and (2) there is a little clean up (or pre-conditioning) required to deal with the `Order.x`/`Order.y` (and `Vol.*`) columns. – r2evans Aug 08 '23 at 13:25

0 Answers0