0

I have two data frames: df1 and df2. They both have four columns; three with the same names ID, Year and Week and one that are different from each other.

>df1
ID      Year      Oxygen     Week
----    ------    -------    -------
1       2004      18          1
1       2005      17          1 
2       2006      17          1
2       2007      18          1 
3       2008      19          1
3       2010      20          1
3       2010      20          1
4       2012      16          1
5       2013      18          1
6       2014      18          1

>df2
ID      Year      Kg       Week
----    ------   -----    -------
1       2004      20        1
1       2005      35        2
2       2006      30        2
3       2007      15        1
3       2008      70        2
4       2009      40        1
5       2013      55        1
6       2012      40        1
6       2014      10        2
7       2013      15        1

I would like to produce a new data frame which contains the rows from df1 only when the combination of ID and Year in df1 also are present in df2. The Week might be the same or not for that row, but I don't want to take the column Week into account. So the first row in df1 has 1 for ID and 2004 for Year which also occurs in df2. The combination of ID and Year for the second row in df1 does also occur in df2but have different value for Week.

I know how to do it if it only depends on one column: df3 <- subset(df1, ID %in% df2$ID)

There was a solution for this when I didn't have the column Week which was: df3 <- df1 %>% inner_join(df2)

But I don't know how to make it depend on both the ID and Year at the same time without it also takes Week into account.

I should end up with the following data frame, which only contain the columns from df1:

>df3
ID      Year      Oxygen      Week
----    ------    -------    -------
1       2004      18          1
1       2005      17          1
2       2006      17          1
3       2008      19          1
4       2012      16          1
5       2013      18          1
6       2014      18          1

1 Answers1

0

tidyverse approach

library(dplyr)

df3 <-
  df1 %>% 
  inner_join(df2)
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32
  • I have just added another column which replicates my data in a better way. I had just simplified my data when writing this question and can now see it makes the solution with inner_join not possible. – Erik Christensen Dec 17 '22 at 09:10
  • And with this approach I end up with a data frame that merges the two data frames. I need to create a new data frame where it only consists of the data in df1. – Erik Christensen Dec 17 '22 at 15:48