I have two data frames: df1
and df2
. They both have four columns; three with the same names ID
, Year
and Week
and one that are different from each other.
>df1
ID Year Oxygen Week
---- ------ ------- -------
1 2004 18 1
1 2005 17 1
2 2006 17 1
2 2007 18 1
3 2008 19 1
3 2010 20 1
3 2010 20 1
4 2012 16 1
5 2013 18 1
6 2014 18 1
>df2
ID Year Kg Week
---- ------ ----- -------
1 2004 20 1
1 2005 35 2
2 2006 30 2
3 2007 15 1
3 2008 70 2
4 2009 40 1
5 2013 55 1
6 2012 40 1
6 2014 10 2
7 2013 15 1
I would like to produce a new data frame which contains the rows from df1
only when the combination of ID
and Year
in df1
also are present in df2
. The Week
might be the same or not for that row, but I don't want to take the column Week
into account. So the first row in df1
has 1 for ID
and 2004 for Year
which also occurs in df2
. The combination of ID
and Year
for the second row in df1
does also occur in df2
but have different value for Week
.
I know how to do it if it only depends on one column:
df3 <- subset(df1, ID %in% df2$ID)
There was a solution for this when I didn't have the column Week
which was:
df3 <- df1 %>% inner_join(df2)
But I don't know how to make it depend on both the ID
and Year
at the same time without it also takes Week
into account.
I should end up with the following data frame, which only contain the columns from df1
:
>df3
ID Year Oxygen Week
---- ------ ------- -------
1 2004 18 1
1 2005 17 1
2 2006 17 1
3 2008 19 1
4 2012 16 1
5 2013 18 1
6 2014 18 1