I am stuck with the following problem. I have two dataframes (df1, df2) that I'd like to join with left_join(df1, df2, by=c("a", "b", "c")), using the three variables a, b and c. I then noticed that the number of rows in the joined dataframe had increased, so I checked whether there were any duplicate entries:
duplicated(paste(df1$a, df1$b, df1$c))
duplicated(paste(df2$a, df2$b, df2$c))
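To illustrate why the row count grows, here is a small sketch on made-up data (not my real dataframes; the columns v1 and v2 are hypothetical). It uses base R's merge(..., all.x = TRUE) as a stand-in for left_join() so that it runs without dplyr; as far as I can tell, both return one row per matching pair of keys.

```r
# Toy data (hypothetical): the key (1, "x", "p") occurs twice in each dataframe
df1 <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"), c = c("p", "p", "q"), v1 = 1:3)
df2 <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"), c = c("p", "p", "q"), v2 = 4:6)

# Left join on a, b and c (base-R stand-in for left_join)
joined <- merge(df1, df2, by = c("a", "b", "c"), all.x = TRUE)

nrow(df1)     # 3
nrow(joined)  # 5: the duplicated key yields 2 x 2 = 4 rows, the unique key yields 1
```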
In both dataframes I found several duplicate entries. Now here comes my question: how can I exclude those duplicates before I join the two dataframes? My problem is that duplicated() only marks the repeated values (i.e. the second and any later appearance). I would like to exclude the first appearance as well, i.e. drop every row whose combination of a, b and c occurs more than once.
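Here is a rough sketch of what I mean, again on made-up data (the dataframe df and the column val are hypothetical). Combining duplicated() with its fromLast = TRUE variant seems to flag every occurrence, but I am not sure whether this is the idiomatic way to do it.

```r
# Toy data (hypothetical): keys for a = 1 and a = 3 occur more than once
df <- data.frame(a = c(1, 1, 2, 3, 3, 3), b = "x", c = "p", val = 1:6)
key <- paste(df$a, df$b, df$c)

# duplicated() leaves the first appearance of each key unmarked
duplicated(key)
# FALSE  TRUE FALSE FALSE  TRUE  TRUE

# Scanning from both ends flags every row whose key occurs more than once
dup_all <- duplicated(key) | duplicated(key, fromLast = TRUE)
# TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# Keep only the rows whose key is unique (here: the single row with a = 2)
df_unique <- df[!dup_all, ]
```

The same filter would have to be applied to df1 and df2 separately before the join.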
Thank you for your help!