0

I am stuck with the following problem. I have two dataframes (df1, df2) I'd like to left_join(df1, df, by=c("a", "b", "c")) using three variables a, b and c. Then, I noticed, that the number of rows in the joined dataframe increased. Therefore I checked, whether there were any duplicate entries:

duplicated(paste(df1$a, df1$b, df1$c)
duplicated(paste(df2$a, df2$b, df2$c)

In both dataframes I found several duplicate entries. Now here comes my question: How can I exclude those duplicates before I join the two dataframes? My problem is that duplicated() only marks the duplicated values (i.e. the second appearance). I would like to exclude the first appearance too. I hope you get the point.

Thank you for help!

D. Studer
  • 1,711
  • 1
  • 16
  • 35
  • No, this won't work. – D. Studer Apr 13 '18 at 09:06
  • 3
    So this `df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]`? – Sotos Apr 13 '18 at 09:08
  • hutils::duplicated_values – Hugh Apr 13 '18 at 09:12
  • Possible duplicate of [Remove rows where all values of a column are identical, based on another column](https://stackoverflow.com/questions/49408420/remove-rows-where-all-values-of-a-column-are-identical-based-on-another-column) and https://stackoverflow.com/questions/38232998/how-to-remove-rows-that-have-only-1-combination-for-a-given-id?noredirect=1&lq=1 – jogo Apr 13 '18 at 09:14

1 Answers1

0

difficult to provide a specifc answer without example, but one way might be to create a count variable for each combination (>1 when duplicate) and then filter these observations.

library(tidyverse)
x <- LETTERS[c(1,1:5)]
y <- LETTERS[c(1,1:5)]
df <- data.frame(x,y)

df <- df %>% 
  group_by(x,y) %>% 
  mutate(freq=n()) %>% 
  filter(!freq>1)
zoowalk
  • 2,018
  • 20
  • 33