How can I erase duplicate data from my dataframe

Question

My code so far looks like this. I have been trying to eliminate the letters that appear in both a new and an old vector; the letters represent emails. I have tried the unique and distinct functions, but they keep one copy of each duplicated value, whereas I need to remove every copy. This is the vector I would like as a result:

c("b","c","e","f","t","r","w","u","p","q")

library(dplyr)  # provides transmute()
new <- c("a","b","c","d","e","f","t")
old <- c("r","w","u","a","d","p","q")
num <- c(1:7)
df_new <- data.frame(num, new)
df_old <- data.frame(num, old)

df_new <- transmute(df_new, num, emails = new)
df_old <- transmute(df_old, num, emails = old)

all_emails <- merge(df_new, df_old, all = TRUE)

Suggested duplicate R-FAQ: [How can I remove all duplicates so that NONE are left in a data frame?](https://stackoverflow.com/q/13763216/903061) — Gregor Thomas, Sep 20 '18 at 14:29
In my vectors the letters that repeat themselves are "a" and "d", I want to eliminate them from both lists. I need to do this procedure for a large dataset tho. — Albatross, Sep 20 '18 at 14:31

Gregor Thomas · Accepted Answer · 2018-09-20T14:48:32.550


From what you show, you are complicating things unnecessarily by putting them in a data frame. Try this:

new <- c("a","b","c","d","e","f","t")
old <- c("r","w","u","a","d","p","q")
x = c(new, old)
result = x[!duplicated(x) & !duplicated(x, fromLast = TRUE)]
result
# [1] "b" "c" "e" "f" "t" "r" "w" "u" "p" "q"

Another method, if both your vectors are individually unique and you just need to drop everything that is in both new and old:

result = setdiff(union(new, old), intersect(new, old))
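The condition "individually unique" matters. As a rough sketch, the two approaches agree on the vectors from the question but diverge once a value repeats inside a single vector. The vectors `new2` and `old2` below are made-up inputs for illustration, not data from the question.

```r
new <- c("a", "b", "c", "d", "e", "f", "t")
old <- c("r", "w", "u", "a", "d", "p", "q")

# Set-based approach: everything in either vector, minus what is in both
by_sets <- setdiff(union(new, old), intersect(new, old))

# duplicated()-based approach on the combined vector
x <- c(new, old)
by_dups <- x[!duplicated(x) & !duplicated(x, fromLast = TRUE)]

# Same result when each input vector has no internal repeats
same_result <- identical(by_sets, by_dups)

# Hypothetical input where "b" repeats inside new2 only
new2 <- c("a", "b", "b", "c")
old2 <- c("a", "z")

# The set-based approach keeps "b", because it is not in both vectors
sets2 <- setdiff(union(new2, old2), intersect(new2, old2))

# The duplicated()-based approach drops "b", because it occurs twice overall
x2 <- c(new2, old2)
dups2 <- x2[!duplicated(x2) & !duplicated(x2, fromLast = TRUE)]
```

Which behaviour is wanted depends on whether a repeat within one list should count as a duplicate.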

edited Sep 20 '18 at 14:48

answered Sep 20 '18 at 14:32


If the variable is contained within a df, like this: `df$email <- df$email[!duplicated(df$email) & !duplicated(df$email, fromLast = TRUE)]` Why doesn't it work? – Albatross Sep 20 '18 at 15:35
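One likely reason, assuming `df` is an ordinary data frame: the filtered vector is shorter than the number of rows, so assigning it back to the column usually fails with a length-mismatch error (or, if the lengths happen to divide evenly, the values are recycled into the wrong rows). A minimal sketch of subsetting the rows instead, using a made-up `df` with a hypothetical `num` column alongside `email`:

```r
# Hypothetical data frame; only the column name `email` comes from the comment
df <- data.frame(num = 1:6,
                 email = c("a", "b", "a", "c", "d", "d"),
                 stringsAsFactors = FALSE)

# TRUE for rows whose email occurs exactly once in the column
keep <- !duplicated(df$email) & !duplicated(df$email, fromLast = TRUE)

# Subset whole rows rather than assigning a shorter vector to one column
df_unique <- df[keep, , drop = FALSE]
df_unique$email
```

Filtering rows this way keeps the other columns aligned with the emails that remain.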
