I would like to create a new data frame from two existing data frames. They share columns called first name, last name, and email, and I want to combine them so that the second data frame is simply appended to the first, giving me a list of all the emails I have. The data frames contain duplicates, which I want to keep at this stage so that I can eliminate them in the next step. Obviously, the code I posted below does not work. Any help?

first <- c("andrea","luis","mike","thomas")
last <- c("robinson", "trout", "rice","snell")
email <- c("andrea@gmail.com", "lt@gmail.com", "mr@gmail.com", "tom@gmail.com")

all_emails <- data.frame(first,last,email)

first <- c("mike","steven","mark","john", "martin")
last <- c("rice", "berry", "smalls","sale", "arnold")
email <- c("mr@gmail.com", "st@gmail.com", "ms@gmail.com", "js@gmail.com", "ma@gmail.com")
alz <- c(1,2,NA,3,4)
der <- c(0,2,3,NA,3)

no_contact_emails <- data.frame(first,last,email,alz,der)

df <- merge(no_contact_emails, all_emails, all = TRUE)

df <- df$email[!duplicated(df$email) & !duplicated(df$email, fromLast = TRUE)]

The expected output is a combined data set with all the emails except the one for mike rice, since that is the one that is duplicated.

kath
Albatross
  • Please show a small reproducible example and expected output – akrun Sep 20 '18 at 16:46
  • Please share sample of your data using `dput()` (not `str` or `head` or picture/screenshot) so others can help. See more here https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – Tung Sep 20 '18 at 16:53

1 Answer

Your reproducible example is a little confusing, so I made you a new one to see if this is what you are looking for:

df1 <- data.frame(
    first = c("andrea","luis","mike","thomas"),
    last = c("robinson", "trout", "rice","snell"),
    email = c("andrea@gmail.com", "lt@gmail.com", "mr@gmail.com", "tom@gmail.com")
    )

df2 <- data.frame(
    first = c("mike","steven","mark","john", "martin"),
    last = c("rice", "berry", "smalls","sale", "arnold"),
    email = c("mr@gmail.com", "st@gmail.com", "ms@gmail.com", "js@gmail.com", 
    "ma@gmail.com")
    )

Now, there are two ways you can do this using dplyr. Both keep a single row for mike rice:

library(dplyr)

df1 %>%
   bind_rows(df2) %>%
   distinct(first, last, .keep_all = TRUE)

Or:

df1 %>%
   full_join(df2)
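
If the goal is instead to drop every copy of a duplicated email, as the expected output in the question suggests, a minimal sketch could look like the one below. It assumes the question's `alz` and `der` columns exist only in the second data frame; the names `stacked` and `unique_only` are made up for illustration.

```r
library(dplyr)

# Sample data modelled on the question; alz and der exist only in the
# second data frame.
all_emails <- data.frame(
    first = c("andrea", "luis", "mike", "thomas"),
    last = c("robinson", "trout", "rice", "snell"),
    email = c("andrea@gmail.com", "lt@gmail.com", "mr@gmail.com", "tom@gmail.com"),
    stringsAsFactors = FALSE
)

no_contact_emails <- data.frame(
    first = c("mike", "steven", "mark", "john", "martin"),
    last = c("rice", "berry", "smalls", "sale", "arnold"),
    email = c("mr@gmail.com", "st@gmail.com", "ms@gmail.com", "js@gmail.com",
              "ma@gmail.com"),
    alz = c(1, 2, NA, 3, 4),
    der = c(0, 2, 3, NA, 3),
    stringsAsFactors = FALSE
)

# Stack the rows; columns missing from one data frame are filled with NA.
stacked <- bind_rows(all_emails, no_contact_emails)

# Keep only the emails that occur exactly once, so both copies of a
# duplicated email are dropped.
unique_only <- stacked %>%
    group_by(email) %>%
    filter(n() == 1) %>%
    ungroup()
```

Here `stacked` still contains both mike rice rows, while `unique_only` contains neither. In base R, the same filter can be written with `duplicated()` applied from both directions, which is the idea behind the last line of code in the question.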

Hope this helps!

Ramiro Bentes