I have the following data:

ID <- c("001", "002", "003", "003", "004", "005")  # strings, so leading zeros are kept
Email <- c("tom@abc.com", "jane@abc.com", "jim@abc.com", "jim@abc.com", "tom@abc.com", "mike@abc.com")
df <- data.frame(ID, Email, stringsAsFactors = FALSE)

I want to create a table that shows, for each email address, the unique ID numbers associated with it.

Email           IDs
jane@abc.com    002
jim@abc.com     003
mike@abc.com    005
tom@abc.com     001, 004

I've tried an apply function, tapply(df$ID, df$Email, FUN = length), but I am only getting the non-unique count of IDs per email.

jane@abc.com    1
jim@abc.com     2
mike@abc.com    1
tom@abc.com     2
Danny

1 Answer

With a data.table, this is simple:

df <- data.frame(
    id = c("001","002","003","003","004","005"),
    email = c("tom@abc.com","jane@abc.com","jim@abc.com","jim@abc.com","tom@abc.com","mike@abc.com"),
    stringsAsFactors = FALSE
)

library(data.table)
setDT(df)
df[ , .(idlist = paste(unique(id), collapse = ", ")), by = email]
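
For comparison, here is a minimal base R sketch of the same idea, in case loading data.table is not an option. It assumes the same `id` and `email` columns as above; `res` is just an illustrative name. `aggregate()` groups the rows by `email` and applies the same `paste(unique(...))` step to each group's ids.

```r
# Same sample data as above, with ids stored as strings
df <- data.frame(
    id = c("001", "002", "003", "003", "004", "005"),
    email = c("tom@abc.com", "jane@abc.com", "jim@abc.com",
              "jim@abc.com", "tom@abc.com", "mike@abc.com"),
    stringsAsFactors = FALSE
)

# For each email, drop duplicate ids and join the rest into one string
res <- aggregate(
    id ~ email,
    data = df,
    FUN = function(x) paste(unique(x), collapse = ", ")
)
res
```

The result has one row per email, sorted by email, with the unique ids collapsed into a single comma-separated string.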
DanY
  • or `res <- dt[, .(IDS = paste0(ID, collapse = ", ")), Email]` if you are happy with string values – Bulat Jul 26 '18 at 19:47
  • @Bulat - I will edit my answer to use this as it's more directly what the OP asked for. Thanks! – DanY Jul 26 '18 at 19:51
  • @DanY the solution you provided collapses the IDs into one cell. However, I only want to return unique IDs in each cell. In your solution you'll see that jim@abc.com has "003, 003" as the result. The 003 should be listed once. – Danny Jul 26 '18 at 20:25
  • Just wrap `id` with `unique()`. I've updated my answer to now include this. – DanY Jul 26 '18 at 21:15