
I have a dataset with two columns, id and names. One id can have several names, so the same id sometimes appears in multiple rows:

id   names
id1  name1
id1  name2
id1  name3
id2  name4
id2  name5

I need to restructure this data.frame in R so that each id appears in exactly one row. When an id has multiple names, they should all be written into the names column as comma-separated values, like this:

id   names
id1  name1, name2, name3
id2  name4, name5

I tried grouped <- table %>% group_by(names) but it did not work.

How could I achieve that in R?

Nikita Vlasenko

1 Answer


Using data.table:

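    # Example data from the question
    df <- read.table(header = TRUE, text = "id   names
                                 id1 name1
                                 id1 name2
                                 id1 name3
                                 id2 name4
                                 id2 name5")

    library(data.table)
    setDT(df)                            # convert to a data.table by reference
    df[, names := as.character(names)]   # names may be a factor in R < 4.0
    # Overwrite names in every row with the comma-separated names of its id group
    df[, names := paste0(names, collapse = ", "), by = id]
    df <- unique(df)                     # drop the now-identical duplicate rows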

Output:

df
    id               names
1: id1 name1, name2, name3
2: id2        name4, name5
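
For comparison, here is a minimal dplyr sketch of the same idea, since the question started from group_by(). The attempt in the question grouped by names, which puts every row in its own group; grouping by id and then collapsing names with summarise() should give one row per id. The data frame below simply re-creates the example from the question, and `result` is an illustrative variable name, not something from the original post.

```r
library(dplyr)

# Re-create the example data from the question
df <- data.frame(
  id    = c("id1", "id1", "id1", "id2", "id2"),
  names = c("name1", "name2", "name3", "name4", "name5"),
  stringsAsFactors = FALSE
)

# Group by id (not names), then collapse each group's names into one string
result <- df %>%
  group_by(id) %>%
  summarise(names = paste(names, collapse = ", "))

print(result)
```

Unlike the `:=` approach, this does not modify df in place; it returns a new tibble with one row per id.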
sm925