0

Suppose I have the following data:

data = data.frame(
  name=c("bob", "bob", "mary", "mary", "mary"),
  colour=c("blue", "blue", "blue", "green", "green"),
  number=c(1,1,1,2,3))

data

  name colour number
1  bob   blue      1
2  bob   blue      1
3 mary   blue      1
4 mary  green      2
5 mary  green      3

How can I concatenate the above on two columns, removing any repeated strings? I have tried:

data <- data %>% group_by(`name`) %>%
  summarise_all(funs(paste(na.omit(.), collapse = ", ")))

But get the following which is incorrect:

  name             colour  number
1  bob         blue, blue    1, 1
2 mary blue, green, green 1, 2, 3

Expected output:

 name      colour number
1  bob        blue      1
2 mary blue, green  1,2,3
Simon
  • 991
  • 8
  • 30

1 Answers1

0

A data.table oneliner..

sample data

library(data.table)
DT <-fread("
  name colour number
            bob   blue      1
            bob   blue      1
           mary   blue      1
           mary  green      2
           mary  green      3")

code

cols <- c("colour", "number")
DT[, lapply(.SD, function(x) { paste0( unique(x), collapse = ",") }), 
   by = ,(name), .SDcols = cols][]

output

#    name     colour number
# 1:  bob       blue      1
# 2: mary blue,green  1,2,3
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • 1
    Yes but you overfit it in the example. What happens with 150 columns? It's better to use `...lapply(.SD, ...)` – Sotos May 17 '19 at 13:09
  • @Sotos. you are right I updated the answer – Wimpel May 17 '19 at 13:19
  • 1
    You don't need to specify columns If you do it for all the columns. Otherwise you did not do anything. You will still have to specify 150 names in your `cols` vector. In this case `DT[, lapply(.SD, fun), by = name]` would suffice – Sotos May 17 '19 at 13:27