I am trying to concatenate rows of text by character (the name column) in a data frame that looks something like this:

library(data.table)

df <- data.frame(name  = c("Kyle", "Cartman", "Randy", "Kyle", "Cartman", "Randy", "Kyle", "Cartman", "Randy"),
                 lines = c("Hello", "Hello", "Hello", "my name is", "my name is", "my name is", "Kyle", "Cartman", "Randy"))
df <- data.table(df)
df

##      name      lines
## 1    Kyle      Hello
## 2 Cartman      Hello
## 3   Randy      Hello
## 4    Kyle my name is
## 5 Cartman my name is
## 6   Randy my name is
## 7    Kyle       Kyle
## 8 Cartman    Cartman
## 9   Randy      Randy

And my desired data frame should look like this:

df
##      name      lines
## 1    Kyle      Hello my name is Kyle
## 2 Cartman      Hello my name is Cartman
## 3   Randy      Hello my name is Randy

After some research, I found a solution in "Concatenate rows in a dataframe", but it keeps every original row and I can't figure out how to delete the repeated rows:

library(stringr)  # provides str_c()
df <- df[, newlines := str_c(lines, collapse = " "), by = name]
df

##      name      lines
## 1    Kyle      Hello my name is Kyle
## 2 Cartman      Hello my name is Cartman
## 3   Randy      Hello my name is Randy
## 4    Kyle      Hello my name is Kyle
## 5 Cartman      Hello my name is Cartman
## 6   Randy      Hello my name is Randy
## 7    Kyle      Hello my name is Kyle
## 8 Cartman      Hello my name is Cartman
## 9   Randy      Hello my name is Randy

Perhaps there is some other way of concatenating rows so that I can avoid duplicates in the data frame?

Phil

1 Answer

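We need to summarise rather than assign a column with :=. The := operator adds the new column by reference and keeps every original row, which is why the collapsed text is repeated. Wrapping the expression in .() instead returns one row per group: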

library(data.table)
# .() returns one collapsed row per name instead of adding a column
df[, .(lines = paste(lines, collapse = " ")), by = name]
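
For reference, here is a small self-contained sketch of both routes: the summarise approach above, and keeping the := step from the question and then dropping the repeated rows with unique(). The sample data is modeled on the question; the object names res_summary and res_unique are made up for illustration.

```r
library(data.table)

# Sample data modeled on the question (names written in title case)
df <- data.table(
  name  = rep(c("Kyle", "Cartman", "Randy"), times = 3),
  lines = c(rep("Hello", 3), rep("my name is", 3), "Kyle", "Cartman", "Randy")
)

# Route 1: summarise, which yields one row per name
res_summary <- df[, .(lines = paste(lines, collapse = " ")), by = name]

# Route 2: assign by reference as in the question, then keep only the
# name/newlines columns and drop the duplicated rows
df[, newlines := paste(lines, collapse = " "), by = name]
res_unique <- unique(df[, .(name, newlines)])

print(res_summary)
print(res_unique)
```

Both routes should end up with three rows. The summarise form is usually the simpler choice because it never creates the duplicates in the first place.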
akrun