I have this dataset:

data <- data.frame(column = c("apple, banana, cherry",
"apple, banana, cherry, grape", 
"apple, banana, cherry, grape, pear")) 

data

  column
1 apple, banana, cherry
2 apple, banana, cherry, grape
3 apple, banana, cherry, grape, pear

I'd like my output to be:

  column1  column2  column3  column4  column5
1 apple    banana   cherry   NA       NA
2 apple    banana   cherry   grape    NA
3 apple    banana   cherry   grape    pear

I tried strsplit(data$column, ","), but this returns a list, and I can't get it back into a data frame because the elements have unequal lengths. For that reason, as.data.frame(strsplit(data$column, ",")) fails.

Thanks a lot!

MetaPhilosopher
  • Does this answer your question? [Split data frame string column into multiple columns](https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns) – user438383 Aug 08 '22 at 21:03
  • I think it doesn't because the one you listed does not have an issue with unequal items in rows. The answer below by M.Viking is really good. – MetaPhilosopher Aug 08 '22 at 21:18
  • Hadley's second answer answers this; uneven rows don't matter, since it automatically pads them with NA. This is a duplicate that's been asked about 100 times. – user438383 Aug 08 '22 at 21:23
  • Fair enough, the 2nd answer indeed achieves this. – MetaPhilosopher Aug 09 '22 at 07:25

1 Answer

library(tidyr)
separate(data = data, col = column, sep = ", ", into = paste0("column_", 1:5))

  column_1 column_2 column_3 column_4 column_5
1    apple   banana   cherry     <NA>     <NA>
2    apple   banana   cherry    grape     <NA>
3    apple   banana   cherry    grape     pear
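
Rows 1 and 2 have fewer than five pieces, so separate() pads them with NA on the right and emits a warning about the missing pieces. As a minimal sketch, assuming the data frame is the one named data in the question, passing fill = "right" makes that padding explicit and silences the warning:

```r
library(tidyr)

# Data frame from the question (named `data` there).
data <- data.frame(column = c("apple, banana, cherry",
                              "apple, banana, cherry, grape",
                              "apple, banana, cherry, grape, pear"))

# fill = "right" pads short rows with NA on the right, without a warning.
out <- separate(data, col = column, sep = ", ",
                into = paste0("column_", 1:5), fill = "right")
out
```

Here out is only an illustrative name for the result; it is not part of the original answer.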

You could compute the number of columns needed automatically with:

library(stringr)
max(str_count(data$column, ",")) + 1 # returns 5
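
The two steps can be combined so the column count is never hard-coded. This is a rough sketch rather than a definitive solution: it assumes the question's data frame data, and the names n_cols and wide are made up for illustration.

```r
library(tidyr)
library(stringr)

# Data frame from the question.
data <- data.frame(column = c("apple, banana, cherry",
                              "apple, banana, cherry, grape",
                              "apple, banana, cherry, grape, pear"))

# Pieces in the longest row = number of commas + 1.
n_cols <- max(str_count(data$column, ",")) + 1

# Build the column names from the computed count, then split.
wide <- separate(data, col = column, sep = ", ",
                 into = paste0("column_", seq_len(n_cols)),
                 fill = "right")
wide
```

Counting commas assumes every separator is a single comma and that no value contains a comma itself.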
M.Viking