splitting multiple values in one column into multiple rows R

Question

I have a data frame which for the most part is one observation per row. However, some rows have multiple values:

# A tibble: 3 x 2
          `number`   abilities
             <dbl>       <chr>
1               51       b1261
2               57        d710
3               57 b1301; d550

structure(list(`number` = c(51, 57, 57), abilities = c("b1261", 
"d710", "b1301; d550")), .Names = c("number", "abilities"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

I'd like to get the following:

# A tibble: 3 x 2
          `number`   abilities
             <dbl>       <chr>
1               51       b1261
2               57        d710
3               57        d550
4               57       b1301

It's straight forward enough to split on the ; but I'm not sure how to easily add a new row, especially as abilities might contain more than 2 values.

This is very similar to: R semicolon delimited a column into rows but doesn't need to remove duplicates

Lamia · Accepted Answer · 2017-06-06T23:28:48.763

28

There's a function separate_rows in tidyr to do just that:

library(tidyr)
## The ";\\s+" means that the separator is a ";" followed by one or more spaces
separate_rows(df,abilities,sep=";\\s+")
  number abilities
   <dbl>     <chr>
1     51     b1261
2     57      d710
3     57     b1301
4     57      d550

edited Jun 06 '17 at 23:28

answered Jun 06 '17 at 23:12

Lamia

3,845
1
12
19

1

I think you need to trim whitespace as well, or use `sep=";\\s+"`, since otherwise the last entry will have a space at the beginning. – Marius Jun 06 '17 at 23:23
@Marius You're absolutely right, I hadn't spotted it. Thanks! – Lamia Jun 06 '17 at 23:25
Thanks for this, I didn't know of that feature in tidyr. I adjusted the space match to : ";\\s*" to allow for zero or more spaces – pluke Jun 07 '17 at 05:09

score 7 · Answer 2 · answered Jun 06 '17 at 23:12

7

dplyr is good for this as it has unnest:

library(tidyverse)
library(stringr)
df %>%
    mutate(unpacked = str_split(abilities, ";")) %>%
    unnest %>%
    mutate(abilities = str_trim(unpacked))

answered Jun 06 '17 at 23:12

Marius

58,213
16
107
105

1

Apparently, in recent `dplyr` versions it is mandatory to explicitely name the columns for `unnest`. So it becomes `unnest(cols = c(unpacked))`. – hannes101 May 13 '20 at 09:13

score 1 · Answer 3 · answered Jun 07 '17 at 04:07

1

Another option is cSplit

 library(splitstackshape)
 cSplit(df1, 'abilities', '; ', 'long')
 #   number abilities
 #1:     51     b1261
 #2:     57      d710
 #3:     57     b1301
 #4:     57      d550

answered Jun 07 '17 at 04:07

akrun

874,273
37
540
662

splitting multiple values in one column into multiple rows R

3 Answers3