Splitting one column into multiple columns based on a marker character

Question

I have a data frame with two columns as:

**+-----+-------+
|  V1 |  V2     |
+-----+---------+
|  1  | a,b,c   |
|  2  | a,c     |
|  3  | b,d     |
|  4  | e       |
|  .  | .       |
+-----+-------+**

I want to split the second column to multiple columns based on a marker character inside the second column. I want the output to be like following.

**+-----+-------------+
|  V1 |  V2 | V3 | V4 |
+-----+---------------+
|  1  |  a  | b  | c  |
|  2  |  a  | c  | NA |
|  3  |  b  | d  | NA |
|  4  |  e  | NA | NA |
|  .  |  .  | .  | .  |
+-----+-------------+**

This is a generalization of this question but with unbalanced sizes. For example the following example:

myVec <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13","F.US.CA6", "F.US", "F")

score 5 · Accepted Answer · answered Aug 11 '17 at 20:42

5

Or you can try this package splitstackshape

cSplit(dat, 2, drop = TRUE,sep=',')


   V1 V2_1 V2_2 V2_3
1:  1    a    b   c
2:  2    a    c   NA
3:  3    b    d   NA
4:  4    e    NA  NA

answered Aug 11 '17 at 20:42

BENY

317,841
20
164
234

I think this is the simplest. Thanks all for quick responses. – Rotail Aug 11 '17 at 20:50
@Rotail Glad it help , nice day – BENY Aug 11 '17 at 20:51

score 4 · Answer 2 · answered Aug 11 '17 at 20:42

You can use tidyr and its separate function

library(tidyr)
DF <- data.frame(V1 = 1:4, V2 = c("a,b,c", "a,c", "b,d","e"))
separate(DF, V2, into = c("V2", "V3", "V4"))
#> Warning: Too few values at 3 locations: 2, 3, 4
#>   V1 V2   V3   V4
#> 1  1  a    b    c
#> 2  2  a    c <NA>
#> 3  3  b    d <NA>
#> 4  4  e <NA> <NA>

If you want you can suppress the warning with fill argument

separate(DF, V2, into = c("V2", "V3", "V4"), fill = "right")
#>   V1 V2   V3   V4
#> 1  1  a    b    c
#> 2  2  a    c <NA>
#> 3  3  b    d <NA>
#> 4  4  e <NA> <NA>

score 3 · Answer 3 · answered Aug 11 '17 at 20:39

You can use data.table::tstrsplit:

library(data.table)
setDT(df)[, c(list(V1), tstrsplit(V2, ","))]

#   V1 V2 V3 V4
#1:  1  a  b  c
#2:  2  a  c NA
#3:  3  b  d NA
#4:  4  e NA NA

myVec
#[1] "F.US.CLE.V13" "F.US.CA6.U13" "F.US.CA6.U13" "F.US.CA6"     "F.US"        
#[6] "F"           

as.data.table(tstrsplit(myVec, '\\.'))

#   V1 V2  V3  V4
#1:  F US CLE V13
#2:  F US CA6 U13
#3:  F US CA6 U13
#4:  F US CA6  NA
#5:  F US  NA  NA
#6:  F NA  NA  NA

www · Answer 4 · 2017-08-11T21:31:40.747

You can use dplyr and tidyr. dt2 is the final output. Notice that separate from tidyr will work, but you need to know how many columns you are going to create beforehand. This solution does not need that.

library(dplyr)
library(tidyr)

# Example data frame
dt <- data_frame(V1 = 1:4,
             V2 = c("a,b,c", "a,c", "b,d", "e"))

# Process the data
dt2 <- dt %>%
  separate_rows(V2) %>%
  rename(Value = V2) %>%
  group_by(V1) %>%
  mutate(Col = paste0("V", 1:n() + 1)) %>%
  spread(Col, Value)

score 0 · Answer 5 · answered Aug 11 '17 at 20:50

If you are OK writing two lines of code, why not this approach: 1) split the strings according to the special character (comma) 2) find the max number of elements you exploded 3) prepare a table with the available elements, add NA when needed 4) wrap up and return your data frame.

df <- cbind(1:5, c("a", "a,b,v", "a,c", "d,f,f", "ddd"))
split.strings <- strsplit(df[,2], ",")
#
# get the max length
max.elems <- max(sapply(split.strings, length))
#
# wrap
new.data <- sapply(1:max.elems, (function(i){
  sapply(1:nrow(df), (function(rw){
    if (length(split.strings[[rw]]) >= i) {
      split.strings[[rw]][i]
    } else {
      NA
    }
  }))  
}))
#
# bind to identifier
final.df <- data.frame(id = df[,1], 
                       new.data,
                       stringsAsFactors = F)
final.df

Splitting one column into multiple columns based on a marker character

5 Answers5