2

I have a data frame with two columns as:

**+-----+-------+
|  V1 |  V2     |
+-----+---------+
|  1  | a,b,c   |
|  2  | a,c     |
|  3  | b,d     |
|  4  | e       |
|  .  | .       |
+-----+-------+**

I want to split the second column to multiple columns based on a marker character inside the second column. I want the output to be like following.

**+-----+-------------+
|  V1 |  V2 | V3 | V4 |
+-----+---------------+
|  1  |  a  | b  | c  |
|  2  |  a  | c  | NA |
|  3  |  b  | d  | NA |
|  4  |  e  | NA | NA |
|  .  |  .  | .  | .  |
+-----+-------------+**

This is a generalization of this question but with unbalanced sizes. For example the following example:

myVec <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13","F.US.CA6", "F.US", "F")
Rotail
  • 1,025
  • 4
  • 19
  • 40

5 Answers5

5

Or you can try this package splitstackshape

cSplit(dat, 2, drop = TRUE,sep=',')


   V1 V2_1 V2_2 V2_3
1:  1    a    b   c
2:  2    a    c   NA
3:  3    b    d   NA
4:  4    e    NA  NA
BENY
  • 317,841
  • 20
  • 164
  • 234
4

You can use tidyr and its separate function

library(tidyr)
DF <- data.frame(V1 = 1:4, V2 = c("a,b,c", "a,c", "b,d","e"))
separate(DF, V2, into = c("V2", "V3", "V4"))
#> Warning: Too few values at 3 locations: 2, 3, 4
#>   V1 V2   V3   V4
#> 1  1  a    b    c
#> 2  2  a    c <NA>
#> 3  3  b    d <NA>
#> 4  4  e <NA> <NA>

If you want you can suppress the warning with fill argument

separate(DF, V2, into = c("V2", "V3", "V4"), fill = "right")
#>   V1 V2   V3   V4
#> 1  1  a    b    c
#> 2  2  a    c <NA>
#> 3  3  b    d <NA>
#> 4  4  e <NA> <NA>
cderv
  • 6,272
  • 1
  • 21
  • 31
3

You can use data.table::tstrsplit:

library(data.table)
setDT(df)[, c(list(V1), tstrsplit(V2, ","))]

#   V1 V2 V3 V4
#1:  1  a  b  c
#2:  2  a  c NA
#3:  3  b  d NA
#4:  4  e NA NA

myVec
#[1] "F.US.CLE.V13" "F.US.CA6.U13" "F.US.CA6.U13" "F.US.CA6"     "F.US"        
#[6] "F"           

as.data.table(tstrsplit(myVec, '\\.'))

#   V1 V2  V3  V4
#1:  F US CLE V13
#2:  F US CA6 U13
#3:  F US CA6 U13
#4:  F US CA6  NA
#5:  F US  NA  NA
#6:  F NA  NA  NA
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

You can use dplyr and tidyr. dt2 is the final output. Notice that separate from tidyr will work, but you need to know how many columns you are going to create beforehand. This solution does not need that.

library(dplyr)
library(tidyr)

# Example data frame
dt <- data_frame(V1 = 1:4,
             V2 = c("a,b,c", "a,c", "b,d", "e"))

# Process the data
dt2 <- dt %>%
  separate_rows(V2) %>%
  rename(Value = V2) %>%
  group_by(V1) %>%
  mutate(Col = paste0("V", 1:n() + 1)) %>%
  spread(Col, Value)
www
  • 38,575
  • 12
  • 48
  • 84
0

If you are OK writing two lines of code, why not this approach: 1) split the strings according to the special character (comma) 2) find the max number of elements you exploded 3) prepare a table with the available elements, add NA when needed 4) wrap up and return your data frame.

df <- cbind(1:5, c("a", "a,b,v", "a,c", "d,f,f", "ddd"))
split.strings <- strsplit(df[,2], ",")
#
# get the max length
max.elems <- max(sapply(split.strings, length))
#
# wrap
new.data <- sapply(1:max.elems, (function(i){
  sapply(1:nrow(df), (function(rw){
    if (length(split.strings[[rw]]) >= i) {
      split.strings[[rw]][i]
    } else {
      NA
    }
  }))  
}))
#
# bind to identifier
final.df <- data.frame(id = df[,1], 
                       new.data,
                       stringsAsFactors = F)
final.df
Damiano Fantini
  • 1,925
  • 9
  • 11