3

I have a data.table that has three columns. The second column, I'd like to split based on a regex, so I'd end up with four columns. I keep getting bizarre responses when I do this, and I'd love some feedback. Below is a look at the data:

     category                 label     count
1  Navigation     Product || Green         2 
2  Navigation      Survey || Green         5
3  Navigation       Product || Red        10
4  Navigation        Survey || Red        10

I'd want to split the label section at the || and create two new columns Type and Color.

M--
  • 25,431
  • 8
  • 61
  • 93
JAB
  • 115
  • 1
  • 1
  • 9

3 Answers3

4

Using data.table, you can do:

dt[, c("type", "color") := tstrsplit(label, " || ", fixed = TRUE)]

     category            label count    type color
1: Nagivation Product || Green     2 Product Green
2: Navigation  Survey || Green     5  Survey Green

Sample data:

dt <- data.table(category = c("Nagivation", "Navigation"),
                 label = c("Product || Green", "Survey || Green"),
                 count = c(2, 5))
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
3

We can use tidyr::separate:

library(data.table)

dt1 <- fread("category     label            count
              Navigation   Product || Green     2
              Navigation   Survey || Green      5
              Navigation   Product || Red      10
              Navigation   Survey || Red       10")

tidyr::separate(dt1, label, sep = "\\|\\|", into = c("Type","Color"))

#>      category    Type   Color count
#> 1: Navigation Product   Green     2
#> 2: Navigation  Survey   Green     5
#> 3: Navigation Product     Red    10
#> 4: Navigation  Survey     Red    10
M--
  • 25,431
  • 8
  • 61
  • 93
  • 1
    This is the most straightforward and way to leverage. Marked this with the check because this worked super easily. – JAB Aug 13 '19 at 21:12
1
cbind(d, setNames(data.frame(do.call(rbind, strsplit(d$label, " || ", fixed = TRUE))),
         c("Type", "Color")))
#    category            label count     Type  Color
#1 Navigation Product || Green     2 Product   Green
#2 Navigation  Survey || Green     5  Survey   Green
#3 Navigation   Product || Red    10 Product     Red
#4 Navigation    Survey || Red    10  Survey     Red

Data

d = structure(list(category = c("Navigation", "Navigation", "Navigation", 
"Navigation"), label = c("Product || Green", "Survey || Green", 
"Product || Red", "Survey || Red"), count = c(2L, 5L, 10L, 10L
)), class = "data.frame", row.names = c(NA, -4L))
d.b
  • 32,245
  • 6
  • 36
  • 77