3

I'm cleaning some data where there are multiple columns that need to be split into rows with both ',' and '/'. Data table below to explain what it the source code looks like.

df <- data.table(
   b = c("a", "d/e/f", "g,h"),
     c = c("1", "2,3,4", "5/6")
   )

I've tried using separate_rows, but it can only split one column on one of these separators at a time.

EDIT: The data table I'm looking for looks approximately like this:

df_clean <- data.table(
  b = c("a", "d", "d", "d", 
        "e", "e", "e", "f", 
        "f", "f", "g", "g",
        "h", "h"),
  c = c("1", "2", "3", "4",
        "2", "3", "4",
        "2", "3", "4",
        "5", "6", 
        "5", "6")
)

4 Answers4

3

Updated answer based on added clarification.

Run separate_rows once on each column to get all of the permutations. You can use a regex pattern to specify multiple separators.

library(tidyr)

df %>%
  separate_rows(b, sep = '/|,') %>%
  separate_rows(c, sep = '/|,')

#> # A tibble: 14 × 2
#>    b     c    
#>    <chr> <chr>
#>  1 a     1    
#>  2 d     2    
#>  3 d     3    
#>  4 d     4    
#>  5 e     2    
#>  6 e     3    
#>  7 e     4    
#>  8 f     2    
#>  9 f     3    
#> 10 f     4    
#> 11 g     5    
#> 12 g     6    
#> 13 h     5    
#> 14 h     6
Aron Strandberg
  • 3,040
  • 9
  • 15
  • Just added some more clarification above: I should not have included a, that was just confusing. The goal is to have both columns split their observations that have "/" and "," into separate rows. I made a new data.table to demonstrate what it should look like in the original post. – user19536418 Nov 30 '22 at 12:01
2

Maybe this helps: [https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows][1]

for the first column:

s <- strsplit(df$b, split = c(",","/"))
data.frame(a = rep(df$a, sapply(s, length)), b = unlist(s))
IamTheB
  • 76
  • 5
1

An option with cSplit

library(splitstackshape)
cSplit(df, "b", sep = "/|,", "long", fixed = FALSE) |> 
   cSplit("c", sep = "/|,", "long", fixed = FALSE)

-output

    b c
 1: a 1
 2: d 2
 3: d 3
 4: d 4
 5: e 2
 6: e 3
 7: e 4
 8: f 2
 9: f 3
10: f 4
11: g 5
12: g 6
13: h 5
14: h 6
akrun
  • 874,273
  • 37
  • 540
  • 662
1

A data.table options:

# option 1
foo = \(x) unlist(strsplit(x, ",|/"))
df[, do.call(CJ, lapply(.SD, foo)), .I][, !"I"]

Similarly in base R:

sep = ",|/"
Map(
  expand.grid,
  strsplit(df$b, sep),
  strsplit(df$c, sep)
) |> 
  do.call(rbind, args = _)

Result

#          b      c
#     <char> <char>
#  1:      a      1
#  2:      d      2
#  3:      d      3
#  4:      d      4
#  5:      e      2
#  6:      e      3
#  7:      e      4
#  8:      f      2
#  9:      f      3
# 10:      f      4
# 11:      g      5
# 12:      g      6
# 13:      h      5
# 14:      h      6
s_baldur
  • 29,441
  • 4
  • 36
  • 69