4

To simplify this problem I'll use a very basic subset of what the dataset might look like:

library(dplyr)
DF <- tibble(id = seq(1:4),
             label = c("A", "B", "C", "D"),
             val = c(NA, "5, 10", "20", "6, 7, 8"))
DF
# A tibble: 4 × 3
#      id label val    
#   <int> <chr> <chr>  
# 1     1 A     NA     
# 2     2 B     5, 10  
# 3     3 C     20     
# 4     4 D     6, 7, 8

Note: val just contains random numbers that have no meaning in this example, but consider them to be a sort of ID number if that helps).

In this example, I would like to:

  • identify rows containing commas in the val column (as these are the only rows to be changed)
  • duplicate each row n times such that the only values that change are in the val column and consist of a single numeric value (where n is the number of comma separated values)
  • e.g. 2 duplicate rows for row 2, and 3 duplicate rows for row 4

So far I've only worked out the filter step as below:

DF %>% filter(val %>% stringr::str_detect(pattern = "[,]") == TRUE)

But I'm unsure of how to proceed from here.

This is what my desired output is:

#      id label val    
#   <int> <chr> <chr>  
# 1     1 A     NA     
# 2     2 B     5
# 3     2 B     10
# 4     3 C     20     
# 5     4 D     6
# 6     4 D     7
# 7     4 D     8

Any help is appreciated, but I'd prefer to avoid any loops.

Thanks :)

kiwi
  • 565
  • 3
  • 11
  • 1
    I reckon this is https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows unless you have some additional complication to solve. – thelatemail Apr 12 '23 at 01:22

1 Answers1

2

As explained in the answers found from the link pasted in the comments, there are a few ways you can solve this.

The most efficient would probably be to do the following:

separate_rows(DF, val, sep = ", ")

You get:

# A tibble: 7 × 3
     id label val  
  <int> <chr> <chr>
1     1 A     NA   
2     2 B     5    
3     2 B     10   
4     3 C     20   
5     4 D     6    
6     4 D     7    
7     4 D     8    
bluesky
  • 153
  • 8