To simplify this problem I'll use a very basic subset of what the dataset might look like:
library(dplyr)
DF <- tibble(id = seq(1:4),
label = c("A", "B", "C", "D"),
val = c(NA, "5, 10", "20", "6, 7, 8"))
DF
# A tibble: 4 × 3
# id label val
# <int> <chr> <chr>
# 1 1 A NA
# 2 2 B 5, 10
# 3 3 C 20
# 4 4 D 6, 7, 8
Note: val
just contains random numbers that have no meaning in this example, but consider them to be a sort of ID number if that helps).
In this example, I would like to:
- identify rows containing commas in the
val
column (as these are the only rows to be changed) - duplicate each row n times such that the only values that change are in the
val
column and consist of a single numeric value (where n is the number of comma separated values) - e.g. 2 duplicate rows for row 2, and 3 duplicate rows for row 4
So far I've only worked out the filter step as below:
DF %>% filter(val %>% stringr::str_detect(pattern = "[,]") == TRUE)
But I'm unsure of how to proceed from here.
This is what my desired output is:
# id label val
# <int> <chr> <chr>
# 1 1 A NA
# 2 2 B 5
# 3 2 B 10
# 4 3 C 20
# 5 4 D 6
# 6 4 D 7
# 7 4 D 8
Any help is appreciated, but I'd prefer to avoid any loops.
Thanks :)