Hi, I would like to split a delimited column into new rows. There are a few similar posts on Stack Overflow, but I can't find one that also addresses eliminating duplicates. I've tried several approaches.
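library(dplyr)
library(tidyr)

df <- read.table(text = 'sample, GENE1
s1 A,B
s4 B,A,A,C,C'
, header = TRUE, stringsAsFactors = FALSE)

# Attempt: split GENE1 on commas and drop duplicates before unnesting
df %>%
  mutate(GENE1b = unique(strsplit(as.character(GENE1), ","))) %>%
  unnest(GENE1b)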
The code above produces:
# A tibble: 7 x 3
sample. GENE1 GENE1b
<chr> <chr> <chr>
1 s1 A,B A
2 s1 A,B B
3 s4 B,A,A,C,C B
4 s4 B,A,A,C,C A
5 s4 B,A,A,C,C A
6 s4 B,A,A,C,C C
7 s4 B,A,A,C,C C
This is incorrect, since s4 should only get new rows for B, A, and C, with no duplicates. Of course I can always remove the duplicates afterwards, but I'm wondering if there is a way to do it in one go. I also tried collapsing the values back with paste(x, collapse = ","), but this also failed.
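For reference, a minimal sketch of the "remove the duplicates afterwards" workaround mentioned above could look like the following. It assumes dplyr and tidyr are installed, rebuilds the sample data with data.frame() (so the first column is simply named sample), and uses result as an illustrative name only. It is the two-step route, not the one-go solution being asked for.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt by hand (assumption: same values as in the question)
df <- data.frame(
  sample = c("s1", "s4"),
  GENE1 = c("A,B", "B,A,A,C,C"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Split each GENE1 string on commas into a list column
  mutate(GENE1b = strsplit(GENE1, ",", fixed = TRUE)) %>%
  # One row per split value; duplicates are still present here
  unnest(GENE1b) %>%
  # Second step: drop rows that are identical across all columns
  distinct()

print(result)
```

With this, s4 ends up with one row each for B, A, and C, at the cost of the extra distinct() step.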