Here is my dataframe:
mutations Pos Dataset percentage_occurance newCol
<chr> <dbl> <chr> <dbl> <chr>
1 P323L 323 jan_jun_2021 99.2 P323L, D614G, P323L, D614G
2 D614G 614 jan_jun_2021 99.9 P323L, D614G, P323L, D614G
3 D279N 279 jan_jun_2021 6.30 D279N, S194L, N440K, R52I, S235F, L126F,
E261stop, S97I, S2P,…
I want to remove the duplicates inside each entry of the column newCol and get a dataframe with the same columns. So the output should look like this:
mutations Pos Dataset percentage_occurance newCol
<chr> <dbl> <chr> <dbl> <chr>
1 P323L 323 jan_jun_2021 99.2 P323L, D614G
2 D614G 614 jan_jun_2021 99.9 P323L, D614G
The way I have been trying to do this is:
head(data7) %>% mutate(newCol = str_split(newCol, ", "))
But it gives me a list inside the newCol column:
# A tibble: 6 × 5
# Groups: mutations, Dataset, Pos [6]
mutations Pos Dataset percentage_occurance newCol
<chr> <dbl> <chr> <dbl> <list>
1 P323L 323 jan_jun_2021 99.2 <chr [4]>
2 D614G 614 jan_jun_2021 99.9 <chr [4]>
3 D279N 279 jan_jun_2021 6.30 <chr [55]>
4 A598S 598 jan_jun_2021 0.0538 <chr [3,157]>
5 P681H 681 jan_jun_2021 9.92 <chr [34]>
6 G204R 204 jan_jun_2021 13.3 <chr [21]>
Is there any way to get my desired output? My dataframe has 3890 rows and I want to do this for all of them.
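To make the goal concrete, here is a minimal sketch of the kind of result I am after, on a made-up two-row stand-in for my data (the tibble `toy` is hypothetical and only has the relevant columns). It assumes the separator is always ", " and that the order of first appearance should be kept. It splits each string, keeps the unique values, and pastes them back into a single string so the column stays `<chr>` rather than `<list>`. I have not checked whether this is the idiomatic way to do it or how it behaves on my full grouped data:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for data7
toy <- tibble(
  mutations = c("P323L", "D614G"),
  newCol    = c("P323L, D614G, P323L, D614G",
                "P323L, D614G, P323L, D614G")
)

result <- toy %>%
  mutate(
    newCol = vapply(
      str_split(newCol, ", "),                        # list of character vectors
      function(x) paste(unique(x), collapse = ", "),  # drop repeats, rejoin as one string
      character(1)                                    # each row yields one string
    )
  )

print(result)
```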
I am new to Stack Overflow, so please forgive any mistakes I have made in posing the question. Thanks in advance :).