I am new to R and struggling with grouping my dataset. This is an example of the data:
sample | profile |
---|---|
1 | A |
2 | A,B |
3 | A,B |
4 | A,C |
5 | C |
6 | A,C |
I am trying to group the profiles so that the same profiles are labelled as the same group:
sample | profile | profile group/cluster |
---|---|---|
genome 1 | A | 1 |
genome 2 | A,B | 2 |
genome 3 | A,B | 2 |
genome 4 | A,C | 3 |
genome 5 | C | 4 |
genome 6 | A,C | 3 |
Here, samples 2 and 3 share the profile A,B and so both get group 2, and samples 4 and 6 share A,C and both get group 3.
I have tried playing around with these packages:
library(tidyverse)
library(janitor)
library(stringr)
dupes <- get_dupes(database, profile)
dupes
ll_by_outcome <- as.data.frame(database %>%
  group_by(profile) %>%
  add_count())
ll_by_outcome
But these only find or count the duplicated profiles; they do not give each distinct profile its own group number. I am not sure how to go about this issue. Any help is appreciated!
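
For reference, here is a minimal sketch of one way the group column could be produced. It assumes the data frame is called `database` (the name used in the attempts above), that profiles are compared as exact strings (so `A,B` and `B,A` would count as different profiles), and that groups are numbered in order of first appearance. The column name `profile_group` is made up for illustration.

```r
# Example data copied from the table in the question.
# `database` is the name used in the attempts above.
database <- data.frame(
  sample  = 1:6,
  profile = c("A", "A,B", "A,B", "A,C", "C", "A,C")
)

# unique() keeps each distinct profile once, in order of first appearance;
# match() then returns the position of every row's profile in that list,
# so identical profiles get the same number.
# `profile_group` is a hypothetical column name.
database$profile_group <- match(database$profile, unique(database$profile))

database
```

This uses base R only. A dplyr variant would be `group_by(profile)` followed by `mutate(profile_group = cur_group_id())`, which numbers the groups by sorted profile order rather than by first appearance; for this example data the two orders happen to coincide.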