I am trying to group similar entities together and can't find an easy way to do so.
For example, here is a table:
                  Names Initial_Group Final_Group
1          James,Gordon             6           A
2          James,Gordon             6           A
3          James,Gordon             6           A
4          James,Gordon             6           A
5          James,Gordon             6           A
6          James,Gordon             6           A
7                Amanda             1           A
8                Amanda             1           A
9                Amanda             1           A
10        Gordon,Amanda             5           A
11        Gordon,Amanda             5           A
12        Gordon,Amanda             5           A
13        Gordon,Amanda             5           A
14        Gordon,Amanda             5           A
15        Gordon,Amanda             5           A
16        Gordon,Amanda             5           A
17        Gordon,Amanda             5           A
18 Edward,Gordon,Amanda             4           A
19 Edward,Gordon,Amanda             4           A
20 Edward,Gordon,Amanda             4           A
21                 Anna             2           B
22                 Anna             2           B
23                 Anna             2           B
24         Anna,Leonard             3           B
25         Anna,Leonard             3           B
26         Anna,Leonard             3           B
I am unsure how to derive the 'Final_Group' field in the table above.
To get it, I need to put rows in the same group whenever they are connected, directly or indirectly, through a shared name.
For example, rows 1 to 20 need to be grouped together because each of them shares at least one name with another row in that range.
Rows 1 to 6 contain 'James,Gordon', and since 'Gordon' also appears in rows 10 to 20, all of those rows have to be grouped. Likewise, 'Amanda' in rows 7 to 9 also appears in rows 10 to 20, so rows 7 to 9 have to be grouped with 'James,Gordon', 'Gordon,Amanda', and 'Edward,Gordon,Amanda'.
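One way to read this requirement is as a connected-components problem: treat each name as a node, link all names that occur in the same row, and give every row the label of the component its names fall into. The sketch below is a minimal base-R version under that reading. `connected_groups` is a hypothetical helper name (it is not part of dplyr or any package), and it assumes names are comma-separated and that no more than 26 groups are needed, since it labels them with `LETTERS`.

```r
# Hypothetical helper: rows that share any name, directly or through a chain
# of other rows, receive the same group label.
connected_groups <- function(names_col, sep = ",") {
  keys <- unique(as.character(names_col))
  # Individual names in each distinct combination, with stray spaces removed
  members <- lapply(strsplit(keys, sep, fixed = TRUE), trimws)
  people <- unique(unlist(members))
  parent <- setNames(seq_along(people), people)  # union-find forest over names

  # Follow parent links until reaching a name that is its own parent
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  # Merge every name in a combination into the component of its first name
  for (m in members) {
    first <- find_root(match(m[1], people))
    for (p in m[-1]) {
      r <- find_root(match(p, people))
      if (r != first) parent[[r]] <- first
    }
  }

  # The root of the first name identifies the component of each combination
  roots <- vapply(members, function(m) find_root(match(m[1], people)), integer(1))
  labels <- LETTERS[match(roots, unique(roots))]  # assumes at most 26 groups
  labels[match(as.character(names_col), keys)]
}

# Example input mirroring the table above
Names <- c(rep("James,Gordon", 6), rep("Amanda", 3), rep("Gordon,Amanda", 8),
           rep("Edward,Gordon,Amanda", 3), rep("Anna", 3), rep("Anna,Leonard", 3))
Final_Group <- connected_groups(Names)
table(Final_Group)
```

On the example data this should label rows 1 to 20 as one group and rows 21 to 26 as another. For larger inputs, a graph package would probably be a better fit than this hand-rolled loop.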
Below is code to generate the initial data:
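# Manually generating data
library(dplyr)  # provides %>%, select(), mutate() and group_indices()

Names <- c(rep('James,Gordon', 6),
           rep('Amanda', 3),
           rep('Gordon,Amanda', 8),
           rep('Edward,Gordon,Amanda', 3),
           rep('Anna', 3),
           rep('Anna,Leonard', 3))
# Ids follow the alphabetical order of Names, so they match the table above
Initial_Group <- rep(c(6, 1, 5, 4, 2, 3), c(6, 3, 8, 3, 3, 3))
# Desired output
Final_Group <- rep(c('A', 'B'), c(20, 6))
data <- data.frame(Names, Initial_Group, Final_Group)

# Grouping: this only numbers identical strings; it does not link rows that share a name
# (in dplyr >= 1.0.0, group_by(Names) with cur_group_id() is the newer form)
data %>%
  select(Names) %>%
  mutate(Initial_Group = group_indices(., Names))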
Does anyone know of any way to do this in R?