Create a group index for values connected directly and indirectly

Question

I would like to generate indices to group observations based on two columns. The groups should be made of observations that share at least one value in common, either directly or through other observations.

In the data below, I want to check if values in 'G1' and 'G2' are connected directly (appear on the same row), or indirectly via other intermediate values. The desired grouping variable is shown in 'g'.

For example, A is directly linked to Z (row 1) and X (row 2). A is indirectly linked to 'B' via X (A -> X -> B), and further linked to Y via X and B (A -> X -> B -> Y).

dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
                 g = c(1,1,1,1,2,2,2,3,4,4))

dt
#    id G1 G2 g
# 1   1  A  Z 1
# 2   2  A  X 1
# 3   3  B  X 1
# 4   4  B  Y 1
# 5   5  C  W 2
# 6   6  C  V 2
# 7   7  C  U 2
# 8   8  D  s 3
# 9   9  E  T 4
# 10 10  F  T 4

I tried with group_indices from dplyr, but haven't managed it.

score 19 · Accepted Answer · answered Jul 13 '17 at 11:50

Use igraph to get the component membership of each value, then map it back onto the rows by name:

library(igraph)

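# df1 is the question's data (dt) without the desired 'g' column
df1 <- dt[, c("id", "G1", "G2")]

# convert to graph (G1 and G2 as edge endpoints, id as an edge attribute),
# and get clusters membership ids
g <- graph_from_data_frame(df1[, c(2, 3, 1)])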
myGroups <- components(g)$membership

myGroups 
# A B C D E F Z X Y W V U s T 
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4 

# then map on names (as.character() so a factor G1 is looked up by name, not by integer code)
df1$group <- myGroups[as.character(df1$G1)]


df1
#    id G1 G2 group
# 1   1  A  Z     1
# 2   2  A  X     1
# 3   3  B  X     1
# 4   4  B  Y     1
# 5   5  C  W     2
# 6   6  C  V     2
# 7   7  C  U     2
# 8   8  D  s     3
# 9   9  E  T     4
# 10 10  F  T     4
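
For comparison, the same grouping can be sketched in base R without igraph. This is a minimal sketch rather than part of the approach above: it assumes G1 and G2 are separate sets of labels (a value in G1 is never treated as the same node as an identical value in G2, which holds for the example data), and the name grp is made up for illustration. It starts with every row in its own group and keeps merging rows that share a G1 or a G2 value until the labels stop changing.

```r
# Example data from the question
G1 <- c("A","A","B","B","C","C","C","D","E","F")
G2 <- c("Z","X","X","Y","W","V","U","s","T","T")

# Start with each row in its own group
grp <- seq_along(G1)

repeat {
  old <- grp
  # rows sharing a G1 value take the smallest label among them
  grp <- ave(grp, G1, FUN = min)
  # rows sharing a G2 value take the smallest label among them
  grp <- ave(grp, G2, FUN = min)
  # stop once a full pass changes nothing
  if (identical(grp, old)) break
}

# relabel to consecutive integers in order of first appearance
grp <- match(grp, unique(grp))
grp
# [1] 1 1 1 1 2 2 2 3 4 4
```

On long chains of indirect links this loop may need many passes, so for larger data the igraph approach is likely the better choice.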
