I have the following data.frame:

df <- data.frame(V1 = c("A","X","A","Z","B","Y"),
           V2 = c("B","Y","C","Y","C","W"),
           stringsAsFactors=FALSE)
df
#   V1 V2
# 1  A  B
# 2  X  Y
# 3  A  C
# 4  Z  Y
# 5  B  C
# 6  Y  W

I want to group all the values that occur together at some point and get the following:

list(c("A","B","C"), c("X","Y","Z","W"))
# [[1]]
# [1] "A" "B" "C"
# 
# [[2]]
# [1] "X" "Y" "Z" "W"
moodymudskipper
    Related: https://stackoverflow.com/questions/27520310/union-of-intersecting-vectors-in-a-list-in-r – MrFlick Aug 02 '18 at 15:13

1 Answer

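Network analysis can help. Treat each row as an edge in an undirected graph; the groups you want are then the graph's connected components.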

library(igraph)
df <- data.frame(V1 = c("A","X","A","Z","B","Y"),
                 V2 = c("B","Y","C","Y","C","W"),
                 stringsAsFactors=FALSE)


g <- graph_from_data_frame(df, directed = FALSE)
# components() is the current name; clusters() is the older, deprecated alias
clust <- components(g)
clusters <- data.frame(name = names(clust$membership), 
                       cluster = clust$membership,
                       row.names = NULL,
                       stringsAsFactors = FALSE)


split(clusters$name, clusters$cluster)
# $`1`
# [1] "A" "B" "C"
# 
# $`2`
# [1] "X" "Z" "Y" "W"

You can of course keep everything in the clusters data.frame for further analysis.
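If you would rather not depend on igraph, the same grouping can be sketched in base R. This is only a rough illustration, not a replacement for the igraph approach: group_pairs is a made-up helper name, and the loop relabels a whole group on every merge, so it is only sensible for small inputs like the one in the question.

```r
df <- data.frame(V1 = c("A","X","A","Z","B","Y"),
                 V2 = c("B","Y","C","Y","C","W"),
                 stringsAsFactors = FALSE)

# group_pairs() is a hypothetical helper, not part of any package.
group_pairs <- function(from, to) {
  nodes <- unique(c(from, to))
  # Start with every value in its own group, labelled by its position.
  label <- setNames(seq_along(nodes), nodes)
  for (i in seq_along(from)) {
    a <- label[[from[i]]]
    b <- label[[to[i]]]
    # A row linking two different groups merges them:
    # everything labelled b is relabelled a.
    if (a != b) label[label == b] <- a
  }
  # One character vector per remaining label.
  unname(split(names(label), label))
}

groups <- group_pairs(df$V1, df$V2)
groups
```

On the example data this should give the same two groups as above, with elements in order of first appearance rather than sorted.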

phiver