merging sets which have even one element in common R

Question

I have a list of lists like this:

lista=list()    
lista[[1]]=c( 1,  2,  4,  6,  8,  9, 10, 11, 12, 19, 32, 34, 35, 36, 37, 38)    
lista[[2]]=c(7,8)    
lista[[3]]=c(13, 14, 16, 26, 27, 28, 29, 30, 31)    
lista[[4]]=c(20, 21, 23, 25, 27, 28, 29, 30)    
lista[[5]]=c(9,10,39)
lista[[6]]=c(39,40)

So I want my output like this:

group[[1]]=c(1,2,4,6,7,8,9,10,11,12,19,32,34,35,36,37,38,39,40)    
group[[2]]=c(13,14,16,20,21,23,24,26,27,28,29,30,31)

"Opening the box":

lista[[1]], lista[[2]] and lista[[5]] merged because they have commom elements.

lista[[5]] and lista[[6]] merged because they have commom elements. So, lista[[5]] connect lista[[1]], lista[[2]] and lista[[5]].

I'm trying in this ticket:

Merge lists that share common elements

What was wrong with the solution given in the post that you referenced? — G5W, Jun 09 '17 at 19:03

digEmAll · Answer 1 · 2017-06-11T20:06:43.070

Here's a possible solution :

lista=list()    
lista[[1]]=c( 1,  2,  4,  6,  8,  9, 10, 11, 12, 19, 32, 34, 35, 36, 37, 38)    
lista[[2]]=c(7,8)    
lista[[3]]=c(13, 14, 16, 26, 27, 28, 29, 30, 31)    
lista[[4]]=c(20, 21, 23, 25, 27, 28, 29, 30)    
lista[[5]]=c(9,10,39)
lista[[6]]=c(39,40)


canBeMerged <- function(x,y){
  length(intersect(x,y)) > 0
}
mergeFun <- function(x,y){
  sort(union(x,y))
}

group <- Reduce(f=function(acc,curr){
  # since we wrapped each element inside a list with Map, here we unwrap the current element
  currVec <- curr[[1]]
  # we search in the accumulated list of "unmergeable" vectors 
  # if there is one which can be merged with currVec
  toMergeIdx <- Position(f=function(x) canBeMerged(x,currVec), acc)
  if(is.na(toMergeIdx )){ 
    # none can be merged, let's simply add currVec to the accumulated list
    acc[[length(acc)+1]] <- currVec
  }else{
    # currVec can be merged with the vector at position toMergeIdx, so we merge the 
    acc[[toMergeIdx]] <- mergeFun(acc[[toMergeIdx]],currVec)
  }
  return(acc)
},Map(lista,f=list))

Result :

> group
[[1]]
 [1]  1  2  4  6  7  8  9 10 11 12 19 32 34 35 36 37 38 39 40

[[2]]
 [1] 13 14 16 20 21 23 25 26 27 28 29 30 31

Explanation :

Reduce uses a binary function to successively combine the elements of a given vector, e.g. given a vector of elements c(1,3,7) and a binary function + Reduce(c(1,3,7),f='+') will call the function one time performing 1+3 (the initial accumulated value of Reduce is the first value, if not specified), then will call the function again passing the current accumulated value 4 and summing it with the next value, computing 4+7, then finally returns 11.

Therefore, in this case we want to use Reduce to iterate over the list of vectors and iteratively combining them if they can be merged or keeping them if not. So, in this case the accumulated value of Reduce, will be a list of "unmergeable" vectors, to be checked and eventually merged to the next vector.

Note that,Since both the accumulated value and the next value of Reduce must be of the same type, we need to wrap every element of lista in a list using Map.

could you give some comments in the code to begin my studies? — Arthur Calegario, Jun 11 '17 at 04:46
@ArthurCalegario: added some code comments and a small explanation — digEmAll, Jun 11 '17 at 20:07
Thank for your comments Master Yoda! I will study it. Its not so clear yet. But solved my problem :) — Arthur Calegario, Jun 12 '17 at 11:51

score 1 · Accepted Answer · answered Jun 09 '17 at 22:18

Here is another approach which constructs a matrix showing which elements of the list intersect with each other and uses the igraph package to deduce the groups:

library(igraph)
## Construct the matrix
m = sapply(lista,function(x) sapply(lista,function(y) length(intersect(x,y))>0))
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
[1,]  TRUE  TRUE FALSE FALSE  TRUE FALSE
[2,]  TRUE  TRUE FALSE FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE  TRUE FALSE FALSE
[4,] FALSE FALSE  TRUE  TRUE FALSE FALSE
[5,]  TRUE FALSE FALSE FALSE  TRUE  TRUE
[6,] FALSE FALSE FALSE FALSE  TRUE  TRUE

## Determine the groups of the graph constructed from m
groups = groups(components(graph_from_adjacency_matrix(m)))
$`1`
[1] 1 2 5 6

$`2`
[1] 3 4

## Get the unique elements of each group
res = lapply(groups,function(x) sort(unique(unlist(lista[x]))))
$`1`
 [1]  1  2  4  6  7  8  9 10 11 12 19 32 34 35 36 37 38 39 40

$`2`
 [1] 13 14 16 20 21 23 25 26 27 28 29 30 31

in the first step the sapplys work like a double for, right? — Arthur Calegario, Jun 11 '17 at 05:16
Thank you for help and attention Lamia. Learning about aplly familiys :) — Arthur Calegario, Jun 12 '17 at 11:52

merging sets which have even one element in common R

2 Answers2

Linked