
I'm still fairly new to R, so bear with me.

I have a list of vectors. I would like to compare the vectors with each other and, whenever two of them share an element, append the matching vector to the one it matches. I am looking for a robust, repeatable solution that works regardless of the number of vectors in the list.

So if I have a list (lst) made of vectors:

lst <- list(c("a", "b"), c("b", "c"), c("e", "f"), c("c", "g"))

I want to get a list of vectors like this as a result:

[[1]]
[1] "a" "b" "c" "g"

[[2]]
[1] "e" "f"

I've been able to make this work for a single pair:

if(any(lst[[1]] %in% lst[[2]])){        
  c(lst[[1]], lst[[2]])
} 

Now I'm trying to loop this over the entire list. This is what I have so far, but I'm a little stuck:

endmembers <- lapply(seq_along(lst), function(i,j){
  x <- lst[[i]]
  x2 <- lst[[j]]
  if(any(x %in% x2)){            
    c(x, x2)                     
  } 
})
Andrew Chisholm

3 Answers


I would use a recursive function to stick all the components together, then remove list items that are contained within other items:

#### helper functions ----

# Recursive function to stick list items together

fun <- function(x, d) {
  
  i <- which(sapply(d, function(y) y[1]) == tail(x, 1))
  
  if (length(i) > 0) {
  
    y <- d[[i[1]]]
    x <- c(x, y[2:length(y)])
    x <- fun(x, d)
  
  }
  
  x
}

# is vector inside another vector? - must be in the same sequence and order

inside <- function(x, y) {
  
  if ( isTRUE(all.equal(x, y)) )
    return(FALSE)
  
  if ( length(x) > length(y) )
    return(FALSE)
  
  if ( !any(x %in% y))
    return(FALSE)
  
  !is.unsorted( sapply(x, function(a, b) which(a == b), b = y), strictly = TRUE )
  
}

#### analysis ----

# Stick vectors together if last == first

d <- lapply(lst, fun, d = lst)

# remove list items that are inside other list items - there might be a more
# elegant solution to this, I'm confused by it.

d[!apply(
  sapply(d,
         function(x, y) sapply(y, function(x, y) inside(x, y), y = x),
         y = d),
  1,
  any)]
Paul

An easy option is to use igraph, treating each length-2 vector as an edge between its two elements:

library(igraph)
u <- cluster_infomap(graph_from_data_frame(as.data.frame(do.call(rbind,lst))))
out <- split(u$names,u$membership)

which gives

> out
$`1`
[1] "a" "b" "c" "g"

$`2`
[1] "e" "f"

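If you want a base R solution with for loops, here is one version:

# start with the first vector as the first group
out <- lst[1]
for (v in lst) {
  flag <- 1  # stays 1 if v matches no existing group
  for (k in seq_along(out)) {
    if (any(v %in% out[[k]])) {
      # v shares an element with group k: merge it in
      out[[k]] <- union(out[[k]], v)
      flag <- 0
      break
    }
  }
  # no match found: v starts a new group
  if (flag) out[[length(out) + 1]] <- v
}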

such that

> out
[[1]]
[1] "a" "b" "c" "g"

[[2]]
[1] "e" "f"
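One caveat with the loop above: because it stops at the first matching group, two groups that only become connected by a later vector stay separate (for example `list(c("a", "b"), c("c", "d"), c("b", "c"))`). Below is a minimal sketch of a variant that merges every group a vector touches. The function name `merge_overlapping` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper: merge vectors that share at least one element,
# also joining groups that a later vector bridges.
merge_overlapping <- function(lst) {
  out <- list()
  for (v in lst) {
    # which existing groups share an element with v?
    hit <- vapply(out, function(g) any(v %in% g), logical(1))
    if (any(hit)) {
      first <- which(hit)[1]
      # fold every matching group plus v into the first matching group
      out[[first]] <- unique(c(unlist(out[hit]), v))
      # drop the other matching groups, keep everything else
      out <- out[!hit | seq_along(out) == first]
    } else {
      # no overlap: v starts a new group
      out[[length(out) + 1]] <- v
    }
  }
  out
}

lst <- list(c("a", "b"), c("b", "c"), c("e", "f"), c("c", "g"))
res <- merge_overlapping(lst)

chain <- merge_overlapping(list(c("a", "b"), c("c", "d"), c("b", "c")))
```

Here `res` should contain the two groups from the question ("a" "b" "c" "g" and "e" "f"), and `chain` should collapse to a single group holding "a", "b", "c" and "d".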
ThomasIsCoding

In case anyone wants to know what I did: I followed the code from the question "merging sets which have even one element in common R", which was linked in a comment.

library(igraph)

# m[i, j] is TRUE when vectors i and j share at least one element
m <- sapply(lst, function(x) sapply(lst, function(y) (any(x %in% y))))

# determine the groups (connected components) of the graph constructed from m
groups <- groups(components(graph_from_adjacency_matrix(m)))

# get the unique elements of each group
endmembers <- lapply(groups, function(x) sort(unique(unlist(lst[x]))))
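For anyone without igraph, the same idea can be sketched in base R: build the overlap matrix, then extend it to its transitive closure so that vectors linked through a chain end up in one group. This is only an illustration under the assumption that the list is small enough for a dense matrix. The names `reach`, `nxt` and `grp` are mine, not from the linked answer.

```r
lst <- list(c("a", "b"), c("b", "c"), c("e", "f"), c("c", "g"))

# m[i, j] is TRUE when vectors i and j share at least one element
m <- sapply(lst, function(x) sapply(lst, function(y) any(x %in% y)))

# Transitive closure: keep extending reachability until nothing changes
reach <- m
repeat {
  nxt <- ((reach * 1) %*% (reach * 1)) > 0
  if (all(nxt == reach)) break
  reach <- nxt
}

# Label each vector by the lowest index it can reach; equal labels = same group
grp <- apply(reach, 1, function(r) which(r)[1])

# Collect the unique elements of each group
endmembers <- lapply(split(seq_along(lst), grp),
                     function(ix) sort(unique(unlist(lst[ix]))))
```

For the example list, `endmembers` should hold "a" "b" "c" "g" in the first group and "e" "f" in the second. The list names come from `split()` and are the lowest vector index in each group.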