
This is a variation of a question previously posted here: "R - A loop comparing elements in common between two hierarchical lists". I figured the solution to this problem would differ enough to warrant a new post.

I would like to retrieve a list of the elements in common when comparing two hierarchically structured lists (sites contain groups, which contain elements).

Here is some dummy data:

site<-c('A','A','A','A','A','A','A','A','A','B','B','B','B','B','B')
group<-c('A1','A1','A2','A2','A2','A3','A3','A3','A3', 
'B1','B1','B2','B2','B2','B2')
element<-c("red","orange","blue","black","white", "black","cream","yellow","purple","red","orange","blue","white","gray","salmon")
d<-cbind(site,group,element)

The twist is that I don't want every possible comparison between groups, only comparisons between groups from different sites. Hence, I organized the data as follows.

#first level list - by site
sitelist<-split(d, list(d$site),drop = TRUE)
#list by group 
nestedlist <- lapply(sitelist, function(x) split(x, x[['group']], drop = TRUE))

My intention is to create a list of the elements in common between groups from the two sites (my original data has additional sites). So, if the counts of shared elements look like this:

    A1  A2  A3
B1  2   0   0
B2  0   2   0

I need the list of elements appearing in the intersections A1/B1 and A2/B2. The output would therefore be:

output
$A1-B1
[1] "red"     "orange"

$A2-B2
[1] "blue"    "white"

My attempt is similar to what was posted in the previous related question, adjusted according to my understanding of what should work.

t <- outer(1:length(d$A),
         1:length(d$B),
         FUN=function(i,j){
           sapply(1:length(i),
                  FUN=function(x) 
                    intersect(d$A[[i]]$element, d$B[[j]]$element) )
         })

Again, any help is much appreciated, and apologies if this is too similar of a question. My attempts at tweaking all of the suggestions have failed.

  • How much of this code is working? Mine fails quickly with `$ operator is invalid` because (I assume) your `d` is a frame but what you provided here is a `matrix`. – r2evans Dec 07 '18 at 01:03

1 Answer


The premise of your code (outer) is sound. Here are a couple of ideas. (Note that I changed your data to use cbind.data.frame(..., stringsAsFactors=FALSE).)
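For reference, here is a minimal sketch of the dummy data built as a data frame rather than a matrix, so that `d$site`, `d$group`, and `d$element` work. It uses `data.frame()` directly, which I assume is close enough to the `cbind.data.frame()` call mentioned above for this purpose:

```r
# Same dummy data as in the question, but stored as a data frame.
site <- c(rep("A", 9), rep("B", 6))
group <- c("A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3", "A3",
           "B1", "B1", "B2", "B2", "B2", "B2")
element <- c("red", "orange", "blue", "black", "white",
             "black", "cream", "yellow", "purple",
             "red", "orange", "blue", "white", "gray", "salmon")

# stringsAsFactors = FALSE keeps the columns as plain character vectors.
d <- data.frame(site, group, element, stringsAsFactors = FALSE)
str(d)
```

With `cbind(site, group, element)` the result is a character matrix, and `$` is not defined for that, which is where the error in the comment comes from.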

First, restructuring a little helped me:

dl <- list(
  A = with(subset(d, site=="A"), split(element, group)),
  B = with(subset(d, site=="B"), split(element, group))
)
str(dl)
# List of 2
#  $ A:List of 3
#   ..$ A1: chr [1:2] "red" "orange"
#   ..$ A2: chr [1:3] "blue" "black" "white"
#   ..$ A3: chr [1:4] "black" "cream" "yellow" "purple"
#  $ B:List of 2
#   ..$ B1: chr [1:2] "red" "orange"
#   ..$ B2: chr [1:4] "blue" "white" "gray" "salmon"

Which option you prefer depends a bit on how you intend to retrieve the results. If you're doing it programmatically, then I think I prefer option 1 (below), which gives perfectly unambiguous random access to the pairings; to get the same random access with option 2, you'd need to paste your desired indices into a new string and assume it's in the list.

If your desired outcome is mostly for reporting, then perhaps option 2 works, as it is unrolled by default with human-readable names. YMMV.

Option 1:

func <- function(a,b) Map(intersect, a, b)
o1 <- outer(dl[[1]], dl[[2]], func)
o1
#    B1          B2         
# A1 Character,2 Character,0
# A2 Character,0 Character,2
# A3 Character,0 Character,0

This may seem like gibberish, but each cell is a list element holding a character vector:

o1["A1","B1"]
# [[1]]
# [1] "red"    "orange"
o1[["A2","B2"]] # only difference: double-bracket, returns vector not list
# [1] "blue"  "white"
apply(o1, 1, lengths)
#    A1 A2 A3
# B1  2  0  0
# B2  0  2  0

Option 2:

eg2 <- do.call(expand.grid, dl)
o2 <- setNames(Map(intersect, eg2$A, eg2$B),
               apply(sapply(eg2, names), 1, paste, collapse = "-"))
o2
# $`A1-B1`
# [1] "red"    "orange"
# $`A2-B1`
# character(0)
# $`A3-B1`
# character(0)
# $`A1-B2`
# character(0)
# $`A2-B2`
# [1] "blue"  "white"
# $`A3-B2`
# character(0)

If empty elements are a problem, you can drop them:

Filter(length, o2)
# $`A1-B1`
# [1] "red"    "orange"
# $`A2-B2`
# [1] "blue"  "white"
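Since the question mentions that the real data has additional sites, here is a rough sketch of how the same idea might extend to every pair of sites. It is not tested against the real data; the `res` name and the looping approach are my own assumptions, and it only compares groups across sites, never within one:

```r
# Dummy data from the question, as a data frame.
d <- data.frame(
  site = c(rep("A", 9), rep("B", 6)),
  group = c("A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3", "A3",
            "B1", "B1", "B2", "B2", "B2", "B2"),
  element = c("red", "orange", "blue", "black", "white",
              "black", "cream", "yellow", "purple",
              "red", "orange", "blue", "white", "gray", "salmon"),
  stringsAsFactors = FALSE
)

# Nested list: site -> group -> character vector of elements.
dl <- lapply(split(d, d$site), function(x) split(x$element, x$group))

# Every unordered pair of sites (just A/B for the dummy data).
site_pairs <- combn(names(dl), 2, simplify = FALSE)

res <- list()
for (p in site_pairs) {
  for (g1 in names(dl[[p[1]]])) {
    for (g2 in names(dl[[p[2]]])) {
      shared <- intersect(dl[[p[1]]][[g1]], dl[[p[2]]][[g2]])
      # Keep only group pairs that actually share something.
      if (length(shared) > 0) res[[paste(g1, g2, sep = "-")]] <- shared
    }
  }
}
res
```

This assumes group names are unique across sites, as they are in the dummy data; otherwise the pasted names would collide.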
r2evans