
Say I have a list of three data frames:

> a
[[1]]
     begin end
     3     5
     9     10
     11    14

[[2]]
     begin end
     3     7
     14    18
     19    24

[[3]]
     begin end
     6     9
     14    22
     18    30

What I am trying to find is the intersection of all of the "begin" columns, so in this case the desired output would be something like

"3" "14"

I am aware of the solution offered at "How to find common elements from multiple vectors?"; however, that solution assumes the number of lists is fixed. If the number of lists I have here were to change (say, to 5 lists, each with a similar column layout), how would I find the intersection?

rdevn00b
  • `a[[1]]` does not have 14 in `begin`. Should the code consider both `begin` and `end`? – nico Sep 15 '15 at 17:21
  • Good question. The answer is no, as I want ANY intersection. So if at least two lists share a similar element in begin, this should be found. – rdevn00b Sep 15 '15 at 17:23
  • The problem with using rbind here is that I need to keep the lists separate. After this step in the program I need to determine the frequency of elements in each list with respect to those that are in the intersected set. So, for example, if [[3]]'s begin column contained 3,3,7, then the program would show that the frequency of 3 in list 1 is 1, 3 in list 2 is 1, and 3 in list 3 is 2. – rdevn00b Sep 15 '15 at 17:29
  • `Reduce(function(x, y) intersect(x, y), lapply(a, '[[', 'begin'))` Note that this won't work for your example, because your example has no values common to all begin columns – rawr Sep 15 '15 at 17:50
  • @rawr They're misusing the concept of "intersection of all x" and really want "appears in some pairwise intersection among x" – Frank Sep 15 '15 at 17:56
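
The distinction raised in the last two comments can be sketched as follows. This is only an illustration, assuming the example data as recreated in the answers; the names `begins`, `strict`, `pairs` and `pairwise` are made up for this sketch, and it needs a list with at least two elements.

```r
# Example data, recreated as a list of data frames
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# Pull out the begin column of every list element, however many there are
begins <- lapply(a, `[[`, "begin")

# Strict intersection: values present in every begin column
strict <- Reduce(intersect, begins)

# "Any pairwise intersection": values shared by at least two begin columns
pairs <- combn(seq_along(begins), 2, simplify = FALSE)
pairwise <- sort(unique(unlist(lapply(pairs, function(p) {
    intersect(begins[[p[1]]], begins[[p[2]]])
}))))

strict    # empty: no value occurs in all three begin columns
pairwise  # 3 and 14
```

Neither version hard-codes the number of list elements, so the same code should apply to a list of 5 data frames.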

2 Answers

An easy way is to collapse the list elements and use table to count them

# Recreate the example data (a list of data frames)
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(6, 14, 18), end = c(9, 22, 30)))

# "Collapse" the begin columns into a single vector.
# sapply returns a list rather than a matrix when the data frames
# do not all have the same number of rows; unlist flattens either
# result (thanks @Frank for pointing this out)
a.beg <- unlist(sapply(a, function(x){x$begin}))

# Count the elements
tb <- table(a.beg)

# Get the ones repeated at least twice 
# (need to cast to numeric as names are strings)
intersection <- as.numeric(names(tb[tb>=2]))

> intersection
[1]  3 14
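
One caveat: counting with `table` also counts a value that is repeated inside a single data frame. If that can happen in your data, a possible variant (a sketch only, using a made-up list `b` in which 7 occurs twice in the third data frame) is to de-duplicate each `begin` column before collapsing:

```r
# Hypothetical data: 7 is repeated within the third data frame only
b <- list(
    data.frame(begin = c(3, 9, 11),  end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(7, 7, 14),  end = c(9, 22, 30)))

# Keep each value at most once per data frame, then collapse
b.beg <- unlist(lapply(b, function(x) unique(x$begin)))

# Each count is now the number of data frames containing the value
tb <- table(b.beg)
shared <- as.numeric(names(tb[tb >= 2]))

shared  # 3 and 14; 7 is excluded because it occurs in one data frame only
```

As before, nothing here depends on how many data frames the list holds.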
nico
  • This is not scalable! – rdevn00b Sep 15 '15 at 17:32
  • @rdevn00b It is scalable for any number of elements in a... In what way do you want it to be scalable? – nico Sep 15 '15 at 17:33
  • The problem with this solution is that I have to manually enter all data frames in order to use it. In the case of a dynamic number of data frames this solution will not hold. – rdevn00b Sep 15 '15 at 17:45
  • @rdevn00b You do not have to manually enter the data.frames. nico was just trying to recreate your example data. Anywho, fyi, I am deleting my comments after they're resolved (as is customary here, to "clean up" unnecessary chatter). – Frank Sep 15 '15 at 17:46
  • @rdevn00b I am assuming you'll get your data from some source (csv file, database, somewhere on the Internet) using `read.table` or similar functions. – nico Sep 15 '15 at 17:55

Using @nico's input data...

full <- do.call(rbind, lapply(seq_along(a), function(i) within(a[[i]], {g = i})) )

res  <- table(full[,c("begin","g")])

#      g
# begin 1 2 3
#    3  1 1 0
#    6  0 0 1
#    9  1 0 0
#    11 1 0 0
#    14 0 1 1
#    18 0 0 1
#    19 0 1 0

The rows are the unique values of begin and the columns are the elements of the list. To see which values of begin appear in more than one element of the list, look at

res[ rowSums( res>0 ) > 1, ]
#      g
# begin 1 2 3
#    3  1 1 0
#    14 0 1 1

Probably whatever further analysis you have to do should be done on full rather than on your list of data.frames, especially if efficiency is a concern.
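
For the follow-up mentioned in the comments (the frequency of each shared value within each list element), the same table already holds the counts. A small sketch, assuming the third data frame's begin column is 3, 3, 7 as in the OP's comment (the end values are just carried over from the example):

```r
# Example data, with the third begin column changed to 3, 3, 7
a <- list(
    data.frame(begin = c(3, 9, 11), end = c(5, 10, 14)),
    data.frame(begin = c(3, 14, 19), end = c(7, 18, 24)),
    data.frame(begin = c(3, 3, 7), end = c(9, 22, 30)))

# Stack the data frames, tagging each row with its list index g
full <- do.call(rbind, lapply(seq_along(a), function(i) within(a[[i]], {g = i})))
res  <- table(full[, c("begin", "g")])

# Keep rows seen in more than one list element; drop = FALSE keeps
# the two-dimensional shape even when only one row qualifies
shared <- res[rowSums(res > 0) > 1, , drop = FALSE]

shared["3", ]  # counts of 3 in list elements 1, 2, 3: 1, 1, 2
```

The list index is kept in the `g` column, so the per-list information is not lost by stacking.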

Frank