
I have a list L containing data frames, L = (A, B, C, D). Each data frame has a column z. For each pairwise comparison of the data frames in the list, I would like to take the set intersection of the values in column z and count the shared values, so that I get a final matrix:

  A B C D 
A  
B 
C   
D

Each cell of the matrix should contain the number of values shared by that pair of data frames. I am not sure what the most idiomatic way to implement this in R is. I could write a for loop: start with the first member of the list, extract the values of column z, intersect them with those of each other member, and populate an empty matrix. But there could be a better, more efficient approach.

Any ideas and implementations?

Example:

df1 <- data.frame(z=c(1,2,3),s=c(4,5,6))
df2 <- data.frame(z=c(3,2,4),s=c(6,5,4))
my.list <- list(df1 = df1, df2 = df2)  # names become the row/column labels

Expected output:

    df1 df2
df1  3  2
df2  2  3
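The for-loop approach described in the question could look roughly like the minimal sketch below. It assumes every data frame in the list has a column named z and that the list elements are named (the names become the row and column labels); shared_z_counts_loop is a hypothetical helper name. Because the count for (i, j) equals the count for (j, i), each pair is computed only once and written to both cells.

```r
# Example data from the question, with the list elements named
df1 <- data.frame(z = c(1, 2, 3), s = c(4, 5, 6))
df2 <- data.frame(z = c(3, 2, 4), s = c(6, 5, 4))
my.list <- list(df1 = df1, df2 = df2)

# Hypothetical helper: count shared values of column z for every pair
shared_z_counts_loop <- function(lst) {
  n <- length(lst)
  # Preallocate an n x n integer matrix labelled with the list names
  res <- matrix(0L, nrow = n, ncol = n,
                dimnames = list(names(lst), names(lst)))
  for (i in seq_len(n)) {
    for (j in i:n) {
      # intersect() drops duplicates, so this counts distinct shared values
      k <- length(intersect(lst[[i]]$z, lst[[j]]$z))
      res[i, j] <- k
      res[j, i] <- k  # the count is symmetric, so fill both cells
    }
  }
  res
}

shared_z_counts_loop(my.list)
```

Skipping the lower triangle roughly halves the number of intersect() calls, which only matters once the list gets long.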
eastafri
  • Sorry if my layman's description is not precise enough; I hope you get the idea. I just want the values in column z that are shared between each pair of data frames. So in the first instance you would take intersect(df1$z, df2$z), and so on, to produce the matrix. – eastafri Jun 03 '16 at 19:10
  • This might interest you: http://stackoverflow.com/q/17598134/ – Frank Jun 03 '16 at 19:20

1 Answer


You could try the outer() function:

outer(my.list, my.list, function(x, y) Map(function(i, j) length(intersect(i$z, j$z)), x, y))
    df1 df2
df1 3   2  
df2 2   3 
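Why the result is a list: outer() does not call FUN once per cell. It expands both arguments so that together they cover every pair, calls FUN once with those two expanded lists, and then sets dim() on whatever FUN returns. That is why the answer uses Map() to walk the pairs, and since Map() returns a list, the result is a matrix of list cells. A minimal sketch of a variant that returns an ordinary integer matrix, assuming the list elements are named (count_shared is a hypothetical helper name), wraps the pairwise count in Vectorize(), which calls mapply() and simplifies the counts to a vector before outer() reshapes it:

```r
df1 <- data.frame(z = c(1, 2, 3), s = c(4, 5, 6))
df2 <- data.frame(z = c(3, 2, 4), s = c(6, 5, 4))
my.list <- list(df1 = df1, df2 = df2)

# Count the distinct values of column z shared by two data frames
count_shared <- function(x, y) length(intersect(x$z, y$z))

# Vectorize() wraps count_shared in mapply(), which walks the two expanded
# lists built by outer() in parallel and simplifies the counts to an
# integer vector; outer() then reshapes that vector into a labelled matrix
shared <- outer(my.list, my.list, Vectorize(count_shared))
shared
```

Because shared is a plain integer matrix, it can be passed straight to functions that expect numeric input, including as.data.frame().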
Psidom
  • Would it be possible to return a data frame? This function returns a list, and it is not obvious how to turn that into a data frame. – eastafri Jun 22 '16 at 18:52
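One way to get a data frame from the answer's result, sketched here under the assumption that the list elements are named: flatten the list cells with unlist(), rebuild a matrix with the same dimensions and labels, and then call as.data.frame(). The variable names res, counts and counts_df are illustrative only.

```r
df1 <- data.frame(z = c(1, 2, 3), s = c(4, 5, 6))
df2 <- data.frame(z = c(3, 2, 4), s = c(6, 5, 4))
my.list <- list(df1 = df1, df2 = df2)

# The answer's call: a matrix whose cells are length-one list elements
res <- outer(my.list, my.list, function(x, y)
  Map(function(i, j) length(intersect(i$z, j$z)), x, y))

# unlist() reads the cells in column-major order, so refilling a matrix
# with the same number of rows restores the layout; keep the labels too
counts <- matrix(unlist(res), nrow = nrow(res), dimnames = dimnames(res))

# Wide data frame: one row and one column per input data frame
counts_df <- as.data.frame(counts)
counts_df
```

If a long, one-row-per-pair layout is preferred instead, as.data.frame(as.table(counts)) returns columns Var1, Var2 and Freq.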