
I'm just getting started with R, and I'm wondering how I can find the intersection of the elements from two rows of a dataframe. I tried

intersect(thing[1,],thing[2,])

but it gave me a complete nonsense answer (something that definitely is not in the intersection, while omitting the thing that was in the intersection).

How should I approach this problem?

gagolews
  • Can you provide the data set so we can reproduce the error, or even better, a minimal snippet of data that reproduces it? – Tyler Rinker Mar 06 '12 at 21:50
  • A quick example of `thing <- matrix(c(1:10,5:14),2,byrow=TRUE);intersect(thing[1,],thing[2,])` works just fine, so what is your data and what do you expect? – Sacha Epskamp Mar 06 '12 at 22:16
  • This sounds similar to my problem, but I use a list of lists, and I want to obtain an *adjacency matrix* (a matrix with the lengths of all possible intersection sets). In my case, the intersection of an element with itself produces a length of 1, and all the others a length of 0... – gunzapper Jul 10 '13 at 13:28
  • Sorry folks... I made a big mistake. In my case I'm working with a list of vectors; to fix it I'm now using `[[i]]` instead of `[i]` to reach the vector... XD – gunzapper Jul 10 '13 at 13:40

1 Answer


If the columns are all of the same type (e.g. all numbers), first convert to a matrix via as.matrix, then apply intersect. For example, if the data frame is called z:

zz <- as.matrix(z)
intersect(zz[1,], zz[2,])
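As a minimal sketch of why the conversion helps (the data frame `z` and its values below are made up for illustration): a row taken from a data frame is still a data frame, i.e. a list of columns, which is probably why `intersect` gave an unexpected result in the question. Flattening each row with `unlist` is an alternative to `as.matrix` and gives `intersect` plain vectors to work on.

```r
# Hypothetical data frame in which every column is numeric
z <- data.frame(a = c(1, 5), b = c(2, 6), c = c(5, 7), d = c(6, 1))

# A single row is still a data frame, not a plain vector
row_is_df <- is.data.frame(z[1, ])

# Flatten each row to an atomic vector before intersecting
row1 <- unlist(z[1, ], use.names = FALSE)
row2 <- unlist(z[2, ], use.names = FALSE)
common <- intersect(row1, row2)
```

Here `common` holds the values 1, 5 and 6, which are the entries shared by the two rows.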

If the columns have different types, it may be necessary to first identify which columns are actually comparable, since you wouldn't want to compare a factor (a "level" variable) to an integer. For example:

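z <- data.frame(AA = c( 1,   1,   3,   4), 
                BB = c( 1,   5,   3,   1),
                CC = c('1', 'a', 'b', 'b'),
                DD = c( 1,   2,   3,   4),
                stringsAsFactors = TRUE)  # make CC a factor (the default before R 4.0)
# values of AA that equal CC in the same row
z[z[,1] == z[,3],1]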

While "1" will be returned here, the "1" can have a completely different meaning for a factor than for a numeric variable, so we shouldn't compare numeric variables and factors, at least not without careful oversight.

There may be a slick solution for the scenario where the data frame has several different types, but nothing is coming to mind...
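One possible approach for mixed types, offered only as a sketch and not as part of the original answer, is to restrict the comparison to the numeric columns before intersecting. The names `num_cols` and `zn` are made up for this example.

```r
z <- data.frame(AA = c(1, 1, 3, 4),
                BB = c(1, 5, 3, 1),
                CC = c("1", "a", "b", "b"),
                DD = c(1, 2, 3, 4))

# Flag the numeric columns; CC (character or factor) is excluded
num_cols <- vapply(z, is.numeric, logical(1))

# Keep only those columns and convert to a numeric matrix
zn <- as.matrix(z[, num_cols, drop = FALSE])

# Intersect rows 1 and 2 using the numeric columns only
common <- intersect(zn[1, ], zn[2, ])
```

With this data, rows 1 and 2 share only the value 1 across the numeric columns. Whether dropping the non-numeric columns is appropriate depends on what the columns mean.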

David Diez