3

If I have two data.tables, dt1, and dt2, I want the number of matches between columns using an if then sort of logic. If dt1$V1==dt$V2, then does dt$V1 == dt$V2? But it is key for this if-then statement to group by the matches in dt1$V1 == dt$V2. I would like to use data.table for its efficiency, since I actually have a large dataset.

dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))

In this dummy example, there are 3 matches between the V1s, but only two within those groups for V2s. So the answer (using nrow perhaps, if I subset), would be 2.

Jaap
  • 81,064
  • 34
  • 182
  • 193
Seth
  • 33
  • 4
  • If my output was a data.table of the matching answers, that would be best. – Seth Jun 09 '17 at 20:11
  • Of course I can merge the data.tables. Then I would use some sort of group subset with "by". Something like: "by=dt1$V1==dt2$V2" is the logic I need. – Seth Jun 09 '17 at 20:20
  • Possible duplicate of https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – akrun Jun 09 '17 at 20:44
  • 1
    @akrun No. IMHO it is not a clear duplicate, because OP didn't ask to merge/join. OP asked to find common elements. So, now you are accusing me of your own standard practise? – Jaap Jun 11 '17 at 07:23
  • @akrun Even if _one of the answers_ is using inner join it doesn't mean that the _question_ is a duplicate. And none of the answers in your duplicate mentions `fintersect()`. – Uwe Jun 11 '17 at 07:50

3 Answers3

6

I suppose you are looking for fintersect:

fintersect(dt1,dt2)

gives:

   V1 V2
1:  d  4
2:  e  5

To get the number of rows, add [, .N]:

fintersect(dt1,dt2)[, .N]

which gives:

[1] 2
Jaap
  • 81,064
  • 34
  • 182
  • 193
1

Well this is not pretty, but, it works:

sum(dt1[V1 %in% dt2$V1]$V2 ==   dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)

Just read your comment, if you want a data.table with the correct combinations you can make it even longer, like this:

dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 ==   dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]

    V1 V2
1:  d  4
2:  e  5

I'm definitely looking forward to see other answers :)

psychOle
  • 1,054
  • 9
  • 19
1

We can just do a join

dt1[dt2, on = names(dt1), nomatch = 0]
#   V1 V2
#1:  d  4
#2:  e  5

or inner_join from dplyr

library(dplyr)
inner_join(dt1, dt2)
#  V1 V2
#1  d  4
#2  e  5

Or with merge

merge(dt1, dt2)
#   V1 V2
#1:  d  4
#2:  e  5

For all of the above the number of matches can be find by nrow

nrow(merge(dt1, dt2))
akrun
  • 874,273
  • 37
  • 540
  • 662