11

I have an R question that I'm even sure how to word in one sentence, and couldn't find an answer for this yet.

I have two data frames that I would like to 'intersect' and find all rows where column values match in two columns. I've tried connecting two intersect() and which() statements with &&, but neither has given me what I want yet.

Here's what I mean. Let's say I have two data frames:

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       0      0       0         0            0
2 stack@exchange.com EIFLS0LS        1       0      0       0         0            0
3     data@frame.com EIFLS0LS        1       0      0       0         0            0
4    block@quote.com EIFLS0LS        1       0      0       0         0            0
5          ht@ml.com EIFLS0LS        1       0      0       0         0            0
6     tele@phone.com EIFLS0LS        1       0      0       0         0            0

> testBounced
               Email Campaign
1 stack@overflow.com        1
2 stack@overflow.com        2
3     data@frame.com        2
4    block@quote.com        1
5          ht@ml.com        1
6        lap@top.com        1

As you can see, there are some values in the column Email that intersect, and some from the column Campaign that intersect. I want all of the rows from testData in which BOTH columns match.

ie:

               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       0      0       0         0            0
2    block@quote.com EIFLS0LS        1       0      0       0         0            0
3          ht@ml.com EIFLS0LS        1       0      0       0         0            0

EDIT:

My goal in finding these columns is to be able to update a row in the original column. So the final output that I would like is:

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       1      0       0         0            0
2 stack@exchange.com EIFLS0LS        1       0      0       0         0            0
3     data@frame.com EIFLS0LS        1       0      0       0         0            0
4    block@quote.com EIFLS0LS        1       1      0       0         0            0
5          ht@ml.com EIFLS0LS        1       1      0       0         0            0
6     tele@phone.com EIFLS0LS        1       0      0       0         0            0

My apologies if this is a duplicate, and thanks in advance for your help!

EDIT2::

I ended up just using a for loop, nothing great, but doesn't feel efficient. The dataset was small enough to do it quickly, though. If anyone has a quick, R-style way to do it, I'd be happy to see it!

so13eit
  • 942
  • 3
  • 11
  • 22

2 Answers2

9

You want the function merge.

merge is commonly used to merge two tables by one similar common, but the by argument can allow multiple columns:

merge(testData, testBounced, by=c("Email", "Campaign"))

All pairs of Email and Campaign that don't match will be discarded by default. That's controllable by the arguments all.x and all.y, which default to FALSE.

The default argument for by is intersect(names(x, y)), so you technically don't need to specify the columns in this case, but it's good for clarity.

Señor O
  • 17,049
  • 2
  • 45
  • 47
  • Thanks for the comment. I realize I wasn't being clear- My goal is to be able to change values of other columns in the original table (in this case testData) on the columns that work. I'll update the original question. – so13eit Jul 26 '13 at 18:55
7

If you use data.tables and key by the columns you want to match, then you can accomplish your goal in one line:

    tData[tBounce, Bounced := 1L]



Here is the full process:

library(data.table)
keys <- c("Email", "Campaign")
tData <- data.table(testData, key=keys)
tBounce <- data.table(testBounce, key=keys)

tData[tBounce, Bounced := 1L]

Results:

tData

                Email   Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1:    block@quote.com EIFLS0LS        1       1      0       0         0            0
2:     data@frame.com EIFLS0LS        1       0      0       0         0            0
3:          ht@ml.com EIFLS0LS        1       1      0       0         0            0
4: stack@exchange.com EIFLS0LS        1       0      0       0         0            0
5: stack@overflow.com EIFLS0LS        1       1      0       0         0            0
6:     tele@phone.com EIFLS0LS        1       0      0       0         0            0
> 
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
  • I knew there had to be an easy way. Thanks for introducing me to that package! – so13eit Jul 26 '13 at 20:07
  • +1: clever use of X[Y, foo] syntax – Señor O Jul 26 '13 at 20:30
  • This is excellent!! I had done something similar to select on one column using "match", but trying to extend that to selecting (or subsetting or whatever we want to call this) on > 1 column really stymied me. – WGray Oct 10 '14 at 11:16
  • I am very interested in this solution. In my case, I would like to find all rows where column values of a data frame match in three columns of another data frame. But, I don't understand why there is "Bounced :=1L". Thanks for your help. – Marine Dec 08 '17 at 05:55
  • Hi @Marine, probably best to open a new question and linking back to this one – Ricardo Saporta Dec 13 '17 at 20:10