I have two data frames and want to add a match column to one of them:

library(plyr)
d1<-data.frame(date=c("2015-01-01","2015-02-05"),s= c("b","s"),name=c("bob","frank"),number=c(10,10.44), MatchorNoMatch= as.character(c("","")))
d1
d2<-data.frame(date2=c("2015-01-01","2015-02-06"),s2= c("b","b"),name2=c("bob","george"),number2=c(10,114))
d2

d1[d1$date %in% d2$date2 &  d1$s %in% d2$s2 & d1$name %in% d2$name2 &  d1$number %in% d2$number2,"MatchorNoMatch"] <- "match"
d1

Here is what I get when I run that:

> library(plyr)
> d1<-data.frame(date=c("2015-01-01","2015-02-05"),s= c("b","s"),name=c("bob","frank"),number=c(10,10.44), MatchorNoMatch= as.character(c("","")))
> d1
        date s  name number MatchorNoMatch
1 2015-01-01 b   bob  10.00               
2 2015-02-05 s frank  10.44               
> d2<-data.frame(date2=c("2015-01-01","2015-02-06"),s2= c("b","b"),name2=c("bob","george"),number2=c(10,114))
> d2
       date2 s2  name2 number2
1 2015-01-01  b    bob      10
2 2015-02-06  b george     114
> 
> d1[d1$date %in% d2$date2 &  d1$s %in% d2$s2 & d1$name %in% d2$name2 &  d1$number %in% d2$number2,"MatchorNoMatch"] <- "match"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "match") :
  invalid factor level, NA generated
> d1
        date s  name number MatchorNoMatch
1 2015-01-01 b   bob  10.00           <NA>
2 2015-02-05 s frank  10.44    

I am getting an NA in the MatchorNoMatch column. Any idea why?

Edit: actually, I just needed to set stringsAsFactors = FALSE.
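
For anyone hitting the same warning, here is a minimal sketch of what happens. The column name `flag` and the frames `f` and `ch` are made up for the illustration. I set stringsAsFactors = TRUE explicitly because it was the default before R 4.0.0; on newer versions the original code would not create a factor in the first place.

```r
# stringsAsFactors = TRUE is set explicitly; it was the default before R 4.0.0
f <- data.frame(flag = c("", ""), stringsAsFactors = TRUE)
class(f$flag)             # "factor", and "" is its only level
f[1, "flag"] <- "match"   # warns: invalid factor level, NA generated
is.na(f$flag[1])          # TRUE, because "match" is not an existing level

# With a character column the same assignment just works
ch <- data.frame(flag = c("", ""), stringsAsFactors = FALSE)
ch[1, "flag"] <- "match"
ch$flag
```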

Here is why using %in% won't work. Bob should not be a match:

library(plyr)
d1<-data.frame(date=c("2015-01-01","2015-02-05","2015-01-01"),s= c("b","s","s"),name=c("bob","frank","g"),number=c(10,10.44,66), match= as.character(c("","","")),stringsAsFactors= FALSE)
d1
class(d1$match)
d2<-data.frame(date2=c("2015-01-15","2015-02-05","2015-01-01"),s2= c("b","s","s"),name2=c("bob","frank","g"),number2=c(10,10.44,55),stringsAsFactors= FALSE)
d2

d1[d1$date %in% d2$date2 &  d1$s %in% d2$s2 & d1$name %in% d2$name2 &  d1$number %in% d2$number2,"match"] <- d2[d1$date %in% d2$date2 &  d1$s %in% d2$s2 & d1$name %in% d2$name2 &  d1$number %in% d2$number2, "name2"]
d1
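
To make the problem concrete, here is a rough sketch comparing the column-by-column test with a row-by-row test. The row-wise version pastes the four columns into a single key, which is only one way to do it and assumes "|" never appears in the data; the names `colwise`, `rowwise`, `key1` and `key2` are mine.

```r
d1 <- data.frame(date = c("2015-01-01", "2015-02-05", "2015-01-01"),
                 s = c("b", "s", "s"), name = c("bob", "frank", "g"),
                 number = c(10, 10.44, 66), stringsAsFactors = FALSE)
d2 <- data.frame(date2 = c("2015-01-15", "2015-02-05", "2015-01-01"),
                 s2 = c("b", "s", "s"), name2 = c("bob", "frank", "g"),
                 number2 = c(10, 10.44, 55), stringsAsFactors = FALSE)

# Each column is checked independently, so values from different
# rows of d2 can combine to make a d1 row look like a match
colwise <- d1$date %in% d2$date2 & d1$s %in% d2$s2 &
  d1$name %in% d2$name2 & d1$number %in% d2$number2

# Build one key per row so that all four values must agree on the same row
key1 <- paste(d1$date, d1$s, d1$name, d1$number, sep = "|")
key2 <- paste(d2$date2, d2$s2, d2$name2, d2$number2, sep = "|")
rowwise <- key1 %in% key2

colwise   # TRUE TRUE FALSE  (bob is wrongly flagged)
rowwise   # FALSE TRUE FALSE (only frank matches a whole row)
```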
user3022875
  • possible duplicate of [Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2](http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in) – zx8754 Apr 09 '15 at 17:09
  • Rather than editing your question substantially, changing it after it's been answered, it's probably better to ask a new question. In this case, if you add `MatchOrNoMatch = "match"` as a column in `d2` and use a left join, in the result it will be `match` if there was a match, `NA` if there wasn't. – Gregor Thomas Apr 09 '15 at 18:17

1 Answer

This is really easy to do with just the merge function from base R.

d2$name2<-d2$name
merge(d1,d2,all.x=TRUE)
        date s  name number name2
1 2015-01-01 b   bob  10.00   bob
2 2015-02-05 s frank  10.44  <NA>

merge(d1,d2,by=c("date","s","name","number"),all.x=TRUE)

Edited to include the specific column names that you wanted to match by.
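
If d2 keeps the date2, s2, name2 and number2 names from the edited question, the key columns differ between the two frames, so by.x and by.y are needed instead of by. Below is a sketch under that assumption, combined with the flag-column idea from the comments. The column name `match`, the "no match" label and the result name `res` are my own choices, and d1 is built without its empty match column so the merged names do not clash.

```r
d1 <- data.frame(date = c("2015-01-01", "2015-02-05", "2015-01-01"),
                 s = c("b", "s", "s"), name = c("bob", "frank", "g"),
                 number = c(10, 10.44, 66), stringsAsFactors = FALSE)
d2 <- data.frame(date2 = c("2015-01-15", "2015-02-05", "2015-01-01"),
                 s2 = c("b", "s", "s"), name2 = c("bob", "frank", "g"),
                 number2 = c(10, 10.44, 55), stringsAsFactors = FALSE)

# Flag every d2 row; after a left join the flag survives only on matched rows
d2$match <- "match"

# Left join on all four columns; all.x = TRUE keeps unmatched d1 rows
res <- merge(d1, d2,
             by.x = c("date", "s", "name", "number"),
             by.y = c("date2", "s2", "name2", "number2"),
             all.x = TRUE)

# Unmatched rows come back as NA; relabel them
res$match[is.na(res$match)] <- "no match"
res
```

Note that merge sorts the result by the key columns by default, so the row order may differ from d1; pass sort = FALSE if that matters.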

bjoseph