Before I end up at -7 like Two by two matching between dataframes in r did, let me say I've already read the following pages:
- How to copy row from one data.frame in to another [R]
- R How to update a column in data.frame using values from another data.frame
- Using one data.frame to update another
Actually, the last one is really similar to what I want, but not the same, because my columns are different
I have two dataframes, let's say:
> d <- data.frame(year=c(2004,2004,2006),month = c(1,5,3), height = c(1000,2000,3000) )
> d
year month height
1 2004 1 1000
2 2004 5 2000
3 2006 3 3000
> e <- data.frame(year=c(2004),month=c(5), height = c(9999))
> e
year month height
1 2004 5 9999
Obviously the real data is longer than this.
I want to merge the values from e into d
Try raw merge:
> merge(d,e)
[1] year month height
<0 rows> (or 0-length row.names)
Ok. So add "by":
> merge(d,e,by=c("year","month"))
year month height.x height.y
1 2004 5 2000 9999
Ok, it did an inner join, and got rid of all the original data in d. So try left outer join:
> merge(d,e,by=c("year","month"),all.x = T)
year month height.x height.y
1 2004 1 1000 NA
2 2004 5 2000 9999
3 2006 3 3000 NA
It did a join, and it's correct as per outer-join definition, but it didn't do what I want, which is to update the values in d from the values in e. What I really want is more like an sql update:
for (year,month,height) in e:
update d set d.height=e.height where d.year = e.year and d.month = e.month
ie the results I want would be:
> magic(d,e)
year month height
1 2004 1 1000
2 2004 5 9999
3 2006 3 3000
Of course, I could just write a bunch of for
loops, but I'm hoping there is some vectorized way of doing this?
Edit: my example had only one key column, but my real problem has two. Updated the example to reflect this.