4

Before I end up at -7 like Two by two matching between dataframes in r did, let me say I've already read the following pages:

Actually, the last one is really similar to what I want, but not the same, because my columns are different

I have two dataframes, let's say:

> d <- data.frame(year=c(2004,2004,2006),month = c(1,5,3), height = c(1000,2000,3000) )
> d
  year month height
1 2004     1   1000
2 2004     5   2000
3 2006     3   3000

> e <- data.frame(year=c(2004),month=c(5), height = c(9999))
> e
  year month height
1 2004     5   9999

Obviously the real data is longer than this.

I want to merge the values from e into d

Try raw merge:

> merge(d,e)
[1] year   month  height
<0 rows> (or 0-length row.names)

Ok. So add "by":

> merge(d,e,by=c("year","month"))
  year month height.x height.y
1 2004     5     2000     9999

Ok, it did an inner join, and got rid of all the original data in d. So try left outer join:

> merge(d,e,by=c("year","month"),all.x = T)
  year month height.x height.y
1 2004     1     1000       NA
2 2004     5     2000     9999
3 2006     3     3000       NA

It did a join, and it's correct as per outer-join definition, but it didn't do what I want, which is to update the values in d from the values in e. What I really want is more like an sql update:

for (year,month,height) in e:
    update d set d.height=e.height where d.year = e.year and d.month = e.month

ie the results I want would be:

> magic(d,e)
  year month height
1 2004     1   1000
2 2004     5   9999
3 2006     3   3000

Of course, I could just write a bunch of for loops, but I'm hoping there is some vectorized way of doing this?

Edit: my example had only one key column, but my real problem has two. Updated the example to reflect this.

Community
  • 1
  • 1
Hugh Perkins
  • 7,975
  • 7
  • 63
  • 71

2 Answers2

7

You can use data.table

edit noting both e and d have a key defined by month and year

library(data.table)
DD <- as.data.table(d)
DE <- as.data.table(e)

setkey(DD,  year, month)
setkey(DE,  year, month)

DD[DE, height := i.height]

note that I have prefixed height with i. to ensure that it is reading the height value from the i component.

If you read the introduction to data.table vignette, you will quickly understand the relationship between rownames and data.table keys!

mnel
  • 113,303
  • 27
  • 265
  • 254
2

Actually, the following approach is more straightforward:

rownames( d ) <- d$id
d[ e$id, ]$height <- e$height

Update: since your key is actually "year-month", you probably will be best off with the data table, but if you are not willing using that, here is what you can do:

rownames( d ) <- paste( d$year, d$month )
d[ paste( e$year, e$month ), ]$height <- e$height
January
  • 16,320
  • 6
  • 52
  • 74
  • I have two keys unfortunately, although my original question implicitly stated there are one, by giving example data with only one. I've updated the question with an example with two keys. – Hugh Perkins Oct 22 '12 at 08:09
  • In that case I recommend using `data.table` as suggested by @mnel. Although you could use `paste( year, month )` as your key (well, actually you have only one key: "year-month"). – January Oct 22 '12 at 08:13
  • I went with this approach in the end, because whilst it seems to me a bit "hacky", it has zero learning curve. – Hugh Perkins Oct 22 '12 at 09:31
  • It is "old style" for sure. Whenever I have to use `data.table`, I have to look up the idioms, while rownames and data.frames are already in my blood. – January Oct 22 '12 at 09:34