R data.table replacing an index of values from another data.table

Question

Hi still trying to figure out data.table. If I have a data.table of values such as those below, what is the most efficient way to replace the values with those from another data.table?

set.seed(123456)

a=data.table(
  date_id = rep(seq(as.Date('2013-01-01'),as.Date('2013-04-10'),'days'),5),
  px =rnorm(500,mean=50,sd=5),
  vol=rnorm(500,mean=500000,sd=150000),
  id=rep(letters[1:5],each=100)
  )

b=data.table(
  date_id=rep(seq(as.Date('2013-01-01'),length.out=600,by='days'),5),
  id=rep(letters[1:5],each=600),
  px=NA_real_,
  vol=NA_real_
  )

setkeyv(a,c('date_id','id'))
setkeyv(b,c('date_id','id'))

What I'm trying to do is replace the px and vol in b with those in a where date_id and id match I'm a little flummoxed with this - I would suppose that something along the lines of might be the way to go but I don't think this will work in practice.

b[which(b$date_id %in% a$date_id & b$id %in% a$id),list(px:=a$px,vol:=a$vol)]

EDIT

I tried the following

t = a[b,roll=T]
t[!is.na(px),list(px.1:=px,vol.1=vol),by=list(date_id,id)]

and got the error message

Error in `:=`(px.1, px) : 
  := is defined for use in j only, and (currently) only once; i.e., DT[i,col:=1L] and DT[,newcol:=sum(colB),by=colA] are ok, but not DT[i,col]:=1L, not DT[i]$col:=1L and not DT[,{newcol1:=1L;newcol2:=2L}]. Please see help(":="). Check is.data.table(DT) is TRUE.

I *think* the way to do this is with a fast rolling join. This joins the columns to the table based on the key `b <- b[a,roll=T]` you could just drop the original `NA` columns after if you don't want them, although there is probably a better way to do it in one step. — Simon O'Hanlon, May 25 '13 at 11:43
Thanks @SimonO101, in practice, b is a very large array which will be progressively populated every day. I need to preserve the remaining values as they are in b — Tahnoon Pasha, May 25 '13 at 14:07
If you're updating something every day, shouldn't you be using a database management system? There are packages that (I think) allow you to access your DB from R: postgresql and RMySQL. — Frank, May 26 '13 at 04:23
Thanks @Frank that is where I'd like the information to wind up. This is back data and then each new data block is intended to be transferred to a postgres database. I can't control the transmission medium which is likely to be excel sheets with slightly variable formatting... Wish I could.. :-) — Tahnoon Pasha, May 26 '13 at 06:56

score 8 · Accepted Answer · answered May 28 '13 at 23:21

8

If you are wanting to replace the values within b you can use the prefix i.. From the NEWS regarding version 1.7.10

The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.

b[a, `:=`(px = i.px, vol = i.vol)]

answered May 28 '13 at 23:21

mnel

113,303
27
265
254

1

Although this is a _very_ clever and terrific answer, I would take a longer route and do this: `merge(a, b, by=c("date_id","id"), all=TRUE)` and then rename and check the result. – geneorama Aug 29 '14 at 15:54

score 2 · Answer 2 · answered May 28 '13 at 17:57

2

Doesn't sound like you need the roll from your description, and it seems like you want to do this instead when you get your error:

t[!is.na(px),`:=`(px.1=px,vol.1=vol),by=list(date_id,id)]

answered May 28 '13 at 17:57

eddi

49,088
6
104
155

R data.table replacing an index of values from another data.table

2 Answers2

Linked