I have a dataset with around 25 million rows. I am taking a subset of these rows and performing a function which works fine. However, what I then need to do is update the values in original dataset with new values while retaining the rest. I am sure this is straightforward but I just can't get my head around it.
This is a simplified version of what I am dealing with:
require("data.table")
df <-data.frame(AREA_CD = c(sample(1:25000000, 25000000, replace=FALSE)), ALLOCATED = 0, ASSIGNED = "A", ID_CD = c(1:25000000))
df$ID_CD <- interaction( "ID", df$ID_CD, sep = "")
dt <- as.data.table(df)
sub_dt <- dt[5:2004,]
sub_dt[,ALLOCATED:=ALLOCATED+1]
sub_dt[,ASSIGNED:="B"]
What I am after is the values in 'ALLOCATED' and 'ASSIGNED' from sub_dt
to replace the 'ALLOCATED' and 'ASSIGNED' values in dt
based on the 'ID_CD' column. The output I would be after, based on my example, would still have 25 million rows but have 2,000 updated rows. Any help would be much appreciated. Thanks.