Data
I have a "tall" dataset with with repeated observations through time at the unit level, but the intervals are irregular. There are 4e6 observations, so I'd like to keep the solution in data.table.
library(data.table)
d <- data.frame(let = rep(LETTERS[1:10], times = 2),
num = rep(c(2001,2002), each = 10),
vals = 1:20,
lagNum = c(rep(c(NA, 2002), times = 5), rep(NA, 10))
) # there are missing values in the object
d[c(1:5, 16:20),]
## let num vals lagNum
## 1 A 2001 1 NA
## 2 B 2001 2 2002
## 3 C 2001 3 NA
## 4 D 2001 4 2002
## 5 E 2001 5 NA
## 16 F 2002 16 NA
## 17 G 2002 17 NA
## 18 H 2002 18 NA
## 19 I 2002 19 NA
## 20 J 2002 20 NA
If it helps to anchor the task in the real-world, the units are countries and the repeated observations are years in which the countries hold elections.
Goal
I would like to pass unit-values from vals_{unit, past} to vals_{unit, present}. In a sense, this feels like creating a wider dataset that :shuffles: the tall observations that reoccur in the present into a wider data.table.
I already know this isn't a case for zoo or other time series applications; I figure this isn't any version of a reshape call either, but rather is a merge/join. I also know that this response and this response are moving in the right direction, but aren't getting there.
Running this as a data.frame is pretty trivial:
dMerge <- merge(x = d[!(is.na(d$lagNum)), ],
y = d[, c("let", "num", "vals")],
by.x = c("let", "lagNum"),
by.y = c("let", "num")
)
dMerge
## let lagNum num vals.x vals.y
## 1 B 2002 2001 2 12
## 2 D 2002 2001 4 14
## 3 F 2002 2001 6 16
## 4 H 2002 2001 8 18
## 5 J 2002 2001 10 20
As data.table
My first thought was to duplicate the data.table and assign key values appropriately:
dat <- data.table(let = rep(LETTERS[1:10], times = 2),
num = rep(c(2001,2002), each = 10),
vals = 1:20,
lagNum = c(rep(c(NA, 2002), times = 5), rep(NA, 10))
)
setkeyv(dat, cols = c("let", "num"))
dat2 <- dat
setkeyv(dat2, cols = c("let", "lagNum"))
mDat <- merge(dat, dat2)
...and if this had worked I probably would have run with it. But it didn't, and it doesn't seem particularly 'data.table' to be wantonly duplicating data.