In the data.table below, individuals have names given in p1. Each individual has an income given by inc_1, generated as follows:
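library(data.table)
data_gen = function(){
  p_names = letters[1:10]
  # 10 shuffled names plus one extra person ("y" in p1, "z" in p2) who has no match in the other column
  dataset = data.table(p1 = c(sample(p_names,10,replace=FALSE),"y"), p2 = c(sample(p_names,10,replace=FALSE),"z"), inc_1 = round(rnorm(11,1000,200)))
  return(dataset)
}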
set.seed(43210)
data_1 = data_gen()
data_1
Each individual in p1 is closely related to the individual listed in p2, and I would like to have the income of p2 in a new column inc_2 just to the right of inc_1. The "match" function is useful for this:
data_2 = data_1 # saved for later use
data_1$inc_2 = data_1$inc_1[match(data_1$p2,data_1$p1,nomatch = NA)]
data_1
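To make the lookup explicit, here is a minimal sketch of what the "match" line does, using a small hand-made table. The object `toy` and its names and incomes are made up for illustration and are not the simulated data above.

```r
library(data.table)

# Toy data: made-up names and incomes, for illustration only
toy = data.table(p1    = c("a", "b", "c"),
                 p2    = c("b", "c", "z"),
                 inc_1 = c(100, 200, 300))

# For each p2, match() gives the row where that name appears in p1
idx = match(toy$p2, toy$p1)   # 2 3 NA ("z" never appears in p1)

# Indexing inc_1 by those rows gives the partner's income
toy$inc_2 = toy$inc_1[idx]    # 200 300 NA
toy
```

The NA for "z" is what `nomatch = NA` (the default) produces when a partner is not found in p1.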
In data_1, we see the income inc_2 of p2="i" listed just to the right of inc_1 of p1="b", and so on. However, once a new dimension, the year, is added to the dataset, I am not able to generate the partner's income inc_2 correctly across years.
set.seed(43211)
data_3 = data_gen()
data_4 = rbind(cbind(year=rep(2015,11),data_2),cbind(year=rep(2016,11),data_3))
data_4
If we reuse the same code as before, 'match' ignores the time dimension: for 2016 and p1="g", it does not return the 2016 income of p2="h", but the 2015 income of "h" instead.
data_4$inc_2 = data_4$inc_1[match(data_4$p2,data_4$p1,nomatch = NA)]
data_4
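The reason seems to be that "match" only returns the position of the first occurrence of each name. A minimal sketch on a made-up two-year table (the object `toy` and its values are hypothetical, not data_4):

```r
library(data.table)

# Toy panel: the same two people observed in two years (made-up incomes)
toy = data.table(year  = c(2015, 2015, 2016, 2016),
                 p1    = c("a", "b", "a", "b"),
                 p2    = c("b", "a", "b", "a"),
                 inc_1 = c(100, 200, 110, 220))

# match() returns the FIRST position of each name in p1,
# so the 2016 rows point back at the 2015 rows
idx = match(toy$p2, toy$p1)   # 2 1 2 1
toy$inc_1[idx]                # 200 100 200 100
```

Rows 3 and 4 belong to 2016 but pick up the 2015 incomes, which is the same behaviour as in data_4.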
I thought that adding by=c('year') would solve the problem, but none of the lines below generates inc_2 properly:
data_4[ , inc_1[match(p2,p1,nomatch = NA)],by=c('year')] # close, but the resulting column V1 is not added to data_4
data_4[ , inc_2 = inc_1[match(p2,p1,nomatch = NA)],by=c('year')]
data_4$inc_2 = data_4[ , inc_1[match(p2,p1,nomatch = NA)],by=c('year')]
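A possible direction, sketched on a toy table rather than on data_4: data.table's `:=` operator assigns a column by reference inside `[ ]`, and combined with `by` it should run the "match" separately within each year. The object `toy` and its values are made up, and the sketch assumes each name appears at most once in p1 within a year.

```r
library(data.table)

# Toy panel: the same two people observed in two years (made-up incomes)
toy = data.table(year  = c(2015, 2015, 2016, 2016),
                 p1    = c("a", "b", "a", "b"),
                 p2    = c("b", "a", "b", "a"),
                 inc_1 = c(100, 200, 110, 220))

# := adds inc_2 in place; by = year restricts each match() to one year
toy[, inc_2 := inc_1[match(p2, p1)], by = year]
toy$inc_2   # 200 100 220 110
```

Here the 2016 rows receive the 2016 incomes of their partners, because p1, p2 and inc_1 only contain the rows of the current year when the expression is evaluated.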
I would appreciate any comment on this point...