The main issue here is how to avoid loops when applying functions to subsets of panels.
I want to have data like these:
id  year     w  pdvw
 1  1930     2    10
 1  1940     3  15.5
 1  1950     5  23.5
 1  1960   7.5  27.5
 1  1970    11    NA
 1  1980     9    NA
 2  1930    NA    NA
 2  1940    NA    NA
 2  1950     1    10
 2  1960     3    17
 2  1970     6    NA
 2  1980     8    NA
The actual data are larger and slightly more complex. I am trying to produce the last column (pdvw) from the other columns. pdvw is the sum of three consecutive (in time) entries of w within the same id, starting with the current one (e.g. pdvw[1] = 2+3+5 = 10). It is NA when fewer than three entries remain. I can easily write
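# Assumes id, year and w are vectors of equal length (e.g. from an attached data frame).
pdvw <- rep(NA_real_, length(w))
for (t in seq(from = 1930, to = 1960, by = 10)) {
  for (i in c(1, 2)) {
    # Skip panel-years where w has not started yet.
    if (!is.na(w[id == i & year == t])) {
      # Sum w over the current year and the next two decades.
      pdvw[id == i & year == t] <- sum(w[id == i & year >= t & year <= t + 20])
    }
  }
}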
My application is not very large (20 values for year and 150 values for id), but I have been told to avoid such loops when possible, so I want to see if there is a better way. I am not so concerned with avoiding the loop over years, because 20 iterations is negligible, but I do want to learn to be a better coder. I thought something with by() might help, but I am not sure exactly what.
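For concreteness, here is a minimal sketch of one loop-free formulation, not necessarily the best one. It uses base R's ave() rather than by(), and it assumes that rows are sorted by year within id and that years are evenly spaced, so that "the next three entries" means three consecutive rows. The helper name lead_sum and the data frame name d are hypothetical, introduced only for this illustration.

```r
# Example data from the question (pdvw is the column to reproduce).
d <- data.frame(
  id   = rep(1:2, each = 6),
  year = rep(seq(1930, 1980, by = 10), times = 2),
  w    = c(2, 3, 5, 7.5, 11, 9, NA, NA, 1, 3, 6, 8)
)

# Hypothetical helper: for each position i, x[i] + x[i+1] + ... + x[i+k-1].
# The result is NA where the window runs past the end or contains an NA.
lead_sum <- function(x, k = 3) {
  n <- length(x)
  # Build k copies of x, the j-th one shifted forward by j and padded with NA.
  shifted <- lapply(0:(k - 1), function(j) {
    c(x[seq_len(n) > j], rep(NA_real_, min(j, n)))
  })
  # Add the shifted copies element-wise.
  Reduce(`+`, shifted)
}

# Assumption: rows must be ordered by year within each id.
d <- d[order(d$id, d$year), ]

# Apply the helper separately within each id.
d$pdvw <- ave(d$w, d$id, FUN = lead_sum)
```

Unlike the year-window condition in the loop, this counts rows, so it would give different results if some years were missing from a panel.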
My solution above relies (possibly dangerously) on the fact that, within a panel, a missing value of w never comes after a non-missing one. That is a mere coincidence of history, hence the "possibly dangerously". I included the missing values because any solution must handle panels whose data start late: the pdvw calculation should begin only once data are available for a given panel.
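The assumption about missing values can be checked explicitly rather than trusted. The following is only a sketch under the same assumptions as the example data (one row per id and year, sorted by year within id). The helper name complete_once_started is hypothetical.

```r
# Example data from the question.
d <- data.frame(
  id   = rep(1:2, each = 6),
  year = rep(seq(1930, 1980, by = 10), times = 2),
  w    = c(2, 3, 5, 7.5, 11, 9, NA, NA, 1, 3, 6, 8)
)
d <- d[order(d$id, d$year), ]

# Hypothetical helper: TRUE if, once a value is observed, none is missing afterwards.
complete_once_started <- function(x) {
  started <- cumsum(!is.na(x)) > 0  # TRUE from the first observed value onward
  !any(is.na(x) & started)
}

# One logical per panel; all TRUE means missing values occur only at the start.
ok_by_id <- tapply(d$w, d$id, complete_once_started)
all(ok_by_id)
```

If any panel fails this check, a window containing a gap sums to NA both in the loop and in a row-based approach, so such panels would need a separate decision about how to treat the gap.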