I have a dataset with three variables. Two are factor variables (Policy_num and presidentnumber). The third is a continuous value (pred). I would like to create a new variable that is the first difference of pred within each combination of presidentnumber and Policy_num. The following code runs, but it gives me the first difference of pred by presidentnumber only. The data frame is named dydx. This seems so simple, and yet I'm stumped.
newobject2 = dydx %>%
  group_by(Policy_num, presidentnumber) %>%
  mutate(dydx2 = pred - lag(pred))
This produces:
ob  Policy_num     Pres    pred      dydx2
1   SocialWelfare  Reagan  5.215365  NA
2   SocialWelfare  Reagan  4.373108  -0.8422576
3   Agriculture    Reagan  5.180910  0.8078020
4   Agriculture    Reagan  4.338652  -0.8422576
5   Commerce       Reagan  5.206816  0.8681638
6   Commerce       Reagan  4.364558  -0.8422576
It should look like this:
ob  Policy_num     Pres    pred      dydx2
1   SocialWelfare  Reagan  5.215365  NA
2   SocialWelfare  Reagan  4.373108  -0.8422576
3   Agriculture    Reagan  5.180910  NA
4   Agriculture    Reagan  4.338652  -0.8422576
5   Commerce       Reagan  5.206816  NA
6   Commerce       Reagan  4.364558  -0.8422576
Here is code for a reproducible example.
library(dplyr)  # provides %>%, group_by(), and mutate()

presidentnumber = c("Reagan", "Reagan", "Reagan", "Reagan", "Bush", "Bush",
                    "Bush", "Bush", "Clinton", "Clinton", "Clinton", "Clinton")
Policy_num = c("Agriculture", "Agriculture", "Social", "Social", "Agriculture",
               "Agriculture", "Social", "Social", "Agriculture", "Agriculture",
               "Social", "Social")
pred = 1:12
ND = data.frame(presidentnumber, Policy_num, pred)

# First difference of pred within each Policy_num / presidentnumber group
newobject4 = ND %>%
  group_by(Policy_num, presidentnumber) %>%
  mutate(dydx2 = c(NA, diff(pred)))
This produces:
Obs  presidentnumber  Policy_num   pred  dydx2
1    Reagan           Agriculture  1     NA
2    Reagan           Agriculture  2     1
3    Reagan           Social       3     1
4    Reagan           Social       4     1
5    Bush             Agriculture  5     1
6    Bush             Agriculture  6     1
7    Bush             Social       7     1
8    Bush             Social       8     1
9    Clinton          Agriculture  9     1
10   Clinton          Agriculture  10    1
11   Clinton          Social       11    1
12   Clinton          Social       12    1
However, the first row of each presidentnumber/Policy_num group (rows 3, 5, 7, 9, and 11) should be NA instead of 1.
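For reference, here is a minimal sketch of the result I expect, written so that it does not depend on which mutate() is first on the search path. It rests on an assumption that nothing in this post confirms: output like the table above can appear when another package that also exports mutate() (plyr is a common example) is attached after dplyr, so that the grouping from group_by() is ignored. The names expected and base_check are just illustrative.

```r
library(dplyr)

# Same toy data as in the example, built in one step
ND <- data.frame(
  presidentnumber = rep(c("Reagan", "Bush", "Clinton"), each = 4),
  Policy_num      = rep(rep(c("Agriculture", "Social"), each = 2), times = 3),
  pred            = 1:12
)

# Namespace-qualified calls, so a masking package cannot swap in its own mutate()
expected <- ND %>%
  dplyr::group_by(Policy_num, presidentnumber) %>%
  dplyr::mutate(dydx2 = pred - dplyr::lag(pred)) %>%
  dplyr::ungroup()

# Base R cross-check that does not use dplyr at all
base_check <- ave(ND$pred, ND$Policy_num, ND$presidentnumber,
                  FUN = function(x) c(NA, diff(x)))

print(expected)
```

If the qualified version gives NA on the first row of every group while the unqualified version does not, checking the output of search() or conflicts() for a masked mutate would be a reasonable next step.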