I have a data frame that looks like this:
df = data.frame(id=c(1,1,1,2,2,2,3,3,3),date=rep(c("1990-01","1990-02","1990-03"),3),
pd=c(0.005,0.004,0.003,0.001,0.0005,0.002,0.008,0.0065,0.002))
df
# id date pd
# 1 1990-01 0.0050
# 1 1990-02 0.0040
# 1 1990-03 0.0030
# 2 1990-01 0.0010
# 2 1990-02 0.0005
# 2 1990-03 0.0020
# 3 1990-01 0.0080
# 3 1990-02 0.0065
# 3 1990-03 0.0020
The id refers to different companies. I'd like to calculate the 'distance to default', -qnorm(pd_t) - (-qnorm(pd_{t-1})), within each id, with rows ordered by date.
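To make the formula concrete, here is a minimal worked example for a single step, using the first two pd values of id 1 from the data above. The variable names pd_prev, pd_curr and dd are only illustrative.

```r
# pd values for id 1 (taken from the example data frame)
pd_prev <- 0.005  # 1990-01
pd_curr <- 0.004  # 1990-02

# -qnorm(pd_t) - (-qnorm(pd_{t-1}))
dd <- -qnorm(pd_curr) - (-qnorm(pd_prev))
dd
# approximately 0.0762
```

The first date of each id has no previous value, so the result there should be NA.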
My code produces the output I am looking for, but it is very slow because the real data frame is much larger:
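library(dplyr)  # provides %>%, mutate() and lag()
id_vec = c(1:3)
df$DD = NA
# loop over the companies and compute the change in -qnorm(pd) within each one
for(i in 1:3){
df[df$id==id_vec[i],] = df[df$id==id_vec[i],] %>% mutate(DD = -qnorm(pd)-lag(-qnorm(pd)))}
df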
# id date pd DD
# 1 1990-01 0.0050 NA
# 1 1990-02 0.0040 0.07624050
# 1 1990-03 0.0030 0.09571158
# 2 1990-01 0.0010 NA
# 2 1990-02 0.0005 0.20029443
# 2 1990-03 0.0020 -0.41236499
# 3 1990-01 0.0080 NA
# 3 1990-02 0.0065 0.07485375
# 3 1990-03 0.0020 0.39439245
Does anyone have an idea how I can improve the performance?
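One possible direction is a grouped mutate instead of the explicit loop. This is a minimal sketch and has not been timed on the real data. It assumes dplyr is available and that the "YYYY-MM" date strings sort chronologically. The name res is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  date = rep(c("1990-01", "1990-02", "1990-03"), 3),
  pd   = c(0.005, 0.004, 0.003, 0.001, 0.0005, 0.002, 0.008, 0.0065, 0.002)
)

res <- df %>%
  arrange(id, date) %>%   # make sure rows are in date order within each id
  group_by(id) %>%        # lag() then restarts at the first row of each company
  mutate(DD = -qnorm(pd) - lag(-qnorm(pd))) %>%
  ungroup()

res
```

Because the grouping is done in a single pass, the data frame is not repeatedly subset and reassigned per id, which is likely where most of the time in the loop goes.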