I am trying to create a new variable in a dataset (+/- 10K rows) by iterating over another data frame based on two variables in the dataset. Somehow I can't figure it out without using merge (just starting to learn R) maybe someone can guide me in the right direction.
Snapshot from the dataset - dat
ContractNumber | PDRating | PD_month
---
1 | 7+ | 1
---
1 | 7+ | 2
---
1 | 7+ | 3
---
2 | 6+ | 1
---
2 | 6+ | 2
---
and so on.....
So each contract has a PDRating (which stands for Probability of Default rating) and a PDMonth which represents the month the contract is in. So if a contract has a length of 60 months there are 60 records for the contract. (btw this is my first question on StackOverflow and I haven't figured out yet how I can format a nice looking table)
The data frame (PD_Table) I want to iterate over consists of 180 rows, representing months and in de header the PDRatings as column names. Which looks like this:
PD_month | 5- | 6+ | 6 | 6- | 7+ | ...
---
1 | 0.0001 | 0.0002 | 0.0004 | 0.0005 | ...
---
2 | 0.0001 | 0.0002 | 0.0004 | 0.0005 | ...
---
3 | 0.0001 | 0.0002 | 0.0004 | 0.0005 | ...
---
4 | 0.0001 | 0.0002 | 0.0004 | 0.0005 | ...
---
5 | 0.0001 | 0.0002 | 0.0004 | 0.0005 | ...
---
and so on.....
The new variable I am trying to create is dat$PD and the new "dat" should look like this:
ContractNumber | PDRating | PD_month | PD
---
1 | 7+ | 1 | 0.0005
---
1 | 7+ | 2 | 0.0005
---
1 | 7+ | 3 | 0.0005
---
2 | 6+ | 1 | 0.0002
---
2 | 6+ | 2 | 0.0002
---
and so on.....
Right now I have done it by:
# convert PD_Table to wide format
melt(PD_Table, id.vars = "PD_month")
%>% rename(c("variable" = "PDRating", "value" = "PD"))
# Merge datasets and sort
arrange(merge(dat, PD_Table, by = c("PDRating", "PD_month"),
all.x = TRUE, sort = FALSE), ContractNumber)
Well this works but feels a bit clumsy and I have to sort the rows and reorder the columns again. So basically I am looking for a smarter solution
Amongst the things I tried, which didn't work, were the following:
dat$PD <- PD_Table[dat$PD_month, dat$PDRating]
# Returns a vector with all values for a rating as PD value]
# Note PD_Table was still in wide format
dat$PD <- for (i in nrow(dat)) PD_Table[dat$PD_month[i], dat$PDRating[i]]
# Does not returns anything
dat$PD <- for (i in nrow(dat3)) PD_Table[dat$PD_month[i], which(dat3$PDRating[i] == colnames(PD_Table))]
# Does not returns anything
Any help much appreciated!
Kind regards, Matthies