0

I have certain loan ID origination data in one data file and another data file measure the performance of these loan ID's over the time period.

Now I want to know the loan ID which default over a certain time period. Lets say time period of 12 quarters.

Example:

Origination file
Loan_ID loan_terms loan_amount Date
1         360       1000     01/01/2010
2         240       2000     05/15/2010
3         150       3000     10/01/2010 

Perf

      Loan_ID Delinquency_Status Date
1       Active             01/01/2012  
1       Active             01/03/2012
1       Default            01/06/2012
2       Active             01/01/2013
2       Active             06/01/2013
2       Prepayment         09/01/2013 
3       Active             12/30/2013

Since I am evaluating the performance over the 12 quarter only from 01/01/2010. I want the following:

Loan_ID Delinquency_Status 
1        Default
2        Active   
3        Active
  • For us to help you, please provide a minimal reproducible dataset e.g. see the first answer here on how to do this. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – meenaparam Jul 18 '17 at 12:03

1 Answers1

0

A good start for merging two data frames is merge(), with by.x and by.y specifying the key pair.

To see if it works in your case you would need to elaborate a bit on your question.

Jindra Lacko
  • 7,814
  • 3
  • 22
  • 44
  • I need to find the Delinquency_Status within a time period. In the performance file, I am measuring the performance of loans ids, thus these ids are repeating. I want to bring the performance of loans and origination information together. – Nawal Kashyap Jul 18 '17 at 12:30
  • If you are certain you have only a single deliquency_status value within the period you can first filter (e.g. by subsetting) the status data frame and then merge it with origination info. If there are multiple statuses within the period you first have to decide which one you want to join to origination data - the best? the worst? – Jindra Lacko Jul 18 '17 at 12:40
  • I want to add the worst status within the timeframe – Nawal Kashyap Jul 18 '17 at 12:45
  • the more complicated variant :) This is easiest done with `dplyr` package, have look over here https://stackoverflow.com/questions/24237399/how-to-select-the-rows-with-maximum-values-in-each-group-with-dplyr – Jindra Lacko Jul 18 '17 at 13:36
  • Sir, I think I am not able to explain you my problem properly. I have around 50K loans origination file which have around 26 variable related to characteristics of the borrowers. In the performance files, I have variables related to the performance of each loan over the time period till 2016. So each loan id can be repeat more than 10-20 times. I want to extract the information of loan id's with in a time frame. Suppose 3 Years from the origination date. How can I code it in R? – Nawal Kashyap Jul 20 '17 at 11:32
  • On the basis of entire data, I have created the default, However I want to check in the time-frame, how many default or not? I have below code for the same, however it is not showing the correct results. model_data["Del_status"] <- apply(model_data, 1, function(x) max(as.numeric(perf[ (perf$Loan_ID==x["Loan_ID"]) & (perf$`2` <= model_data[(model_data$Loan_ID==x["Loan_ID"]),"exp_life"]), "Default" ])) == 1 Please correct me where I am making mistake – Nawal Kashyap Jul 20 '17 at 11:39