7

First of all, apologies for posting the same question again.

I want to add the grace period to a repay table for the flat interest calculation method such that the interest amount can be received for the grace period.

Dataframe1: This is the normal scenario.

uid     emi_date    amt interest    tenure  emi      Rep_seq    status  balance
KII-453 01/01/2020  100 2%          12      10.33333    1          1    113.67
KII-453 01/02/2020  100 2%          12      10.33333    2          1    103.3367
KII-453 01/03/2020  100 2%          12      10.33333    3          1    93.00333
KII-453 01/04/2020  100 2%          12      10.33333    4          0    82.67
KII-453 01/05/2020  100 2%          12      10.33333    5          0    72.33667
KII-453 01/06/2020  100 2%          12      10.33333    6          0    62.00333
KII-453 01/07/2020  100 2%          12      10.33333    7          0    51.67
KII-453 01/08/2020  100 2%          12      10.33333    8          0    41.33667
KII-453 01/09/2020  100 2%          12      10.33333    9          0    31.00333
KII-453 01/10/2020  100 2%          12      10.33333    10         0    20.67
KII-453 01/11/2020  100 2%          12      10.33333    11         0    10.33667
KII-453 01/12/2020  100 2%          12      10.33333    12         0    0.003333

From the dataframe1, I'm trying to provide the grace period for the next x months from the last date with status = 1 (In dataframe 1 this is emi_date = '01/03/2020')

Balance calculation:

  • Row 1 = 124 - 10.333 = 113.67,
  • Row 2 and onwards = balance row1 - emi = 113.67 - 10.333

The desired result should look like this:

uid     emi_date    amt interest    tenure  emi       rep_seq   status  balance
KII-453 01/01/2020  100   2%          12    10.33333    1         1     113.67
KII-453 01/02/2020  100   2%          12    10.33333    2         2     103.3367
KII-453 01/03/2020  100   2%          12    10.33333    3         3     93.00333
KII-453 01/04/2020  100   2%          12    0           4         0     95.00333
KII-453 01/05/2020  100   2%          12    0           5         0     97.04333
KII-453 01/06/2020  100   2%          12    10.33333    6         0     86.71
KII-453 01/07/2020  100   2%          12    10.33333    7         0     76.37667
KII-453 01/08/2020  100   2%          12    10.33333    8         0     66.04333
KII-453 01/09/2020  100   2%          12    10.33333    9         0     55.71
KII-453 01/10/2020  100   2%          12    10.33333    10        0     45.37667
KII-453 01/11/2020  100   2%          12    10.33333    11        0     35.04333
KII-453 01/12/2020  100   2%          12    10.33333    12        0     24.71
KII-453 01/01/2021  100   2%          12    10.33333    13        0     14.37667
KII-453 01/02/2021  100   2%          12    10.33333    14        0     4.043333
KII-453 01/03/2021  100   2%          12    4.043333    15        0     0

Balance calculation (row1) = 124 - 10.333 =113.67, row2, and onward until status(1) = balance row1 (113.67)-emi(10.333)

To add grace period, we are next two months emi as 0. And balance calculation for those two months will be for 01/04/2020 = amt (100)*interest (2%) + previous balance (93.00333) and for 01/05/2020 = (amt (100)*interest (2%)+ (amt (100)*interest (2%)*2%+ previous balance (95.00333)

The rest balance calculation will be as it is (For E.g. Previous balance - emi) until the balance is < emi, if the balance < emi we will carry forward that balance in emi to next month and keep balance 0 for that month.

P.

For the example purpose I have created the dataframe for one uid in real I have ~10000 unique uid in dataframe.

Input Dput:

    structure(list(uid = c("KII-62", "KII-62", "KII-62", 
"KII-62", "KII-62", "KII-62", "KII-62", 
"KII-62", "KII-62", "KII-62", "KII-62", 
"KII-62", "KII-62", "KII-62", "KII-62", 
"KII-62", "KII-62", "KII-62", "KII-62", 
"KII-62", "KII-62", "KII-62", "KII-62", 
"KII-62", "KII-63", "KII-63", "KII-63", 
"KII-63", "KII-63", "KII-63", "KII-63", 
"KII-63", "KII-63", "KII-63", "KII-63", 
"KII-63"), emi_date = c("05/12/2019", "05/01/2020", "05/02/2020", 
"05/03/2020", "05/04/2020", "05/05/2020", "05/06/2020", "05/07/2020", 
"05/08/2020", "05/09/2020", "05/10/2020", "05/11/2020", "05/12/2020", 
"05/01/2021", "05/02/2021", "05/03/2021", "05/04/2021", "05/05/2021", 
"05/06/2021", "05/07/2021", "05/08/2021", "05/09/2021", "05/10/2021", 
"05/11/2021", "05/12/2019", "05/01/2020", "05/02/2020", "05/03/2020", 
"05/04/2020", "05/05/2020", "05/06/2020", "05/07/2020", "05/08/2020", 
"05/09/2020", "05/10/2020", "05/11/2020"), amt = c(470000, 470000, 
470000, 470000, 470000, 470000, 470000, 470000, 470000, 470000, 
470000, 470000, 470000, 470000, 470000, 470000, 470000, 470000, 
470000, 470000, 470000, 470000, 470000, 470000, 220000, 220000, 
220000, 220000, 220000, 220000, 220000, 220000, 220000, 220000, 
220000, 220000), interest = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2), tenure = c(24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 12, 12, 
12, 12, 12, 12, 12, 12, 12, 12, 12, 12), emi = c(28983.33, 28983.33, 
28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
28983.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 
22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33), 
    Rep_seq = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
    "20", "21", "22", "23", "24", "1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "10", "11", "12"), status = c(1L, 1L, 1L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), balance = c(450416.666666667, 430833.333333333, 
    411250, 391666.666666667, 372083.333333333, 352500, 332916.666666667, 
    313333.333333333, 293750, 274166.666666667, 254583.333333333, 
    235000, 215416.666666667, 195833.333333333, 176250, 156666.666666667, 
    137083.333333333, 117500, 97916.6666666667, 78333.3333333333, 
    58750, 39166.6666666667, 19583.3333333333, 8e-28, 201666.666666667, 
    183333.333333333, 165000, 146666.666666667, 128333.333333333, 
    110000, 91666.6666666667, 73333.3333333333, 55000, 36666.6666666667, 
    18333.3333333333, 4e-28)), .Names = c("uid", "emi_date", 
"amt", "interest", "tenure", "emi", "Rep_seq", "status", "balance"
), class = "data.frame", row.names = c(NA, 36L))

Row 1 balance for uid = KII-62 would be (amt * interest * tenure)+amt, the same will repeat for Row 1 balance for uid = KII-63

Row 1 Balance (KII-62):

(4,70,000*0.02)-28983.33(emi) = 450416.666666667

library(tidyverse)
startingbalance <- 124
period <- "1 month"
dataframe1 %>% 
  mutate(index = seq(1,nrow(.))) %>%
  mutate(emi_date = dmy(emi_date)) %>%
  mutate(emi = case_when(status - lag(status) < 0 ~ 0, status - lag(status,2L) < 0 ~ 0, TRUE ~ emi)) %>%
  mutate(balance = case_when(index == 1 ~ startingbalance - emi,
                                index > 1 & emi > 0 & status == 1 ~ lag(balance) - emi,
                                index > 1 & emi == 0 & lag(status) == 1 & lag(status,2L) == 1 ~ lag(balance) + (amt * (as.integer(gsub("%","",interest))) / 100),
                                index > 1 & emi == 0 & lag(status) == 0 & lag(status,2L) == 1 ~ lag(balance,2L) + 2 * (amt * (as.integer(gsub("%","",interest))) / 100),
                                TRUE ~ NaN)) %>%
  select(-index) %>%
  do(add_row(., uid = .$uid[nrow(.)],emi_date = .$emi_date[nrow(.)] + period(period), amt = .$amt[nrow(.)],interest = .$interest[nrow(.)],tenure = .$tenure[nrow(.)],emi = .$emi[nrow(.)],status = .$status[nrow(.)],Rep_seq = .$Rep_seq[nrow(.)] + 1,balance = NaN)) %>% 
  do(add_row(., uid = .$uid[nrow(.)],emi_date = .$emi_date[nrow(.)] + period(period), amt = .$amt[nrow(.)],interest = .$interest[nrow(.)],tenure = .$tenure[nrow(.)],emi = .$emi[nrow(.)],status = .$status[nrow(.)],Rep_seq = .$Rep_seq[nrow(.)] + 1,balance = NaN)) %>%
  do(add_row(., uid = .$uid[nrow(.)],emi_date = .$emi_date[nrow(.)] + period(period), amt = .$amt[nrow(.)],interest = .$interest[nrow(.)],tenure = .$tenure[nrow(.)],emi = .$emi[nrow(.)],status = .$status[nrow(.)],Rep_seq = .$Rep_seq[nrow(.)] + 1,balance = NaN)) %>%
  mutate(balance =  {ind <- which(is.nan(balance)); for(i in ind){balance[i] <- balance[i-1] - emi[i]}; balance}) %>%
  mutate(emi = case_when(balance < 0 ~ lag(balance), TRUE ~ emi),
         balance = case_when(balance < 0 ~ 0, TRUE ~ balance))
M--
  • 25,431
  • 8
  • 61
  • 93
Sophia Wilson
  • 581
  • 3
  • 16
  • The code you provided results in this error: `Error in .$Rep_seq[nrow(.)] + 1 : non-numeric argument to binary operator`. You have to wrap it in `as.numeric` as the column `Rep_seq` is of type `character`. – SeGa Apr 06 '20 at 09:05
  • @SeGa: Please try `dataframe1[,c(3,9)] <- lapply(dataframe1[,c(3,9)], as.numeric)` before the code. – Sophia Wilson Apr 06 '20 at 09:19
  • with `3:9` it works. But besides that, the dataframe looks quite different than in your question. – SeGa Apr 06 '20 at 09:22
  • @SeGa: Yes, dataframe in question is cover the broader columns to give the context. `Input Dput` is the actual dataframe. – Sophia Wilson Apr 06 '20 at 09:40
  • @SophiaWilson your calculations indicate `Row 1 Balance (KII-62): (4,70,000*0.02*24)+4,70,000 = 6,96,500`, while your test data has a value as `450416.7` for `balance`. Can you clarify the discrepancy? Also, can you clarify how is `Status =1` different from `Status = 0, or 2 or 3`? In terms of you calculations. – sachin2014 Apr 10 '20 at 21:26
  • @sachin2014: My bad, the calculation was wrong. Updated the question with correct calculation. – Sophia Wilson Apr 11 '20 at 01:02
  • @SophiaWilson can you describe what you mean by getting a negative value at the end of the 2-3 months, if possible show the Output? Also should the `startingbalance` = `479400`? The value you have used in your test code `124` will give a negative value for `balance` for row 1: 124 -28983.33 = -28859.33? Or am I missing something? – sachin2014 Apr 11 '20 at 15:04
  • @SophiaWilson actually it will be helpful if you can share the desired output using your test data. – sachin2014 Apr 11 '20 at 15:37

1 Answers1

0
library(tidyverse)

df <-  tibble( # the dataframe, as a tibble, with a few unnecessary columns removed for clarity
  uid = c("KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-62", "KII-62", "KII-62", 
    "KII-62", "KII-63", "KII-63", "KII-63", 
    "KII-63", "KII-63", "KII-63", "KII-63", 
    "KII-63", "KII-63", "KII-63", "KII-63", 
    "KII-63"), 
  emi_date =  as.Date(c("05/12/2019", "05/01/2020", "05/02/2020", 
    "05/03/2020", "05/04/2020", "05/05/2020", "05/06/2020", "05/07/2020", 
    "05/08/2020", "05/09/2020", "05/10/2020", "05/11/2020", "05/12/2020", 
    "05/01/2021", "05/02/2021", "05/03/2021", "05/04/2021", "05/05/2021", 
    "05/06/2021", "05/07/2021", "05/08/2021", "05/09/2021", "05/10/2021", 
    "05/11/2021", "05/12/2019", "05/01/2020", "05/02/2020", "05/03/2020", 
    "05/04/2020", "05/05/2020", "05/06/2020", "05/07/2020", "05/08/2020", 
    "05/09/2020", "05/10/2020", "05/11/2020"), format = "%d/%m/%Y"),
  amt = c(470000, 470000, 
    470000, 470000, 470000, 470000, 470000, 470000, 470000, 470000, 
    470000, 470000, 470000, 470000, 470000, 470000, 470000, 470000, 
    470000, 470000, 470000, 470000, 470000, 470000, 220000, 220000, 
    220000, 220000, 220000, 220000, 220000, 220000, 220000, 220000, 
    220000, 220000), 
    interest = rep(2, 36),
    emi = c(28983.33, 28983.33, 
    28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
    28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
    28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 28983.33, 
    28983.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 
    22733.33, 22733.33, 22733.33, 22733.33, 22733.33, 22733.33), 
    grace_period = c(1L, 1L, 1L, 
        1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
        0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
        0L, 0L, 0L))

# split the data into two dataframes by uid (keeping different user data together makes it very difficult to use)
df1 <- df %>% filter(uid == "KII-62")
df2 <- df %>% filter(uid == "KII-63")

# create a function to calculate the balance and clean the tibble:
clean_df <- function(df) {
  df <- df %>% 
    mutate(
      emi = ifelse(grace_period == 1, 0, emi), # if grace period, pay back nothing
      net_change = (amt * interest / 100) - emi,
      month = row_number()
    )
# create a row which is the initial payment at month 0
  first_row <- df %>%  
    slice(1) %>% 
    mutate(
      net_change = amt,
      month = 0
    )

  df <- df %>% 
    add_row(first_row, .before = 1) %>% # join the first row and the rest of the tibble together
    select(-amt, -interest, -grace_period, -emi) %>% # remove columns we've used and now have no need for
    mutate(balance = cumsum(net_change))

  return(df)
}

df1 <- clean_df(df1)
df2 <- clean_df(df2)
Mark
  • 7,785
  • 2
  • 14
  • 34