Creating a dummy variable in R using loan default data

Question

I'm working with the Lending Club data set and I'm trying to create a dummy variable for the target variable loan_status. My goal is for Charged Off to be 0, Fully Paid to be 1, and everything else to be 'NA'. The variable loan_status has several values: Current, Fully Paid, Late, Grace Period, Delinquent, Charged Off, and Does not qualify due to credit profile. I just want to focus on Charged Off and Fully Paid. I've tried numerous times without success. For example:

Creating a new target variable

loan_status1 <- if(loan_status== 'Fully Paid'){'Yes'} else if
 (loan_status== 'Charged Off') {'No'} else 'NA'

Also I've tried this:

if(loan_status=='Fully Paid'){
   0} else if (loan_status=='Charged Off') {
   1} else (loan_status=='NA')

I would appreciate any guidance.

The simplest approach would be the vectorized `ifelse`. Try `loan_status1 <- ifelse(loan_status == 'Fully Paid', 1, ifelse(loan_status == 'Charged Off', 0, NA))` – Ronak Shah, Mar 23 '17 at 04:18
Possible duplicate of [Nested ifelse statement in R](http://stackoverflow.com/questions/18012222/nested-ifelse-statement-in-r) – Ronak Shah, Mar 23 '17 at 04:22
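
To illustrate the comment above: `if` expects a single TRUE/FALSE condition, whereas `ifelse` evaluates the condition element by element. The following is a minimal sketch on a small made-up vector (the sample values are an assumption, not actual Lending Club data), using the 0/1 coding asked for in the question:

```r
# Made-up sample values standing in for the loan_status column (assumption)
loan_status <- c("Current", "Fully Paid", "Late", "Charged Off")

# Vectorized recoding: Fully Paid -> 1, Charged Off -> 0, anything else -> NA
loan_status1 <- ifelse(loan_status == "Fully Paid", 1L,
                       ifelse(loan_status == "Charged Off", 0L, NA_integer_))
loan_status1
#[1] NA  1 NA  0
```

Using `NA_integer_` rather than the string 'NA' keeps the result an integer vector with real missing values.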

score 0 · Answer 1 · edited Mar 24 '17 at 07:36

You could run a for loop over your data, as shown below. Don't set missing values as strings ('NA'); use the actual NA value instead.

loan_status <- sample(rep(c('Fully Paid', 'Charged Off', "abc"), 100), 100, replace = FALSE)

for (i in seq_along(loan_status)){
  if (loan_status[i] == 'Fully Paid'){
    loan_status[i] <- as.integer(0)
  } else if (loan_status[i] == 'Charged Off'){
    loan_status[i] <- as.integer(1)
  } else {
    loan_status[i] == NA
  }
}

Alternatively, you could do this the easy way with the factor() function, for instance:

factor(loan_status, levels = c('Fully Paid', 'Charged Off'), labels = c(0, 1))

answered Mar 23 '17 at 15:36 by Marcel Der · edited Mar 24 '17 at 07:36 by Uwe

I would have upvoted your answer for the `factor` approach. But the `for` loop is a no go. It's a clumsy and slow re-implementation of [Ronak Shah's vectorized `ifelse` approach](http://stackoverflow.com/questions/42967160/creating-a-dummy-variable-in-r-using-loan-default-data#comment73027898_42967160): `loan_status1 <- ifelse(loan_status == 'Fully Paid', 1, ifelse(loan_status == 'Charged Off', 0, NA))` – Uwe Mar 24 '17 at 07:34
Oops, the for loop doesn't work at all. It returns "0" and "1" as character and leaves the "abc" unchanged. Reason: `loan_status[i] == NA` should read `loan_status[i] <- NA`. – Uwe Mar 24 '17 at 08:16
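
For reference, here is a minimal sketch of a loop with the problems from the comments addressed. It is not the answerer's code: it writes into a separate integer vector (the name `target` is an assumption) instead of overwriting the character vector, and it assumes the input contains no missing values, since `if` fails on an NA condition. The vectorized approaches remain preferable.

```r
# Small made-up input (assumption); "abc" stands for any other status
loan_status <- c("Fully Paid", "Charged Off", "abc", "Fully Paid")

# Pre-fill the result with integer NA so unmatched values stay missing
target <- rep(NA_integer_, length(loan_status))

for (i in seq_along(loan_status)) {
  if (loan_status[i] == "Fully Paid") {
    target[i] <- 0L
  } else if (loan_status[i] == "Charged Off") {
    target[i] <- 1L
  }
}
target
#[1]  0  1 NA  0
```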

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

The OP requested a 1:1 replacement of selected values, i.e., only one data field is involved. Besides the nested ifelse approach, this can be done with factors or, for larger data, with a join.

If more than two or three values need to be replaced, the hard-coded nested ifelse approach quickly becomes unwieldy.

Factor case 1: Yes, No

# create some data
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")
# do the conversion
factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No"))
#[1] Yes  No   <NA> <NA>
#Levels: Yes No

Or,

as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("Yes", "No")))
#[1] "Yes" "No"  NA    NA

if the result is expected as character.

Factor case 2: 0L, 1L as integers

If the result is expected to be of type integer, the factor approach can still be used but needs an additional conversion.

as.integer(as.character(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1"))))
#[1]  0  1 NA NA

Note that the conversion to character is essential here. Otherwise, the result would be the integer codes of the factor levels:

as.integer(factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1")))
#[1]  1  2 NA NA
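
As a variation on the conversion above, the labels can also be recovered by converting the levels once and indexing them with the factor, which avoids converting every element to character first. This is a sketch of that idiom, not part of the original answer; the names `f` and `target` are assumptions:

```r
loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")
f <- factor(loan_status, levels = c("Fully Paid", "Charged Off"), labels = c("0", "1"))

# Convert the two level labels to integer, then index by the factor's codes
target <- as.integer(levels(f))[f]
target
#[1]  0  1 NA NA
```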

Join

For larger data with many items to be replaced, a data.table join might be an alternative worth considering:

library(data.table)
# create translation table
translation_map <- data.table(
  loan_status = c("Fully Paid", "Charged Off"),
  target = c(0L, 1L))
# create some user data
DT <- data.table(id = LETTERS[1:4],
                 loan_status = c("Fully Paid", "Charged Off", "Something", "Else"))
DT
#   id loan_status
#1:  A  Fully Paid
#2:  B Charged Off
#3:  C   Something
#4:  D        Else

# right join
translation_map[DT, on = "loan_status"]
#   loan_status target id
#1:  Fully Paid      0  A
#2: Charged Off      1  B
#3:   Something     NA  C
#4:        Else     NA  D

With the default nomatch = NA, translation_map[DT, on = "loan_status"] behaves like a right join, i.e., it keeps all rows of DT and fills target with NA where there is no match.
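
The same lookup logic can be sketched in base R with `match`, which may help to see what the join does for this 1:1 case. This is an illustration under the assumption that a plain data.frame is used for the translation table; it is not a replacement for the data.table join on large data:

```r
# Translation table as a plain data.frame (assumption: base R only)
translation_map <- data.frame(
  loan_status = c("Fully Paid", "Charged Off"),
  target = c(0L, 1L),
  stringsAsFactors = FALSE)

loan_status <- c("Fully Paid", "Charged Off", "Something", "Else")

# match() returns the row of the translation table for each value, NA if absent
idx <- match(loan_status, translation_map$loan_status)
target <- translation_map$target[idx]
target
#[1]  0  1 NA NA
```

Extending the mapping then only requires adding rows to the translation table rather than nesting further `ifelse` calls.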
