I am trying to categorize the values in a column of loan statuses. If the loan is fully paid, it should be labeled as good; if it is in default or charged off, it should be labeled as bad. However, when I run the code below in R, I get this error:

Error: Problem with `mutate()` input `new_status`. x must be a character vector, not a logical vector. ℹ Input `new_status` is `case_when(...)`.

Here is the code block

loans <- loansdf %>%
  mutate(new_status = case_when(
    status %in% c("Fully paid") ~ "Good",
    status %in% c("Default", "Charged off") ~ "Bad",
    TRUE ~ NA
  ))
Feverish123
  • Please include a sample of your data to help us diagnose. Most useful would be to include the code produced by `dput(head(loansdf))` in your question. – Jon Spring Mar 10 '21 at 00:17
  • Your last line needs to read `TRUE ~ NA_character_` in order to return the same type as the other two conditions. – Calum You Mar 10 '21 at 00:20
  • `TRUE ~ NA_character_` gets rid of the error but does not classify fully paid loans as good and default or charged-off loans as bad. – Feverish123 Mar 10 '21 at 00:24
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 10 '21 at 00:26
  • I am not able to include the `dput(head(loansdf))` output because the data consists of over 30,000 entries. – Feverish123 Mar 10 '21 at 00:27
  • `head()` means that you will only return the first 6 rows. What do you mean by "does not classify"? – Calum You Mar 10 '21 at 00:29
  • I mean I get the `new_status` column but the values are all `NA`. – Feverish123 Mar 10 '21 at 00:46
  • I took a look at the `csv` file. The issue is spelling. The values in the `case_when` did not match the spelling of the values in the `status` variable. I was able to work with the OP in chat and have since updated my answer which he accepted. The edit queue is full, so I will let him know the code to properly subset the data of which there are 50K rows and 32 columns. – Eric Mar 10 '21 at 01:17
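
One way to spot this kind of mismatch is to list the distinct values actually stored in `status` and then compare against a normalized copy. This is a minimal sketch, assuming hypothetical sample values (the real file was only shared in chat) and assuming the differences are limited to letter case and stray whitespace:

```r
library(dplyr)

# Hypothetical sample; the real data was not posted in the question.
loansdf <- data.frame(
  status = c("Fully Paid", "Default ", "Charged Off", "Current"),
  stringsAsFactors = FALSE
)

# Step 1: inspect the values that are really in the column.
unique(loansdf$status)

# Step 2: match on a trimmed, lower-cased copy so that variants such as
# "Fully Paid" vs "Fully paid" no longer matter.
loans <- loansdf %>%
  mutate(
    status_clean = tolower(trimws(status)),
    new_status = case_when(
      status_clean == "fully paid" ~ "Good",
      status_clean %in% c("default", "charged off") ~ "Bad",
      TRUE ~ NA_character_
    )
  )

loans$new_status
```

If every row still comes back as `NA` after normalizing, the output of `unique()` is the thing to compare against the strings in `case_when()`.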

1 Answer

According to the case_when documentation:

# All RHS values need to be of the same type. Inconsistent types will throw an error.
# This applies also to NA values used in RHS: NA is logical, use
# typed values like NA_real_, NA_complex_, NA_character_, NA_integer_ as appropriate.
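
A quick base-R check makes the type rule concrete. This is a small sketch of my own, not part of the documentation; it only relies on `class()`, `c()` and `is.na()`. Note that the strictness described applies to the dplyr version that produced the error in the question; newer releases may be more lenient.

```r
# A bare NA is logical; the typed constants carry their own type.
class(NA)             # "logical"
class(NA_character_)  # "character"
class(NA_integer_)    # "integer"
class(NA_real_)       # "numeric"

# Base R coerces silently when combining, which is why the mismatch is
# easy to overlook; the case_when() call in the question did not coerce
# and raised an error instead.
x <- c("Good", NA)
class(x)              # "character"
is.na(x[2])           # TRUE
```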

In this case, IMO, I would use the comparison operator `==` instead of value matching with `%in%`.

loansdf <- data.frame(
  name = c("Eric Fletcher", "Hadley Smith", "Homer Simpson", "Pauline Tator Tots"),
  status = c("Fully Paid", "Default", "Charged Off", "Test")
)

library(dplyr)

loansdf %>% 
  mutate(
    new_status = case_when(
      status == "Fully Paid" ~ "Good",
      status == "Default" | status == "Charged Off" ~ "Bad",
      TRUE ~ as.character(NA)
    )
  )

#>                 name      status new_status
#> 1      Eric Fletcher  Fully Paid       Good
#> 2       Hadley Smith     Default        Bad
#> 3      Homer Simpson Charged Off        Bad
#> 4 Pauline Tator Tots        Test       <NA>

Created on 2021-03-09 by the reprex package (v0.3.0)
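
For a single value, `==` and `%in%` pick out the same rows, so the choice is mostly a matter of style; they differ only in how they treat missing values. A small base-R sketch with made-up values:

```r
status <- c("Fully Paid", "Default", NA)

# == propagates NA, while %in% always returns TRUE or FALSE.
status == "Default"        # FALSE  TRUE    NA
status %in% c("Default")   # FALSE  TRUE FALSE

# Several == tests joined with | agree with one %in% test wherever
# status is not missing.
bad_eq <- status == "Default" | status == "Charged Off"
bad_in <- status %in% c("Default", "Charged Off")
```

Inside `case_when()` a condition that evaluates to `NA` does not match, so either form gives the same classification there.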

Eric
  • I have the new_status column but all the values are N/A – Feverish123 Mar 10 '21 at 00:35
  • Please perform the following as posted by Jon Spring in the comments: please include a sample of your data to help us diagnose. most useful would be to include the code produced by `dput(head(loansdf))` in your question. – Eric Mar 10 '21 at 00:40
  • Can I post a screenshot of the head(loansdf)? – Feverish123 Mar 10 '21 at 00:44
  • No. The point of what we are asking is to make it easier for us (answerers) to help you by being able to reproduce your data on our end. Also, did you run my code on your computer? You say your values are all `NA`... does that mean they are all `NA` after you ran my code or your code? – Eric Mar 10 '21 at 00:46
  • Type this `dput(head(loansdf))` in your IDE and edit your question to include the output. – Eric Mar 10 '21 at 00:47
  • Yes they are all NA after running the code you posted. the new_status column has NA values – Feverish123 Mar 10 '21 at 00:48
  • There are some issues with your data. Is the `status` variable of character type? Are the values in the `status` variable spelled the same as they are listed in the code I provided (`Fully paid`, `Default`, `Charged off`)? – Eric Mar 10 '21 at 00:52
  • The character limit is exceeded if I post it here. Is there a way that I can share the .csv file? – Feverish123 Mar 10 '21 at 00:53
  • Join this chat room https://chat.stackoverflow.com/rooms/229705/help – Eric Mar 10 '21 at 00:56
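
For reference, the `dput(head(...))` request from the comments can be tried on any data frame. A minimal sketch with a hypothetical stand-in for `loansdf`: `head()` returns the first six rows by default, so the output stays small however large the data is, and selecting only the relevant column first keeps it shorter still.

```r
# Hypothetical stand-in for a large data frame.
loansdf <- data.frame(
  id = 1:30000,
  status = rep(c("Fully Paid", "Default", "Charged Off"), 10000),
  stringsAsFactors = FALSE
)

# Six rows by default, regardless of the 30,000 in the full data.
sample_rows <- head(loansdf)
nrow(sample_rows)

# Prints R code that rebuilds the sample; that text goes in the question.
dput(sample_rows)

# Restricting to the column under discussion shortens the output further.
dput(head(loansdf["status"], 3))
```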