
I have seen a similar post: https://stackoverflow.com/questions/6104836/splitting-a-continuous-variable-into-equal-sized-groups. My problem, however, is that the range is required to be a string. Below is my data frame:

df
name salary bonus increment(%)
AK   22200  120   2
BK   55000   34   .1
JK   12000  400   3
VK   3400   350   15
DK   5699    NA    NA

df = structure(list(name = c("AK", "BK", "JK", "VK", "DK"),
                    salary = c(22200L, 55000L, 12000L, 3400L, 5699L),
                    bonus = c(120L, 34L, 400L, 350L, NA),
                    `increment(%)` = c(2, 0.1, 3, 15, NA)),
               .Names = c("name", "salary", "bonus", "increment(%)"),
               row.names = c(NA, -5L), class = "data.frame")

The salary column needs to be replaced with range labels such as "<10K", "10K-20K", "20K-30K", and ">30K". These values are alphanumeric, which the cut-by-defined-interval approach does not address. The desired result is:

name salary  bonus increment(%)
AK   20K-30K  120    2
BK   >30K      34   .1
JK   10K-20K  400    3
VK   <10K     350    15
DK   <10K      NA    NA

However, using cut with defined intervals in R does not yield the desired result. This is the code I used:

df$salary <- cut(df$salary, breaks = c(0, 10000, 20000, 30000, 60000), include.lowest = TRUE)

The output is:

  name        salary bonus increment(%)
  1   AK (2e+04,3e+04]   120          2.0
  2   BK (3e+04,6e+04]    34          0.1
  3   JK (1e+04,2e+04]   400          3.0
  4   VK     [0,1e+04]   350         15.0
  5   DK     [0,1e+04]    NA           NA
anmonu
  • Guys, this is not a duplicate question; the range here is essentially an alphanumeric value. – anmonu Jun 24 '17 at 07:33

1 Answer

You can use the case_when function from the dplyr package. df2 is the final output.

library(dplyr)

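# Replace the numeric salary with a character range label
df2 <- df %>%
  mutate(salary = case_when(
    salary < 10000                   ~ "<10K",
    salary >= 10000 & salary < 20000 ~ "10K-20K",
    salary >= 20000 & salary < 30000 ~ "20K-30K",
    salary >= 30000                  ~ ">30K",
    # Fallback for missing salaries: a real NA, not the string "NA"
    TRUE                             ~ NA_character_
  ))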
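
As an alternative sketch in base R, cut() also accepts a labels argument, so the interval notation from the question can be replaced with custom strings. The labels below are taken from the question; using Inf as the last break and right = FALSE (so that 10000 falls into "10K-20K", matching the case_when conditions) are assumptions about the intended boundaries.

```r
# Sample data from the question (only name and salary are needed here)
df <- data.frame(
  name   = c("AK", "BK", "JK", "VK", "DK"),
  salary = c(22200L, 55000L, 12000L, 3400L, 5699L),
  stringsAsFactors = FALSE
)

# right = FALSE gives left-closed intervals: [0, 10000), [10000, 20000), ...
# Inf as the last break avoids hard-coding a maximum salary.
salary_band <- cut(
  df$salary,
  breaks = c(0, 10000, 20000, 30000, Inf),
  labels = c("<10K", "10K-20K", "20K-30K", ">30K"),
  right  = FALSE
)

# cut() returns a factor; convert it if plain strings are required
df$salary <- as.character(salary_band)
print(df)
```

Missing salaries stay NA with this approach, and negative values would also become NA because they fall outside the first break.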
www
  • Thanks, case_when seemed very apt for this scenario. Just one tweak: I used .$ in front of salary in the case_when, because my RStudio was throwing an error saying it could not find the salary variable. – anmonu Jun 25 '17 at 11:24
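
The tweak described in the comment above can be sketched as follows. It assumes the magrittr pipe (%>%), where the dot refers to the piped data frame, so .$salary reads the column directly from it. The name df3 is only for illustration. With current dplyr versions the bare column name usually works inside mutate, so this form should only be needed if the "object not found" error appears.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  name   = c("AK", "BK", "JK", "VK", "DK"),
  salary = c(22200L, 55000L, 12000L, 3400L, 5699L),
  bonus  = c(120L, 34L, 400L, 350L, NA),
  stringsAsFactors = FALSE
)

# .$salary refers to the salary column of the piped data frame explicitly
df3 <- df %>%
  mutate(salary = case_when(
    .$salary < 10000                     ~ "<10K",
    .$salary >= 10000 & .$salary < 20000 ~ "10K-20K",
    .$salary >= 20000 & .$salary < 30000 ~ "20K-30K",
    .$salary >= 30000                    ~ ">30K",
    TRUE                                 ~ NA_character_
  ))
print(df3)
```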