
I have to create a new column called SalaryX, whose values are calculated as follows:

If the original salary is between 20,000 and 30,000, its SalaryX should be 20,000; if the original salary is between 30,000 and 40,000, its SalaryX should be 30,000; and so on.

I tried using the cut function as follows:

cut(employee$salary, 5, include.lowest = T, labels = c("20000", "30000", "40000", "50000", "60000"))

But with this, a salary of 25600 gets a SalaryX of 30000 instead of the 20000 I expect.

Is there another way to do this?

PavoDive
Adarsh Ravi

3 Answers


Assuming all the breaks are 10000 apart, a much more efficient solution is to round each salary down to the nearest multiple of 10000:

salary <- c(10000, 12000, 29000, 30000, 35000, 39000, 51000)

floor(salary/10000) * 10000
# [1] 10000 10000 20000 30000 30000 30000 50000
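
To store the result as the SalaryX column the question asks for, the same idea can be written with integer division. This is only a sketch: the `employee` data frame and its values below are made up for illustration, and it assumes every bin is exactly 10000 wide.

```r
# Hypothetical sample data; the values are assumptions for illustration only
employee <- data.frame(salary = c(25600, 30000, 41250, 59999))

# Integer-divide by the bin width, then scale back up to the lower bin edge
employee$SalaryX <- (employee$salary %/% 10000) * 10000

employee$SalaryX
# [1] 20000 30000 40000 50000
```

For non-negative salaries, `%/%` gives the same result as `floor(salary / 10000)`.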
AkselA

You need to manually specify the breaks.

cut(employee$salary,
    breaks = c(20000, 30000, 40000, 50000, 60000, 70000),
    include.lowest = TRUE,
    labels = c("20000", "30000", "40000", "50000", "60000"))

From the documentation:

breaks: either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.

This means R will automatically decide the cut points based on the input if you only specify a number, but if you give the breaks manually you will get the levels you want.
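
One detail worth checking is which bin a value on a boundary falls into. By default cut() uses right = TRUE, so intervals are closed on the right and a salary of exactly 30000 would get the label "20000". The sketch below assumes you want 30000 to map to 30000 instead, and uses made-up salaries to show the effect of right = FALSE.

```r
# Hypothetical salaries, chosen only to illustrate the boundary behaviour
salary <- c(20000, 25600, 30000, 45000, 69999)

breaks <- c(20000, 30000, 40000, 50000, 60000, 70000)

# right = FALSE closes each interval on the left: [20000, 30000), [30000, 40000), ...
bins <- cut(salary, breaks = breaks, right = FALSE, include.lowest = TRUE,
            labels = c("20000", "30000", "40000", "50000", "60000"))

# cut() returns a factor; go through character to recover the numeric values
salary_x <- as.numeric(as.character(bins))
salary_x
# [1] 20000 20000 30000 40000 60000

# With a single number, the cut points are derived from range(salary) instead
levels(cut(salary, breaks = 5))
```

Converting through as.character() matters here: calling as.numeric() directly on the factor would return the internal level codes (1, 2, 3, ...) rather than the labels.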

Adarsh Ravi

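Here's a dplyr solution using case_when() with between(). Note that between() includes both endpoints and case_when() uses the first condition that matches, so a salary sitting exactly on a boundary (e.g. 20000) is assigned to the lower bin (10000):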

library(dplyr)

employee %>%
  mutate(new_salary = case_when(
    between(salary, 10000, 20000) ~ 10000,
    between(salary, 20000, 30000) ~ 20000,
    between(salary, 30000, 40000) ~ 30000,
    between(salary, 40000, 50000) ~ 40000,
    between(salary, 50000, 60000) ~ 50000,
    between(salary, 60000, 70000) ~ 60000
  ))
Tyler Burleigh