1

I am attempting to create an additional column in my dataset that bucketizes percentiles. Ideally I would to create logic as follows:

CASE 
  WHEN  percentile >=  75 AND  percentile < 90  THEN "75%-89% Percentile"
  WHEN percentile >=  50 AND  percentile < 75  THEN "50%-75% Percentile"

END 

What I have attempted dplyr is the following:

  mutate(Bucket = case_when(as.double(percentile) >= 90 ~ "90%-100% Percentile",
                            as.double(percentile) >=  75 & as.double(percentile) < 90  ~ "75%-89% Percentile",
                            as.double(percentile) <  75 & as.double(percentile) >= 50  ~ "50%-75% Percentile",
                            as.double(percentile) <  50 & as.double(percentile) >= 25  ~ "25%-50% Percentile",
                            as.double(percentile) <  25 & as.double(percentile) >= 0  ~ "0%-25% Percentile"))

However it is not bucketizing correctly, see a sample of the results in the screenshot. The bucket flag for these percentiles should be "75%-89% Percentile" :

enter image description here

Jaskeil
  • 1,044
  • 12
  • 33
  • Why use `case_when()`? Why not just `cut(percentile, 25, 50, 75, 90, ...)`? Or try (my) `santoku::chop()` for an easier interface. – dash2 May 04 '21 at 20:13

1 Answers1

3

The column percentile is factor. We need to convert to character class first and then to numeric

library(dplyr)
 df1 %>%
     mutate(percentile = as.numeric(as.character(percentile))) %>%
     ...

What happens is that when we directly coerce to numeric/integer, it gets coerced to integer storage values instead of the actual values

v1 <- factor(c(81.9, 82.7, 81.9, 82.5))
as.numeric(v1)
#[1] 1 3 1 2

is different than the following

as.numeric(as.character(v1))
#[1] 81.9 82.7 81.9 82.5

Or probably faster with levels

as.numeric(levels(v1)[v1])
#[1] 81.9 82.7 81.9 82.5
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    You might also want to check this out as well: https://stackoverflow.com/questions/21714457/is-cut-style-binning-available-in-dplyr – TheSciGuy May 04 '21 at 19:21