I'm new to R, so this may be a silly mistake. I'm trying to use the cut function, but I keep getting the same error:

Error: Problem with `mutate()` input `Calls_bucket`.
x 'breaks' are not unique
i Input `Calls_bucket` is `cut(...)

Here's my code (I've tried many variations; these are the two most recent):

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,c(2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour, na.rm=T)),
                         labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20")))

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,breaks=c(2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour, na.rm=T)),labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20")))

I can get it to work if I only specify the number of breaks, but I want to define the break points myself. This code works, for example:

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,12))

Thanks in advance. Any help would be greatly appreciated.

MrFlick
Eric
  • What is `max(Calls_per_Hour, na.rm=T)`? Is it equal to one of your existing breaks? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 03 '20 at 20:30
  • Instead of `max`, use `Inf`. – Onyambu Sep 03 '20 at 20:35

2 Answers

When defining the breaks, wrap them in unique() if you are using max(Calls_per_Hour). This worked for me:

m3 <- m2 %>%
    mutate(Calls_bucket=cut(Calls_per_Hour,unique(c(0,2,4,6,8,10,12,14,16,18,20,max(Calls_per_Hour,na.rm=TRUE))),
                            labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20"),include.lowest = TRUE))
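  • unique() keeps the vector of breaks free of duplicates, i.e. if max(Calls_per_Hour) equals one of the fixed values in your vector, the breaks remain unique. Note that when a duplicate is dropped there is one interval fewer, so the labels vector must be shortened to match, otherwise cut() fails with a different error.
  • Since your labels start at 0, you should also include 0 in your breaks.
  • Setting include.lowest=TRUE ensures that a value equal to the lowest break is assigned to the first interval instead of becoming NA.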
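
A minimal sketch of why the error appears and what unique() changes, assuming a made-up vector whose maximum happens to equal the last fixed break (20). The names `x`, `brks`, `ubrks`, and `labs` are illustrative and not from the question:

```r
# Hypothetical data whose maximum equals the last fixed break
x <- c(1, 5, 12, 20)

brks <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, max(x, na.rm = TRUE))
has_dup <- anyDuplicated(brks) > 0   # 20 appears twice in brks

# cut() refuses duplicated breaks; capture the message instead of stopping
res <- tryCatch(cut(x, brks), error = function(e) conditionMessage(e))

# unique() removes the duplicate, leaving 11 breaks and therefore 10 intervals
ubrks <- unique(brks)
labs <- c("0-2", "2-4", "4-6", "6-8", "8-10", "10-12",
          "12-14", "14-16", "16-18", "18-20", ">20")

# Trim the labels to the number of intervals that actually remain
buckets <- cut(x, ubrks,
               labels = labs[seq_len(length(ubrks) - 1)],
               include.lowest = TRUE)
buckets
```

Passing all 11 labels together with the 10 remaining intervals would fail, which is why the labels are trimmed here. Using `Inf` as the last break, as suggested in the comments, avoids both problems because `Inf` can never coincide with a fixed break.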
Hasan Bhagat

For me it worked when I included 0 as the first break and specified include.lowest = TRUE, so that every 0 is included in the first category. With include.lowest = FALSE (the default), a value of 0 is turned into NA, because the first interval is then open on the left.

library(dplyr)  # provides %>% and mutate()

m2 <- data.frame(Calls_per_Hour = 0:25)

m3 <- m2 %>%
  mutate(Calls_bucket=cut(Calls_per_Hour,c(0,2,4,6,8,10,12,14,16,18,20, Inf),
                          labels=c("0-2","2-4","4-6","6-8","8-10","10-12","12-14","14-16","16-18","18-20",">20"),
                          include.lowest = TRUE))
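
To see the effect of include.lowest in isolation, here is a small sketch on made-up values with a shortened set of breaks. The names `x`, `brks`, `labs`, `default_cut`, and `lowest_cut` are only for illustration:

```r
x <- c(0, 1, 2, 3, 25)          # made-up values
brks <- c(0, 2, 4, Inf)
labs <- c("0-2", "2-4", ">4")

# Default: intervals are (0,2], (2,4], (4,Inf], so 0 belongs to none of them
default_cut <- cut(x, brks, labels = labs)

# include.lowest = TRUE closes the first interval on the left: [0,2]
lowest_cut <- cut(x, brks, labels = labs, include.lowest = TRUE)

is.na(default_cut[1])      # TRUE
as.character(lowest_cut)   # "0-2" "0-2" "0-2" "2-4" ">4"
```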

One more note: the labels in your example are ambiguous. Reading 0-2 and 2-4, I wouldn't know which bucket a value of 2 belongs to. With the default right = TRUE, 2 falls into the first bucket, so in your actual code you may want unambiguous labels (for integer data, e.g. 0-2, 3-4).
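
Two possible ways to get unambiguous labels, sketched under the assumption that the data are whole numbers as in the 0:25 example (for fractional values only the first option is accurate). The names `auto`, `manual`, `lower`, `upper`, and `labs` are illustrative:

```r
x <- 0:25
brks <- c(seq(0, 20, by = 2), Inf)

# Option 1: omit labels and let cut() write the intervals in bracket notation
auto <- cut(x, brks, include.lowest = TRUE)
levels(auto)[1:2]

# Option 2: build integer-range labels from the breaks themselves
upper <- seq(2, 20, by = 2)                 # upper end of each closed bucket
lower <- c(0, upper[-length(upper)] + 1)    # 0, 3, 5, ..., 19
labs <- c(paste0(lower, "-", upper), ">20")

manual <- cut(x, brks, labels = labs, include.lowest = TRUE)
table(manual)
```

Deriving the labels from the breaks keeps the two vectors the same length by construction, which avoids mismatches when the breaks are changed later.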

tamtam