I'm using R's cut function to "bucket" values, and I'm getting inconsistent results regarding how values that are equal to the boundaries of the intervals are handled.
With the default right = TRUE, each interval should be closed on the right-hand side. For example, the range (-0.25,-0.20] would include all values that are equal to -0.20, but would not include values that are equal to -0.25. This doesn't always seem to be the case, as demonstrated by the following code and output:
library(dplyr)     # mutate() and %>%
library(magrittr)  # %<>% (assignment pipe)

df = data.frame(First = c(630, 615, 500, 1000),
                Second = c(490, 492, 450, 990)) %>%
  mutate(Change = Second / First - 1)
df %<>% mutate(HistBucket = cut(Change,
                                seq(-0.3, 0, by = 0.05)))
df
The result:
  First Second     Change   HistBucket
1   630    490 -0.2222222 (-0.25,-0.2]
2   615    492 -0.2000000 (-0.2,-0.15]
3   500    450 -0.1000000 (-0.15,-0.1]
4  1000    990 -0.0100000    (-0.05,0]
The second record has a Change value that prints as exactly -0.2, but it falls into the (-0.2,-0.15] interval instead of the desired (-0.25,-0.20] interval. The third record has a Change value that prints as exactly -0.1, so it falls on the endpoint of an interval as well, but it is included in the expected (-0.15,-0.10] interval.
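One way to probe this is to look at the values with more digits than the default printout shows. The following is only a diagnostic sketch in base R, under the assumption (not confirmed here) that the difference comes from floating-point representation: neither Second/First - 1 nor the breaks produced by seq() need be stored as exactly -0.2. The names breaks and change are introduced just for this illustration.

```r
# Recreate the breaks and the Change column from the example (base R only).
breaks <- seq(-0.3, 0, by = 0.05)
change <- c(490 / 630, 492 / 615, 450 / 500, 990 / 1000) - 1

# Data frames print about 7 significant digits; ask for more.
print(change[2], digits = 17)  # the value that displays as -0.2000000
print(breaks[3], digits = 17)  # the break that is labelled -0.2

# cut() compares the stored doubles, not the printed values.
change[2] > breaks[3]   # TRUE is consistent with row 2 landing in (-0.2,-0.15]
change[3] <= breaks[5]  # TRUE is consistent with row 3 landing in (-0.15,-0.1]

# Tolerance-based comparison treats row 2 and its break as equal.
isTRUE(all.equal(change[2], breaks[3]))
```

If the first comparison is TRUE, the value in row 2 is stored slightly above the -0.2 break, so cut is applying the right-closed rule to numbers that only look equal when printed.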
This appears to be inconsistent behavior. Is there a way to get around this and get the cut function to consistently treat values on the endpoints of interval ranges?
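A possible workaround, sketched here under the assumption that the mismatch is floating-point noise far below the precision that matters for the data, is to round both the values and the breaks to the same number of decimal places before calling cut. The choice of 10 digits and the names change, breaks, digits and bucket are illustrative assumptions, not something taken from the cut documentation.

```r
# Same inputs as the example, computed in base R.
change <- c(490 / 630, 492 / 615, 450 / 500, 990 / 1000) - 1
breaks <- seq(-0.3, 0, by = 0.05)

# Assumed tolerance: differences beyond 10 decimal places are treated as noise.
digits <- 10

# Round values and breaks identically so "equal" endpoints compare as equal.
bucket <- cut(round(change, digits), breaks = round(breaks, digits))

data.frame(Change = change, HistBucket = bucket)
```

With this rounding, the second record should land in (-0.25,-0.2] alongside the first. The trade-off is that any value within the chosen tolerance of a break is treated as sitting exactly on it, so the number of digits should be well below the precision that matters for the analysis.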