I'm using R's cut function to "bucket" values, and I'm getting inconsistent results regarding how values that are equal to the boundaries of the intervals are handled.

With the default right = TRUE, each bucket should be closed on the right-hand side. For example, the range (-0.25,-0.2] should include values equal to -0.20 but exclude values equal to -0.25. This doesn't always seem to be the case, as demonstrated by the following code and output:

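library(dplyr)     # provides %>% and mutate()
library(magrittr)  # provides the compound assignment pipe %<>%

df = data.frame(First = c(630,615,500,1000),
                Second = c(490,492,450,990)) %>%
  mutate(Change = Second/First-1)
# Bucket Change into intervals of width 0.05 (right = TRUE is the default)
df %<>% mutate(HistBucket = cut(Change,
                                seq(-0.3,0,by=0.05)))
df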

The result:

  First Second     Change   HistBucket
1   630    490 -0.2222222 (-0.25,-0.2]
2   615    492 -0.2000000 (-0.2,-0.15]
3   500    450 -0.1000000 (-0.15,-0.1]
4  1000    990 -0.0100000    (-0.05,0]

The second record has a Change value of exactly -0.2, but it falls into the (-0.2,-0.15] interval instead of the desired (-0.25,-0.2] interval. The third record has a Change value of exactly -0.1, so it also sits on the endpoint of an interval, yet it is placed in the expected (-0.15,-0.1] interval.

This appears to be inconsistent behavior -- is there a way to get around this and get the cut function to consistently treat values on the endpoints of interval ranges?

Kyle Wurtz
  • R-FAQ 7.31 Strikes Again! – Gregor Thomas Nov 18 '15 at 21:56
  • And, in case it's not clear from the link, the problem is that when you say "exactly -0.2", that's not correct; it's **not exact**. If you look at `df$Change[2] - (-0.2)` you'll see the difference. A relatively easy solution for this use case would be to round your column, something like `round(Change, digits = 5)` – Gregor Thomas Nov 18 '15 at 22:08
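
To make the comment above concrete, here is a minimal sketch in base R that rebuilds the Change column from the question, shows that the second value is not exactly -0.2, and then applies the suggested rounding before calling cut. The names change, breaks, and bucket are placeholders, and rounding the breaks as well as the values is an extra assumption on my part; the comment only suggests rounding the column.

```r
# Same numbers as in the question, without the data frame
first  <- c(630, 615, 500, 1000)
second <- c(490, 492, 450, 990)
change <- second / first - 1

# Prints as -0.2, but is not the same double as the literal -0.2
change[2] == -0.2        # FALSE
change[2] - (-0.2)       # a tiny non-zero difference

# Round the values (and, as an assumption, the breaks too) before bucketing
breaks <- seq(-0.3, 0, by = 0.05)
bucket <- cut(round(change, digits = 5), round(breaks, digits = 5))
bucket
```

With both sides rounded to 5 digits, a value that is meant to sit on a boundary compares equal to that boundary, so right = TRUE places it in the interval that is closed at that point. Five digits is only safe here because the bucket width (0.05) is much coarser than that; pick the digits to suit your own data.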
