Why doesn't cut work as expected in R?

Question

Why don't these two return the same result?

    D = data.frame( x=c( 0.6 ) )

    D$binned = cut( D$x, seq( 0.50,0.70,0.025 ), include.lowest=TRUE, right=FALSE )
    D # 0.6 is binned correctly as [0.6,0.625)

    D$binned = cut( D$x, seq( 0.55,0.65,0.025 ), include.lowest=TRUE, right=FALSE )
    D # 0.6 is binned incorrectly as [0.575,0.6)

Variant on http://stackoverflow.com/q/9508518/892313 – Brian Diggs Jul 12 '13 at 18:51

James · Answer 1 · 2013-07-12T14:45:44.870

5

Representation error. A floating-point number is exact only if the value is a finite sum of powers of 2 that fits in the available precision; every other value is stored as the nearest representable number. Different ways of computing the same nominal value can round differently, so the result may land slightly above or below the expected value. In this case:

    > print(D$x,digits=22)
    [1] 0.5999999999999999777955
    > print(seq(0.5,0.7,0.025)[5],digits=22)
    [1] 0.5999999999999999777955
    > print(seq(0.55,0.65,0.025)[3],digits=22)
    [1] 0.6000000000000000888178
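
To see why this matters to `cut`, the values can be compared directly. This is a small sketch using only base R; the names `b1` and `b2` are just for illustration.

```r
x  <- 0.6
b1 <- seq(0.50, 0.70, 0.025)  # 0.6 is the 5th break
b2 <- seq(0.55, 0.65, 0.025)  # 0.6 is the 3rd break

x == b1[5]  # TRUE: both are the same double
x == b2[3]  # FALSE: this break is a slightly larger double
x <  b2[3]  # TRUE: so with right=FALSE, x lands in [0.575,0.6)

# Equality within a tolerance still holds
isTRUE(all.equal(x, b2[3]))  # TRUE
```

So `cut` is doing exact comparisons correctly; the break that prints as 0.6 is simply not the same number as the data value.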

edited Jul 12 '13 at 14:45

answered Jul 12 '13 at 14:39

James


Unfortunately, not really. The errors are consistent, but ultimately the value depends on how it is calculated. The usual way of dealing with this is to only consider equality within a certain tolerance; however, `cut` needs sharp break points. – James Jul 12 '13 at 15:00
However, if your numbers will always have only a few decimal places, you could bump the break points accordingly (e.g., `seq( 0.55,0.65,0.025 ) - 0.000001`) and see if that helps. – Aaron left Stack Overflow Jul 12 '13 at 15:16
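
A minimal sketch of that suggestion, assuming the data never needs more precision than a few decimal places. The offset `eps` is an arbitrary choice for illustration; it only has to be much smaller than the resolution of the data and much larger than the floating-point error.

```r
x      <- 0.6
eps    <- 1e-6                       # assumed offset, not a recommended constant
breaks <- seq(0.55, 0.65, 0.025)

# Shift every break down a little so values "on" a break fall into the bin above it
binned <- cut(x, breaks - eps, include.lowest = TRUE, right = FALSE)
binned
as.integer(binned)  # 3: the third interval, the one starting at 0.6
```

The labels are built from the shifted breaks, but with the default `dig.lab = 3` they should still print as 0.6, 0.625, and so on.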

You might want to look at the source code for `hist.default` to see one approach – hadley Jul 13 '13 at 07:13
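
The thread does not quote that source, so the following is only a sketch of the general idea as I understand it: nudge the breaks by a tiny "fuzz" that scales with the bin width before binning. `cut_fuzzy` and its `rel` argument are hypothetical names made up for this example, not part of base R, and the code is not a copy of `hist.default`.

```r
# Hypothetical helper: left-closed bins with breaks nudged by a relative fuzz
cut_fuzzy <- function(x, breaks, rel = 1e-7) {
  nb   <- length(breaks)
  fuzz <- rel * stats::median(diff(breaks))
  # Move the lower edges down slightly; move the last edge up so the top value stays included
  fuzzy <- breaks + c(rep(-fuzz, nb - 1), fuzz)
  # Build the labels from the unshifted breaks so they show the intended edges
  labs <- levels(cut(breaks[1], breaks, include.lowest = TRUE, right = FALSE))
  cut(x, fuzzy, labels = labs, include.lowest = TRUE, right = FALSE)
}

res <- cut_fuzzy(0.6, seq(0.55, 0.65, 0.025))
res
```

Scaling the fuzz to the bin width avoids picking a fixed constant, at the cost of misplacing values that truly lie within that tiny distance below a break.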

score 1 · Accepted Answer · answered Jul 13 '13 at 08:41


    D$binned = cut( D$x, round(seq( 0.55,0.65,0.025 ),3), include.lowest=TRUE, right=FALSE )
    D
        x      binned
    1 0.6 [0.6,0.625)
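
A short check of why rounding helps here, as a sketch in base R: after `round(..., 3)` the third break is the same double as the literal `0.6`. The second part is an assumption about typical use, namely that data produced by arithmetic may need the same rounding as the breaks.

```r
b <- seq(0.55, 0.65, 0.025)
print(b[3], digits = 22)            # slightly above 0.6, as shown in the first answer
print(round(b[3], 3), digits = 22)  # should now match the literal 0.6
round(b[3], 3) == 0.6               # TRUE

# Computed data: round it the same way as the breaks before binning
y  <- 0.1 + 0.2                     # not exactly 0.3
by <- cut(round(y, 3), round(seq(0.25, 0.35, 0.025), 3),
          include.lowest = TRUE, right = FALSE)
by
```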

answered Jul 13 '13 at 08:41

Fabio Marroni

