data consistency in R: table vs ==

Question

I observe this:

> class(x)
[1] "numeric"
> str(x) 
num [1:2500] 1 1 1 1 1 1 1 1 1 1 ...
> table(x)
   1 
2500 
> table(x == 1)
FALSE  TRUE 
  299  2201 
> all.equal(x, rep(1,length(x)))
[1] TRUE
> dput(x)
c(1, ..... 1)  # all ones

How is this possible? I understand that floating-point numbers should not, in general, be compared using ==, but shouldn't table be consistent with ==?

PS. Apparently, table is consistent with all.equal and not with == because it converts its arguments to factors (i.e., strings) first.

PPS. table(x-1) shows the non-0 values.

@sds Not sure what your comment means. How about a reproducible example? — Frank, Oct 26 '15 at 16:44
@Gregor: as I said, of course this is FP accuracy - but how come `table` is not consistent with `==`? — sds, Oct 26 '15 at 16:47
Well, let's see the `dput`, `table(c(1, 1.00000000000001))` works as expected for me. — Gregor Thomas, Oct 26 '15 at 16:50
all comments so far have been addressed by my edits. thanks a lot for comments and suggestions. — sds, Oct 26 '15 at 16:53

Roland · Accepted Answer · 2015-10-26T16:58:56.223

score 9

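Where in the documentation is it promised that they would be consistent? table expects "one or more objects which can be interpreted as factors", i.e., it calls factor(x) internally, which first converts x to character and then to a factor. The conversion to character keeps at most 15 significant digits, so values that differ from 1 only beyond that precision all get the label "1".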

x <- 1 - 1e-16
x == 1
#[1] FALSE
as.character(x)
#[1] "1"
factor(x) == "1"
#[1] TRUE
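
To see the same effect on a vector rather than a single value, here is a minimal sketch. The vector `x` below is made-up data (the original `x` from the question is not available), and `sprintf("%.17g", ...)` is just one way to print enough digits to tell the values apart.

```r
# Hypothetical data: three exact ones and two values just below 1
x <- c(1, 1, 1, 1 - 1e-16, 1 - 1e-16)

# table() groups by the character labels produced via factor(),
# so all five values end up under the single label "1"
tab <- table(x)

# == compares the underlying doubles exactly, so the two
# slightly smaller values are counted as FALSE
exact <- table(x == 1)

# Formatting with 17 significant digits keeps the two groups distinct
labels17 <- unique(sprintf("%.17g", x))

print(tab)
print(exact)
print(labels17)
```

If the goal is a table that agrees with ==, tabulating the comparison itself (`table(x == 1)`) or the high-precision strings (`table(sprintf("%.17g", x))`) are two possible workarounds.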

edited Oct 26 '15 at 16:58

answered Oct 26 '15 at 16:53

Roland

score 1 · Answer 2 · answered Oct 27 '15 at 14:54

Just addressing a possible misunderstanding about what all.equal does. table is not consistent with all.equal, because the latter by default includes a tolerance factor when comparing numeric values. From ?all.equal:

tolerance
numeric ≥ 0. Differences smaller than tolerance are not reported. The default value is close to 1.5e-8.

That is, all.equal should really be interpreted as meaning "all approximately equal" (to within a given limit of numerical precision).
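
A small sketch of the difference, using a made-up vector `x` (not the data from the question): with the default tolerance all.equal reports equality, while an exact comparison, or all.equal with `tolerance = 0`, does not.

```r
# Hypothetical data: one value differs from 1 by about 1e-16
x <- c(1, 1, 1 - 1e-16)
ones <- rep(1, length(x))

# Default tolerance (about 1.5e-8): the tiny difference is not reported
approx_equal <- isTRUE(all.equal(x, ones))

# identical() compares the doubles exactly
exactly_equal <- identical(x, ones)

# With tolerance = 0, all.equal() reports the difference as a message
# instead of returning TRUE, so isTRUE() gives FALSE
strict_equal <- isTRUE(all.equal(x, ones, tolerance = 0))

print(c(approx_equal, exactly_equal, strict_equal))
```

Wrapping the call in isTRUE() matters because all.equal returns a character description of the difference, not FALSE, when the objects differ.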
