Possible Duplicate:
Why are these numbers not equal?

I've come across a very bizarre situation. If I store a vector (sequence) in a text file with the following code:

fileConn<-file('test.txt')
sink(fileConn,append=T,split=T)
cat('sequence','\n')
cat(as.character(unlist(seq(0,1,0.1))),'\n')
sink()
close(fileConn)

And then load it again:

test=readLines('test.txt')

I then try and extract the same vector I've stored in the text file and compare to the original sequence using 2 "different" approaches:

sequence1=laply(strsplit(test[2]," ")[[1]],as.numeric)
sequence2=as.numeric(strsplit(test[2]," ")[[1]])

What's bizarre is that even though they look and (apparently) are the same type of vectors, R seems to think they're not!!!

cbind(seq(0,1,0.1),sequence1,sequence2)

          sequence1 sequence2
 [1,] 0.0       0.0       0.0
 [2,] 0.1       0.1       0.1
 [3,] 0.2       0.2       0.2
 [4,] 0.3       0.3       0.3
 [5,] 0.4       0.4       0.4
 [6,] 0.5       0.5       0.5
 [7,] 0.6       0.6       0.6
 [8,] 0.7       0.7       0.7
 [9,] 0.8       0.8       0.8
[10,] 0.9       0.9       0.9
[11,] 1.0       1.0       1.0

apply(cbind(seq(0,1,0.1),sequence1,sequence2),2,class)
          sequence1 sequence2 
"numeric" "numeric" "numeric"



apply(cbind(seq(0,1,0.1),sequence1,sequence2),2,nchar)
        sequence1 sequence2
 [1,] 1         1         1
 [2,] 3         3         3
 [3,] 3         3         3
 [4,] 3         3         3
 [5,] 3         3         3
 [6,] 3         3         3
 [7,] 3         3         3
 [8,] 3         3         3
 [9,] 3         3         3
[10,] 3         3         3
[11,] 1         1         1

sequence1==seq(0,1,0.1)
 [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

sequence2==seq(0,1,0.1)
 [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

Does anybody have any clue why this happens and how I can prevent it from happening? Thanks very much!

Juan

2 Answers

Let's step through your problem. After I dump the file, I can inspect the contents of test.txt:

> test=readLines('test.txt')
> test
[1] "sequence "                               
[2] "0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 "

If we compare this to the original sequence:

> seq(0,1,0.1)
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

We see that the file contains the correct contents, so any error must be in your processing. We check whether both sequences match the original one:

> all.equal(sequence1, seq(0,1,0.1))
[1] TRUE
> all.equal(sequence2, seq(0,1,0.1))
[1] TRUE

So they are in fact correct, and there is no real problem with the data. The FALSEs are most likely caused by the limited numerical precision of floating-point representation: == tests for exact equality, whereas all.equal allows a small tolerance and therefore says they are the same.
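
To make the precision point concrete, here is a minimal sketch that reproduces the mismatch without any text file. It assumes only base R; the variable names x and y are chosen for illustration. The fourth element of the sequence is computed arithmetically, while the value read back from the file is parsed from the string "0.3", and the two results land on neighbouring doubles.

```r
# Reproduce the mismatch without the text file.
x <- seq(0, 1, 0.1)        # in effect 0 + (0:10) * 0.1
y <- as.numeric("0.3")     # parsed from text, as in the question

x[4] == y                  # FALSE: the two doubles are not identical
sprintf("%.17g", x[4])     # "0.30000000000000004"
sprintf("%.17g", y)        # "0.29999999999999999"

# all.equal() compares within a tolerance (about 1.5e-8 by default)
isTRUE(all.equal(x[4], y)) # TRUE
```

The default printing in R shows only 7 significant digits, which is why both values display as 0.3 in the cbind output.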

Paul Hiemstra
  • Got it, cheers for the explanation. Am actually building a routine with a step where text is input and used for calculations. Will be much more careful when outputting it to avoid the float point trap! – Juan Nov 02 '12 at 12:22
  • Why use these kinds of text files? If you use CSV format, `read.csv` can read the file far more easily than parsing it yourself. If you need to store R objects, `save` is a much less error-prone way to write them out, and `load` reads them back in. – Paul Hiemstra Nov 02 '12 at 12:30
  • Will do that actually! Cheers! – Juan Nov 02 '12 at 14:20
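
As a rough sketch of the save/load suggestion in the comments, the following round-trips the vector through R's binary formats and checks that it comes back bit-for-bit identical. The temporary file paths and the names x, restored and env are illustrative assumptions, not part of the original code; saveRDS/readRDS is shown alongside as a base R alternative for a single object.

```r
x <- seq(0, 1, 0.1)

# save()/load(): stores the object together with its name
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)
env <- new.env()                 # load into a separate environment
load(rdata_path, envir = env)
identical(x, env$x)              # TRUE: no decimal text conversion involved

# saveRDS()/readRDS(): stores a single object without its name
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
restored <- readRDS(rds_path)
identical(x, restored)           # TRUE
```

Because the binary formats store the doubles themselves rather than a rounded decimal rendering, exact comparison with == works after reloading.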

You want to do the comparison using all.equal(), which allows for some fuzz in the comparison. You can't use == because it tests for exact equality, which is not something you should or would want to do with floating-point data.

> all.equal(unlist(sequence1), seq(0, 1, 0.1))
[1] TRUE
> all.equal(sequence2, seq(0, 1, 0.1))
[1] TRUE

To get the output you wanted we need to work a little harder:

> sapply(seq_along(sequence1), function(i, x, y) all.equal(x[[i]], y[i]),
+        x = sequence1, y = seq(0, 1, 0.1))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> sapply(seq_along(sequence2), function(i, x, y) all.equal(x[i], y[i]),
+        x = sequence2, y = seq(0, 1, 0.1))
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

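Note that those last two are slightly different: the first indexes sequence1 with [[ ]], which is needed if sequence1 is a list (as lapply would return) and also works on a plain vector.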
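
A shorter way to get an element-wise result is to compare the absolute difference against a tolerance directly. This is a sketch of that idea, not what all.equal() does internally: the name tol and its value (the square root of machine epsilon, the same order as the all.equal() default) are assumptions for illustration, and the test here is absolute rather than relative.

```r
x <- seq(0, 1, 0.1)
# Same text as line 2 of test.txt, without the trailing space
y <- as.numeric(strsplit("0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1", " ")[[1]])

tol <- sqrt(.Machine$double.eps)   # about 1.5e-8

exact      <- x == y               # FALSE at positions 4, 7 and 8
near_equal <- abs(x - y) < tol     # TRUE everywhere

which(!exact)
all(near_equal)
```

If you do loop over all.equal(), wrapping each call in isTRUE() is worthwhile, since all.equal() returns a character description rather than FALSE when the values differ.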

Gavin Simpson