8

I am importing a csv that has a single column which contains very long integers (for example: 2121020101132507598)

a <- read.csv('temp.csv', as.is = TRUE)

When I import these integers as strings they come through correctly, but when imported as integers the last few digits are changed. I have no idea what is going on...

1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312

Naftali
Zubin

4 Answers

11

As others have noted, you can't represent integers that large in R's integer type. But R isn't reading those values into integers; it's reading them into double-precision numerics.

Double precision can only represent numbers accurately to about 16 significant digits (whole numbers are exact only up to 2^53), which is why you see your numbers rounded after roughly 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function to read from a file in any of them, but maybe you could cook something up by looking at their sources.
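To make the precision limit concrete, here is a minimal base-R sketch. It assumes an IEEE 754 platform (which is what R runs on in practice); the variable names `limit`, `spacing`, and `x` are only for illustration. The last line prints the nearest double to one of the values from the question, which is how the trailing digits end up changed.

```r
# Number of bits in a double's significand (53 on IEEE 754 platforms)
.Machine$double.digits

# Every whole number up to 2^53 is exact; beyond that, neighbours collapse
limit <- 2^53
sprintf("%.0f", limit)   # "9007199254740992"
limit + 1 == limit       # TRUE: 2^53 + 1 cannot be represented

# Between 2^61 and 2^62 (about 2.3e18 to 4.6e18) adjacent doubles
# are 2^(61 - 52) = 512 apart, so the last few digits cannot survive
spacing <- 2^(61 - 52)

# One of the values from the question, parsed as a double
x <- as.numeric("4031320121153001444")
sprintf("%.0f", x)
```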

UPDATE: Here's how you can get your file into an int64 object:

library(int64)
# This assumes your numbers are the only column in the file.
# Read them in however you like; just ensure they're read in as character.
a <- scan("temp.csv", what="")
ia <- as.int64(a)
Joshua Ulrich
8

R's maximum integer value is about 2E9 (2,147,483,647). As @Joshua mentions in another answer, one of the potential solutions is the int64 package.

Import the values as character instead. Then convert to type int64.

require(int64)
a <- read.csv('temp.csv', colClasses = 'character', header=FALSE)[[1]]
a <- as.int64(a)
print(a)
[1] 4031320121153001444 4113020071082679601 4073020091116779570
[4] 2081720101128577687 4041720081087539887 4011120071074301496
[7] 4021520051054304372 4082520061068996911 4082620101129165548
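The same read-as-character-then-convert approach can be sketched with the bit64 package, which provides an `integer64` class. This is only an illustration under assumptions: it assumes bit64 is installed, and it writes a small temporary file as a stand-in for temp.csv (the names `csv_text` and `tmp` are hypothetical).

```r
# Assumes the bit64 package is installed: install.packages("bit64")
library(bit64)

# Stand-in for temp.csv, using three values from the question
csv_text <- c("4031320121153001444",
              "4113020071082679601",
              "4073020091116779570")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Read as character so no digits are lost, then convert to 64-bit integers
a <- read.csv(tmp, colClasses = "character", header = FALSE)[[1]]
ia <- as.integer64(a)

# Converting back to text should reproduce every digit
all(as.character(ia) == a)
```

Keeping the column as character until the conversion is the key step; once the values pass through a double, the low digits are already gone.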
  • These days you can simply do `read.csv(..., colClasses=c('integer64',...))` and read it directly. (Make sure to set options('scipen'=99) so it doesn't render in scientific notation) – smci Jul 13 '16 at 11:40
  • 1
    It is worth reminding folks that the [int64 package is no longer maintained](https://cloud.r-project.org/web/packages/int64/index.html) and has been off CRAN for over 2 1/2 years. – Dirk Eddelbuettel Jul 13 '16 at 12:04
4

You simply cannot represent integers that big. See

.Machine

which on my box has

$integer.max
[1] 2147483647
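A short base-R check of that limit, as a sketch (the names `too_big` and `overflow` are illustrative only). Values outside the 32-bit range become NA when coerced to `integer`, and R normally emits a warning, which is suppressed here to keep the output clean.

```r
# Largest value R's built-in integer type can hold: 2^31 - 1
.Machine$integer.max == 2^31 - 1   # TRUE

# Coercing one of the question's values to integer yields NA
too_big <- suppressWarnings(as.integer("4031320121153001444"))
is.na(too_big)                     # TRUE

# Integer arithmetic past the maximum also overflows to NA
overflow <- suppressWarnings(.Machine$integer.max + 1L)
is.na(overflow)                    # TRUE
```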
Dirk Eddelbuettel
  • This is very wrong, please delete or correct it. You can't represent these 64-bit integers as 32-bit integers. So you use a 64-bit integer library. – smci Jul 12 '16 at 09:45
  • That is simply not how R works which only has 32-bit integers, hence my (still correct) answer. Your wishing for 64-bit integers does not add them to the interpreter / the language. – Dirk Eddelbuettel Jul 12 '16 at 10:11
  • Dirk, that's incorrect and very rude as well. R had [packages implementing 64-bit integers](http://www.r-bloggers.com/int64-64-bit-integer-vectors-for-r/) **five years ago**. Whether the R core committee has gotten its act together on supporting them as a native type in the interpreter yet or not, is not our issue. The unqualified term 'integer' is not synonymous with '32-bit' - it could mean 64-bit, 128-bit etc. For heaven's sake, [C implemented uint64/int64 decades ago](http://stackoverflow.com/questions/6013245/are-types-like-uint32-int32-uint64-int64-defined-in-any-stdlib-header) – smci Jul 13 '16 at 11:26
  • 2
    R != R packages. I am aware of the (abandoned) `int64` and (not widely used) `bit64` packages, but they do NOT solve the problem. You plainly misunderstand what R does internally, and your references to other languages / system are (while true in a narrow sense) simply inapplicable to the question. You just do not understand how R is implemented: `integer` *exactly* means 32bit here. Making my answer still correct, and your downvote rude. For reference, also see the other answers here saying (effectively) the same thing. – Dirk Eddelbuettel Jul 13 '16 at 11:29
3

The maximum value of a 32-bit signed integer is 2,147,483,647. Your numbers are much larger.

Try importing them as floating point values instead.

There are a few caveats to be aware of when dealing with floating point arithmetic in R or any other language:

http://blog.revolutionanalytics.com/2009/11/floatingpoint-errors-explained.html

http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html

http://floating-point-gui.de/basic/
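One of the caveats those links describe can be seen in a couple of lines of base R. This is a small sketch, with `exact` and `approx` as illustrative names: decimal fractions such as 0.1 have no exact binary representation, so exact comparison can fail where a tolerance-based comparison succeeds.

```r
# Exact comparison fails because 0.1, 0.2 and 0.3 are stored approximately
exact <- (0.1 + 0.2 == 0.3)
exact    # FALSE

# all.equal() compares within a small tolerance instead
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))
approx   # TRUE
```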

Eric J.
  • This doesn't fix the problem. Try it yourself `a <- read.csv('temp.csv', colClasses = 'numeric', header=FALSE)` then `print(a, digits=20)` still has the same results @Zubin reports. –  Jul 11 '12 at 20:58