15

I am trying to read a CSV file that has barcodes in the first column, but when R gets it into a data.frame, it converts 1665535004661 to 1.67E+12.

Is there a way to preserve this number in an integer format? I tried assigning a class of "double", but that didn’t work, nor did assigning a class of "character". Once it is in the 1.67E+12 format any attempt to convert it back to an integer returns 167000000000.

zx8754
James

8 Answers

18

It's not in a "1.67E+12 format", it just won't print entirely using the defaults. R is reading it in just fine and the whole number is there.

> x <- 1665535004661
> x
[1] 1.665535e+12
> print(x, digits = 16)
[1] 1665535004661

See, the digits were there all along. They only get lost when a number has more significant digits than a double can hold exactly (integers are exact up to 2^53, roughly 9 x 10^15). Sorting on what you read in will work fine, and to see the full values in your data.frame you can call print() explicitly with the digits argument instead of printing implicitly by typing the name.
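To make that concrete, here is a minimal base-R sketch showing that the stored value is exact and that sorting works on it. The barcodes vector is made up for illustration; it is not from the question's file.

```r
x <- 1665535004661

# The stored value is exact; only the default display is abbreviated.
x == 1665535004661

# Two ways to get the full digits as text without touching global options.
full_fmt <- format(x, scientific = FALSE)
full_spf <- sprintf("%.0f", x)

# Hypothetical barcodes: sorting numerically behaves as expected.
barcodes <- c(1665535004661, 1665535004659, 1665535004660)
sorted <- sort(barcodes)
sprintf("%.0f", sorted)
```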

John
  • This essentially overrides the method I discuss below with `options()`. As a point of reference, one should read and heed the warning in `?print.default` as the implementation at >= 16 digits starts to become a platform specific issue as the implementation of `sprintf()` begins to differ based on the underlying C code. – Chase May 23 '12 at 03:29
  • The same is true if you use options(). It's only an output default. I think it would be best if you were explicit about that in your answer. As it is, that's rather vague. Reading it naively, I'm wondering whether the digits option changes how many digits are retained, how they're read in?... what? – John May 23 '12 at 05:30
  • Good point - edited my answer to be more explicit about that. Feel free to tweak further if you think necessary. Cheers! - Chase – Chase May 24 '12 at 03:42
15

Picking up on what you said in the comments, you can directly import the text as a character by specifying the colClasses in read.table(). For example:

num <- "1665535004661"
dat.char <- read.table(text = num, colClasses="character")
str(dat.char)
#------
'data.frame':   1 obs. of  1 variable:
 $ V1: chr "1665535004661"
dat.char
#------
             V1
1 1665535004661

Alternatively (and for other uses), you can set the digits option with options(). The default is 7 digits and the acceptable range is 1-22. To be clear, setting this option in no way changes or alters the underlying data; it merely controls how the data is displayed on screen when printed. From the help page for ?options:

controls the number of digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7. See the note in print.default about values greater than 15.

Example illustrating this:

options(digits = 7)
dat <- read.table(text = num)

dat
#------
            V1
1 1.665535e+12

options(digits = 22)
dat
#------
             V1
1 1665535004661

To flesh this out completely and to account for the cases when setting a global setting is not preferable, you can specify digits directly as an argument to print(foo, digits = bar). You can read more about this under ?print.default. This is what John describes in his answer so credit should go to him for illuminating that nuance.
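As a rough sketch of the two non-global routes, the snippet below prints a data.frame with an explicit digits argument and then changes the digits option only temporarily, restoring the previous value afterwards. The choice of 13 and 15 digits is just an assumption that suits a 13-digit barcode.

```r
dat <- read.table(text = "1665535004661")

# Route 1: pass digits for this one print call only.
print(dat, digits = 13)

# Route 2: change the option temporarily and restore it afterwards.
# options() returns the previous values, so they can be put back.
old <- options(digits = 15)
print(dat)
options(old)
```

Neither route changes the values stored in dat; both only affect how they are displayed.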

Chase
10

Try reading the file with colClasses = "character":

read.csv("file.csv", colClasses = "character")

See the read.table documentation for details:

http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
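If only the barcode column should be text, colClasses also accepts a named vector, and columns that are not named are converted as usual. A minimal sketch, assuming a header row with a column called barcode (the column names and values here are made up, and the CSV is supplied inline via text= instead of a file):

```r
# Hypothetical two-column CSV; the second barcode has leading zeros.
csv_text <- "barcode,qty\n1665535004661,2\n0012345678905,7\n"

# Only 'barcode' is forced to character; 'qty' is still type-converted.
dat <- read.csv(text = csv_text, colClasses = c(barcode = "character"))
str(dat)
```

Reading as character also keeps leading zeros, which a numeric column would drop.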

rockswap
6

From the ?is.integer page:

"Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9."

> 1665535004661L > 2*10^9
[1] TRUE

You want package Rmpfr.

library(Rmpfr)
x <- mpfr(15, precBits= 1024)
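
For what it's worth, the integer limit can be checked in base R without any extra package. The sketch below only illustrates why as.integer() cannot hold the barcode while a plain double can; it is not a replacement for Rmpfr when you really need arbitrary precision.

```r
x <- 1665535004661

# Largest value an R integer vector can hold (32-bit).
int_max <- .Machine$integer.max

# The barcode is far beyond that, so coercion to integer yields NA
# (R also emits a warning, suppressed here).
too_big <- x > int_max
as_int <- suppressWarnings(as.integer(x))

# As a double it is still a whole number and below 2^53,
# the limit up to which doubles represent integers exactly.
is_whole <- x == round(x)
fits_double <- x < 2^53
```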
IRTFM
5

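You can use the numerals argument of read.csv. For example:

read.csv(x, sep = ";", numerals = "no.loss")

where x is the path to your file.

With "no.loss", a value that cannot be converted to double precision without losing accuracy is not converted to a number at all; it is kept as text (character or factor) rather than being silently rounded on import. Values that fit exactly in a double, such as 1665535004661, are still read as numeric.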
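
A small sketch of the difference, using an inline CSV via text= instead of a file. The column name id and the 23-digit value are made up for illustration; the value is chosen because it is too long to fit exactly in a double.

```r
csv_text <- "id\n12345678901234567890123\n"

# Default (numerals = "allow.loss"): read as a double, digits are rounded.
lossy <- read.csv(text = csv_text)

# numerals = "no.loss": the column is not converted to a number.
exact <- read.csv(text = csv_text, numerals = "no.loss")

class(lossy$id)
class(exact$id)
as.character(exact$id)
```

Whether the unconverted column comes back as character or factor depends on the stringsAsFactors setting of your R version, which is why the last line goes through as.character().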

m00am
Aditi Kumar
4

Take a look at the int64 package: Bringing 64-bit data to R.

Alex Reynolds
  • Is there a way to just import it as a character? I don't need to do math with it, I just need to sort on it. – James May 23 '12 at 00:17
3

Since you are not performing arithmetic on this value, character is appropriate. You can use the colClasses argument to set various classes for each column, which is probably better than using all character.

test.csv:

a,b,c
1001002003003004,2,3

Read character, then integers:

x <- read.csv('test.csv',colClasses=c('character','integer','integer'))
x
                 a b c
1 1001002003003004 2 3


mode(x$a)
[1] "character"

mode(x$b)
[1] "numeric"
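
One caveat if the goal is sorting: character vectors sort lexicographically, so codes of different lengths will not come out in numeric order. A minimal sketch of one workaround, assuming the codes contain digits only and have no leading zeros (the codes vector is made up for illustration):

```r
codes <- c("1665535004661", "99", "1001002003003004")

# Plain character sort compares left to right, character by character.
lexicographic <- sort(codes)

# Ordering by length first, then by text, matches numeric order
# for digit-only strings without leading zeros.
by_value <- codes[order(nchar(codes), codes)]

lexicographic
by_value
```

If all codes have the same length, as barcodes of one type usually do, the plain sort() already gives the expected order.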
Matthew Lundberg
0

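I tend to put options(scipen = 999) at the start of every script. scipen is an integer penalty that R applies when choosing between fixed and scientific notation, so a large positive value makes it print numbers in fixed notation instead of scientific format. It does not control how many digits are displayed (that is the digits option), and it only affects printing, not the stored values. To apply it to every session, you can put the same call in your .Rprofile.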
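
A quick sketch of the effect of scipen on the number from the question. The digits and scipen values set first are R's documented defaults, spelled out here so the comparison does not depend on the session's current settings.

```r
x <- 1665535004661

# Start from the default settings and remember the previous ones.
old <- options(digits = 7, scipen = 0)
default_fmt <- format(x)   # scientific notation under the defaults

# Penalize scientific notation heavily; fixed notation is chosen instead.
options(scipen = 999)
fixed_fmt <- format(x)

# Restore whatever was set before.
options(old)

default_fmt
fixed_fmt
```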

hanm