Scientific notation issue in R

Question

I have an ID variable with 20 digits. Once I read the data into R, it changes to scientific notation, and if I then write the same ID to a CSV file, the value of the ID changes.

For example, running the code below should print the value of x as "12345678912345678912", but it prints "12345678912345679872":

Code:

options(scipen=999)

x <- 12345678912345678912

print(x)

Output:

[1] 12345678912345679872

My questions are:

1) Why is this happening?

2) How can I fix this problem?

I know it has to do with how R stores data types, but I still think there should be some way to deal with this problem. I hope the question is clear.

I don't know whether this question has already been asked on SO, so please point me to a link if it's a duplicate and I will remove this post.

I have gone through this, so I can relate it to my issue, but I am unable to fix it.

Any help would be highly appreciated. Thanks

Thanks for replying. The problem persists: if I use as.character(x), the value of x is again "12345678912345679872" — PKumar, Jan 13 '15 at 10:07
I meant to format it "previously": when you import your data, you can specify a character colClasses for the ID variable (so it is like doing x<-"12345678912345678912"). Would this work? — Cath, Jan 13 '15 at 10:08
Otherwise, maybe you can specify a larger number of digits, with `options(digits=30)` for example? — Cath, Jan 13 '15 at 10:11
The number is too big to be represented as an integer. Thus, it is represented as a double, which leads to [issues with floating point number accuracy](http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal). There are [possibilities to use big integers](http://stackoverflow.com/questions/2053397/long-bigint-decimal-equivalent-datatype-in-r) in R, but since your numbers are IDs you should follow CathG's advice and treat them as character strings. — Roland, Jan 13 '15 at 10:15
@CathG Yes, it works. This is how I am doing it now: read.csv("file.csv",colClasses=c("character",rep(NULL,1))), as I have only two columns (ID and value). Thanks. By the way, you can post your thoughts as an answer; I would love to accept it. — PKumar, Jan 13 '15 at 10:19
OK, great. I'm guessing your second column is numeric? If so, you can instead use colClasses=c("character","numeric") (by the way, no need to use `rep` if you're repeating just once ;-) ) — Cath, Jan 13 '15 at 10:34
Thanks @CathG, you can post your thoughts as an answer. It would be helpful to everyone. — PKumar, Jan 13 '15 at 10:37

Anders Ellern Bilgrau · Answer 1 · 2015-01-13T10:50:30.223

3

R does not by default handle integers numerically larger than 2147483647L.

If you append an L to your number (to tell R it's an integer), you get:

x <- 12345678912345678912L
#Warning message:
#non-integer value 12345678912345678912L qualified with L; using numeric value

This also explains the change in the last digits: R stores the number as a double, which holds only about 15 to 17 significant decimal digits.
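A minimal base-R sketch of the precision limit described above. It assumes nothing beyond base R; the variable names are illustrative, and the exact digits shown by `sprintf` can vary by platform.

```r
# A double has a 53-bit significand, so above 2^53 not every integer is representable.
limit <- 2^53                      # 9007199254740992
print(limit == limit + 1)          # TRUE: limit + 1 rounds back to limit

# The 20-digit literal is parsed as a double, so it is rounded on input.
x <- 12345678912345678912
print(sprintf("%.0f", x))          # the stored value, not the digits that were typed

# Two different 20-digit IDs can end up as the same double.
print(12345678912345678912 == 12345678912345678913)   # TRUE

# Kept as a character string, every digit survives.
id <- "12345678912345678912"
print(nchar(id))                   # 20
```

This is also why `as.character(x)` does not help once the value has been parsed as a number: the rounding has already happened.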

I think the gmp package should be able to handle large numbers in general. You should therefore either accept the loss of precision, store them as character strings, or use a data type from the gmp package.
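As a rough illustration of the gmp option, here is a short sketch. It assumes the gmp package is installed (for example via `install.packages("gmp")`), and the variable names are made up for the example. The value is built from a string, because a numeric literal would already have been rounded to a double.

```r
# Assumption: the gmp package is installed.
library(gmp)

# Construct the big integer from a character string to keep all 20 digits.
id <- as.bigz("12345678912345678912")
print(id)

# Arithmetic on bigz values stays exact.
next_id <- id + 1
print(as.character(next_id))   # "12345678912345678913"
```

For values that are only identifiers and never used in arithmetic, plain character strings are usually the simpler choice.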

edited Jan 13 '15 at 10:50

answered Jan 13 '15 at 10:42


Thanks for sharing your views and know-how on the gmp package. – PKumar Jan 13 '15 at 10:45

score 1 · Accepted Answer · answered Jan 13 '15 at 10:42

To circumvent the problem caused by how numbers are stored and represented, you can import your ID variable directly as character with the colClasses option. For example, using read.csv to import a data.frame with the ID column and another numeric column:

mydata<-read.csv("file.csv",colClasses=c("character","numeric"),...)
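A self-contained sketch of that import, using a temporary file with made-up rows in place of "file.csv". The file contents, paths, and variable names are assumptions for illustration only.

```r
# Write a small two-column CSV to a temporary file (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("ID,value",
             "12345678912345678912,1.5",
             "12345678912345678913,2.5"), path)

# Default import: ID is typically converted to a double and loses its last digits.
bad <- read.csv(path)
print(class(bad$ID))
print(sprintf("%.0f", bad$ID))

# Import with colClasses: ID is kept as a character string.
good <- read.csv(path, colClasses = c("character", "numeric"))
print(good$ID)

# Writing the data back out preserves the original digits.
out <- tempfile(fileext = ".csv")
write.csv(good, out, row.names = FALSE, quote = FALSE)
print(readLines(out))
```

Reading the ID column as character on the way in matters, because converting after import only sees the already rounded number.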

score 1 · Answer 3 · answered Jul 31 '21 at 16:45


Using readr you can do

mydata <- readr::read_csv("file.csv", col_types = list(ID = readr::col_character()))

where "ID" is the name of your ID column


MrFlick

