I have a CSV containing a set of very long binary strings (273 digits each). When I import it into R, the values are converted to scientific notation, which is no good for the package I am using (the RMark package).

I need the exact string of digits. When I set options(scipen=999), the string changes from:

00001000100000100101000100001000000000100000010000000100000000000010000000010010000100000000010000000000000000010110000000000000000000000000000000000100001000000000000000000010011110100010000000000001010011000000000000000000000000000000000000010010000100010001101000000000

to

1000100000100101025444402400842264066484822600084686488860622084686800026026866466620228640024800662200404662066606842460420682026606666882002660642200422860660644222248664084866606040062688484466484868460660204024048862880428426266608864606620466868060424848666462860

(this string contains non-binary digits, which are definitely not present in the .csv file)

If I read the .csv as character, the string shows as it's supposed to, but if I then convert it with as.numeric (as I need R to recognize the string as a number), it reverts to scientific notation.
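For reference, here is a minimal sketch of the character-only import described above. The two-row CSV text is a made-up stand-in for test.csv (the real strings are 273 digits long), and forcing both columns to character via colClasses is just one way to do it. As far as I can tell from the RMark documentation, the ch field is expected to be a character string anyway, so the as.numeric step may not be needed at all; please treat that as an assumption to verify.

```r
# Hypothetical stand-in for test.csv: same column names, much shorter strings.
csv_text <- "X,ch\nWS000,0000100010\nWS010,0100000001\n"

# Read every column as character so no numeric conversion ever happens.
data <- read.csv(text = csv_text,
                 colClasses = c(X = "character", ch = "character"))

str(data)        # ch is chr, leading zeros intact
nchar(data$ch)   # every capture history keeps its full length
```

With the real file, the same call would take the file name instead of text = csv_text.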

I've also tried

format(data$ch, scientific=F)

but it still keeps the scientific notation. Here's the code I've used and the structure of the data:

rm(list = ls())
options(scipen=999)
data<-read.csv("test.csv")
head(data)

> dput(data)
structure(list(X = structure(1:101, .Label = c("WS000", "WS010", 
"WS014", "WS018", "WS021", "WS030", "WS032", "WS041", "WS044", 
"WS052", "WS067", "WS071", "WS081", "WS087", "WS100", "WS102", 
"WS106", "WS108", "WS111", "WS113", "WS127", "WS132", "WS133", 
"WS141", "WS151", "WS157", "WS160", "WS167", "WS170", "WS171", 
"WS172", "WS173", "WS174", "WS175", "WS176", "WS177", "WS178", 
"WS179", "WS180", "WS183", "WS187", "WS189", "WS190", "WS192", 
"WS193", "WS195", "WS198", "WS199", "WS201", "WS202", "WS203", 
"WS206", "WS207", "WS209", "WS213", "WS214", "WS216", "WS217", 
"WS220", "WS221", "WS222", "WS224", "WS225", "WS227", "WS229", 
"WS230", "WS236", "WS262", "WS263", "WS266", "WS269", "WS270", 
"WS271", "WS273", "WS275", "WS276", "WS279", "WS285", "WS286", 
"WS288", "WS289", "WS290", "WS294", "WS295", "WS298", "WS300", 
"WS302", "WS314", "WS322", "WS326", "WS327", "WS328", "WS333", 
"WS335", "WS337", "WS338", "WS340", "WS350", "WS382", "WS383", 
"WS386"), class = "factor"), ch = c(1.0001000001001e+267, 1.0000001e+261, 
1e+130, 1.1100001001e+264, 1.00100001e+250, 1e+232, 1e+264, 1.000001e+232, 
1e+172, 1.11001e+259, 1e+254, 1.100110111001e+262, 1e+252, 1e+257, 
1.1001100001e+258, 1.000000001e+257, 1.0000000000011e+252, 1.010000001e+264, 
1.00000010100011e+270, 1e+207, 1.10110110001e+261, 1.0001001e+261, 
1.0000011000001e+261, 1.0000000000001e+258, 1.00110000001e+257, 
1.101e+242, 1.0000000011e+268, 1.1e+225, 1.0001000010111e+266, 
1e+203, 1.1e+245, 1.01001e+263, 1.0001e+257, 1.100000000001e+244, 
1.01e+263, 1.00000000100011e+271, 1.000000000001e+255, 1e+241, 
1.0011e+258, 1.000001100001e+260, 1e+169, 1.01e+174, 1.00000000001e+263, 
1e+271, 1.00001e+267, 1e+257, 1.00000000000001e+254, 1e+258, 
1e+159, 1e+151, 1.00110001e+261, 1.10011e+258, 1e+261, 1e+157, 
1.1e+245, 1.0001e+252, 1e+232, 1e+229, 1.00110010001001e+217, 
1e+188, 1.0001e+207, 1e+222, 1.1000110001011e+78, 1e+177, 1.00000000000001e+202, 
1e+226, 1.100011e+162, 1e+202, 1e+201, 1e+170, 1e+181, 1e+172, 
1.0000000010001e+170, 1e+172, 1e+85, 1e+152, 1.1e+166, 1.01e+160, 
1.00000000001e+89, 1.100010010011e+158, 1e+158, 1e+158, 1.01000000000001e+35, 
1e+134, 1e+143, 1.1e+133, 1e+69, 1.00100001e+74, 1e+84, 1e+73, 
1.0000011101e+72, 1e+73, 1e+69, 1e+66, 1e+66, 1e+66, 1e+51, 1.00000000000011e+33, 
1.00100100001e+28, 1e+28, 1e+26)), class = "data.frame", row.names = c(NA, 
-101L))

What am I doing wrong, and where are the non-binary numbers coming from?! Thanks so much in advance!

zx8754
j.harv3y
    The problem is that your integer is too large to fit into R's integer type. Look at the Rmpfr package. – Pavel Obraztcov Sep 30 '19 at 12:03
    As @PavelObraztcov noted, these very large numbers will be stored as floating-point numbers, not as integers. Floating point cannot store them precisely, so each stored value is the closest approximation to the original number that fits within the number of bits available. For more details on why floating-point numbers may differ from the exact value you expect, see this related question - https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – dww Sep 30 '19 at 13:44
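
To illustrate the point made in the comments, here is a small sketch of how a double loses digits once a number needs more than 53 bits. The 20-digit string s is a made-up example, far shorter than the 273-digit capture histories, but the effect is the same: the stored value is only the nearest representable double, so printing it in full cannot reproduce the original digits.

```r
# A double carries 53 significant bits (roughly 15 to 17 decimal digits).
.Machine$double.digits

# Above 2^53, neighbouring integers are no longer distinguishable.
2^53 + 1 == 2^53            # TRUE

# Hypothetical 20-digit "binary" string: the trailing 1 does not survive.
s <- "10000000000000000001"
x <- as.numeric(s)
printed <- sprintf("%.0f", x)   # full decimal expansion of the stored double
identical(printed, s)           # FALSE: the round trip changes the digits
```

For much larger values the full expansion of the nearest double generally contains digits other than 0 and 1, which would explain the output shown in the question.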
