
I have a data frame read in from a .csv, and R automatically shows one of its columns in scientific notation (I know that this is the default in R):

df
      rep pollen_dead_alive
1   outdoor     1.000011e+296
2    outdoor     1.011111e+298
3    room     1.100011e+299
4   room     1.111001e+287
5   cooler     1.111001e+287
6   cooler     1.010101e+295

I would like the pollen_dead_alive column not to be in scientific notation, so I tried

format(pollen_germ$pollen_dead_alive, scientific = FALSE)

From the function above, the first row looks like this

100001111100111101640806828462208002202468400260024284684668262804408686060644644066206800428848288486484002022402242808020206206060620482088002046224266088000046646040442462246802486284460460848080460682460288402004840068202664448084040062684008224282884886088084444028644608868444440608648462866

But in my .csv that I imported, the first row looks like this (apologies this looks ugly)

000100001111100111110000011111111111111111110000000011111100001111100001100001111111111000001111111111000011010100000011000000000000000011111111111111100000000000111111111110000000000111111000000111111111111111111111111111111111111000000000000000000000000011111111111111111100000000000011111101101000

I would like my data frame to have the format of only ones and zeros (example below)

df
      rep pollen_dead_alive
1   outdoor     11110000011101111001111100000111111010100101110010100
2    outdoor    11110000011101111001111100000111111010100100000111111
3    room     111100000111011110011111000001111110101000110001100
4   room     11110000011101111001111100000111111010100011110000
5   cooler     111100000111011110011111000001111110101111110101010
6   cooler     1111000001110111100111110000011111101010010101010111

How would I achieve this in R?

jpsmith
  • When you read in your data with `read.csv`, make sure to read all the columns as character values. If R sees something that looks like a number, it tries to turn it into a numeric type, but R can't store infinite-precision integers. Values get rounded so they can be stored as floating-point numbers. You can use `read.csv(..., colClasses="character")` to read in everything as character values. – MrFlick Aug 24 '23 at 17:58
  • I don't know for sure, but this row looks like binary code. It is very unlikely that the digits shown represent a decimal number. This is supported by the fact that the first three digits are leading zeros, which usually isn't the case for decimal numbers. – deschen Aug 24 '23 at 18:09
  • With that much "precision", I think it is _highly_ unlikely that you can recoup `"000100001111...1000"` from `1.000011e+296`, for all the same reasons given in [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) (and, loosely related, https://stackoverflow.com/q/9508518/3358272). I agree with MrFlick: you should change how you are importing the data to get it right the first time, since _this_ import is "lossy" and irrecoverable. – r2evans Aug 24 '23 at 22:08
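
A minimal sketch of what the comments above suggest, assuming a hypothetical two-row CSV with shortened digit strings (the names `csv_text`, `lossy`, and `exact` are made up for illustration and are not from the question):

```r
# Hypothetical stand-in for the real file; the digit strings are shortened.
csv_text <- "rep,pollen_dead_alive
outdoor,000100001111100111110000011111
room,000111100000111111111100000011"

# Default import: the column is guessed to be numeric, so the leading zeros
# are dropped and the value is rounded to a double.
lossy <- read.csv(text = csv_text)
class(lossy$pollen_dead_alive)

# Forcing character columns keeps every digit exactly as written in the file.
exact <- read.csv(text = csv_text, colClasses = "character")
class(exact$pollen_dead_alive)
exact$pollen_dead_alive[1]
```

With a real file you would pass the file path instead of `text =`. If only one column needs protecting, `colClasses` can also be given as a named vector, e.g. `colClasses = c(pollen_dead_alive = "character")`, which leaves the other columns to be guessed as usual.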

1 Answer


FWIW, if your source data are in fact binary, set the import parameters so that the cells are read in as character strings. Then use (disclaimer: I wrote the package) `library(bigBits); base2base(input, 2, 10)` to get the decimal values.

Carl Witthoft