
I am working with a data frame (n = 13,000) containing financial data in dollar amounts, stored as numeric. There are 5 columns (C1 - C5) containing dollar amounts, and I am trying to create a new column (C6) that is calculated from 4 of the others. I am using the following code:

df$C6 <- df$C1 + df$C2 + df$C3 - df$C4

However, when looking at the output I notice R is storing it in scientific notation. Furthermore, when I convert it using `format` I notice the values are slightly off. For example, what should be 7.46 ends up as 7.4599999999999999644729.

I decided to investigate further and I noticed that only specific rows are causing this to happen and R is forcing all other rows into scientific notation as a result.

Values for one such row are: C1 = 6.47, C2 = 1.00, C3 = 0.00, C4 = 0.00. This results in C6 = 7.4599999999999999644729 after converting the scientific notation to decimal using `format`.

Any advice would be appreciated.
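The behavior described above can be reproduced with a one-row sketch. The data frame here is hypothetical, and it assumes C1 = 6.46 so that the sum matches the 7.46 quoted above (the listed 6.47 would sum to 7.47). It shows that the default display and a 22-digit display are two views of the same stored double.

```r
# Hypothetical one-row data frame; C1 = 6.46 is an assumption (see note above)
df <- data.frame(C1 = 6.46, C2 = 1.00, C3 = 0.00, C4 = 0.00)
df$C6 <- df$C1 + df$C2 + df$C3 - df$C4

# Default printing uses 7 significant digits, so this shows 7.46
print(df$C6)
shown_default <- format(df$C6)

# Requesting more digits than a double can carry (about 15 to 17) exposes
# the nearest binary approximation of 7.46; the stored value is unchanged
shown_long <- sprintf("%.22f", df$C6)
print(shown_long)
```

The trailing digits in the long form are not an arithmetic error introduced by `format`; they are the exact decimal expansion of the closest representable binary value.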

  • You can use `options(scipen = 999)` – akrun Apr 22 '20 at 21:45
  • I tried this, but the same problem with values being slightly off still occurred – User 998478328 Apr 22 '20 at 21:46
  • R never stores numbers in scientific notation, it stores them in a binary method related to (or perhaps perfectly based off of) IEEE-754. There is a distinct difference between how objects are *stored* in R, and how they are *presented* at any given moment. The latter is often controlled by whichever `print` method is used; see `print(data.table(pi=pi))` versus `print(as_tibble(data.table(pi=pi)))` for one difference. – r2evans Apr 22 '20 at 22:01
  • This is so-very-closely related to [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f), and therefore nearly a dupe of https://stackoverflow.com/q/9508518/3358272. Please consider that and come back with more info if you feel your case is distinct from those. (For those not familiar with binary encoding of numbers in a deep pedagogical CS-curriculum or some other purpose, this may first come across as unintuitive and/or *"why can't we just fix this"*. With much more thought, I hope you can come to appreciate the challenge o/w :-) – r2evans Apr 22 '20 at 22:04
  • Have you considered `format(x, scientific=FALSE);` or `as.integer(x);` ? (courtesy of https://stackoverflow.com/a/21509371/1092247) – rvrvrv Apr 22 '20 at 22:11
  • Know that `format` (and `sprintf`) converts your numbers to strings, so when numbers count, this is relevant only when rendering your numbers in a report of some sort. – r2evans Apr 22 '20 at 22:44
  • (My previous demo using `data.table` was *intended* to demo with `data.frame`, not `data.table`. The point does not change, but not everybody has `data.table` installed. My apologies for the complication there. Just use `data.frame` in both and the demo should still work. Assuming you have `tibble` somehow loaded, of course ...) – r2evans Apr 22 '20 at 22:50
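The points raised in the comments can be pulled together in a short sketch. The two-row data frame is hypothetical, and rounding to whole cents is one possible approach under the assumption that every amount is meant to have exactly two decimals; it is not something confirmed by the asker. The second row is built so that it should net to zero, which is the kind of row that can leave a tiny residual and push the whole column into scientific notation when printed.

```r
# Hypothetical amounts: the second row should net to exactly zero
df <- data.frame(
  C1 = c(6.46, 0.10),
  C2 = c(1.00, 0.20),
  C3 = c(0.00, 0.00),
  C4 = c(0.00, 0.30)
)
raw <- df$C1 + df$C2 + df$C3 - df$C4

# 0.1 + 0.2 - 0.3 leaves a residual near 5.6e-17 instead of 0; one value
# that small makes print() choose scientific notation for the whole vector
print(raw)

# Rounding to whole cents removes the residual; the column stays numeric
df$C6 <- round(raw, 2)
print(df$C6)

# Compare with a tolerance rather than `==` (see R FAQ 7.31)
same <- isTRUE(all.equal(df$C6[1], 7.46))

# Format only at the reporting step; the result is character, not numeric
labels <- format(df$C6, nsmall = 2, scientific = FALSE)
print(labels)
```

Keeping `df$C6` numeric and producing `labels` only for display follows the stored-versus-presented distinction made above: calculations continue on the numbers, and the strings are used solely in the report.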

0 Answers