0

I have an excel file that has a column with several values, including NULL, 1.17, 1.19 and 2.01. When I do:

read.xlsx("myExcelFile.xlsx",sheet=1)

Most values in that column remain the same but the 2.01 changes to 2.0099999999999998

I am using openxlsx_4.2.5 with R 4.1.2

Any ideas on why this is happening? suggestions on how to have the 2.01 remain 2.01?

Answer

Thanks to insights from the commenters, this is actually something in excel where it is storing 2.01 to 2.009999999999998 due to the floating points reasons, so the source of the issue is not R or openxlsx but rather starts in Excel itself. The problem was solved by having openxlsx not read the column as a string: read.xlsx(..., na.strings = c('NA', 'NULL')). Thanks to the commenters!

ramiro
  • 11
  • 4
  • Change your print options to not print so many decimal places. `options(digits = 5)`. – Gregor Thomas Apr 05 '22 at 19:15
  • See here for explanation related to this issue that is common to any programming language that uses floating point numbers, which cannot describe some integers exactly: https://0.30000000000000004.com/ – Jon Spring Apr 05 '22 at 19:16
  • @GregorThomas There is a nuance here as to where the problem is occurring which is not answered by that duplicate link. I just tested it out, and 2.01 is stored as 2.0099999999999998 in the Excel file (the XML). It's important to note that R, openxlsx, and/or readxl are the not the problem. This is literally how Excel is storing the data (for the reason mentioned in that link). So any tool reading the data verbatim from Excel will have this problem. I think that is important part of answering this specific question. –  Apr 05 '22 at 19:41
  • 1
    i do not think this is floating point. it is very unlikely that a data frame will just print 15 digits on a numeric column, plus the "NULL" value suggests that `read.xlsx` has interpreted this column as a string, op should try `read.xlsx(..., na.strings = c('NA', 'NULL'))` or similar – rawr Apr 05 '22 at 20:19
  • 1
    @Adam, you are correct in that Excel does store 2.01 as 2.009999999999998 and @rawr, indeed, `read.xlsx(..., na.strings = c('NA', 'NULL'))` solved the problem! – ramiro Apr 05 '22 at 20:34

0 Answers0