
I'm trying to read an Excel file into R.

I used read_excel function of the readxl package with parameter col_types = "text" since the columns of the Excel sheet contain mixed data types.

  df <- read_excel("Test.xlsx",sheet="Sheet1",col_types = "text")

[Screenshot: the same range of values in df compared with the Excel sheet]

But a very slight difference is introduced in some of the numeric values. It's always the same few values, so I suspect some hidden attribute in Excel.

  1. I tried to format those values as numbers in Excel, and also tried adding 0s after the number, but neither worked.
  2. I changed the numeric value of a cell from 2.3 to 2.4, and it was read correctly by R.

1 Answer


This is a consequence of floating-point imprecision, but it's a little tricky. When you enter the number 1.2 (for example) into R or Excel, it's not represented exactly as 1.2:

print(1.2,digits=22)
## [1] 1.199999999999999955591

Excel and R usually try to shield you from these details, which are inevitable if you're using fixed-precision floating-point values (as most computer systems do), by limiting the printing precision to a level that ignores those floating-point imprecisions. When you explicitly convert to character, however, R figures you don't want to lose information, so it gives you all the digits. Numbers that can be represented exactly in binary, such as 2.375, don't gain all those extra digits.
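The contrast is easy to see directly in R (2.375 = 2 + 1/4 + 1/8, so it has a finite binary expansion):

print(1.2, digits = 22)
## [1] 1.199999999999999955591
print(2.375, digits = 22)
## [1] 2.375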

However, there's a simple solution in this case:

readxl::read_excel("Test.xlsx", na="ND")

This tells R that the string "ND" should be treated as a special "not available" value, so all of your numeric values get handled properly. When you examine your data, the tiny imprecisions will still be there, but R will print the numbers the same way that Excel does.
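As a sketch (assuming "ND" is the only non-numeric entry in those columns, which may not hold for your data), dropping col_types = "text" and supplying na = "ND" lets readxl guess the column types:

df <- readxl::read_excel("Test.xlsx", sheet = "Sheet1", na = "ND")
str(df)  # the affected columns should now be numeric (<dbl>) rather than <chr>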

I feel like there's probably a better way to approach this (mixed-type columns are really hard to deal with), but if you need to 'fix' the format of the numbers you can try something like this:

x <- c(format(1.2,digits=22),"abc")
## [1] "1.199999999999999955591" "abc"                    
fix_nums <- function(x) {
    nn <- suppressWarnings(as.numeric(x))
    x[!is.na(nn)] <- format(nn[!is.na(nn)])
    return(x)
}
fix_nums(x)  
## [1] "1.2" "abc"

Then if you're using the tidyverse you can use `my_data %>% mutate_all(fix_nums)`
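Note that in dplyr 1.0 and later, mutate_all() is superseded by across(); an equivalent call would be:

library(dplyr)
my_data %>% mutate(across(everything(), fix_nums))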

Ben Bolker
  • That's fascinating! However, for my dataset there is a distinction between the `NA` value and the character "ND" (and there are some other character values as well). I'm planning to write a rounding function to loop through my dataset; would you approach this differently? – Isaac Oct 16 '20 at 21:03