3

I am importing an excel file into R, however, some of the values are changing from the original value by adding a significant number of decimal places.

i.e The original value may be 77.21 but is changed to 72.209999999999994. I figure it has something to do with floating point numbers?
I know this example would round back to 72.21, however, sometimes it is occurring on much smaller numbers. I also need to ensure the reported value in the excel sheet is what is getting imported.
Does anyone have ideas how to manage this?

Here is a sample of how the data is stored in excel

  • sample Ni Co Al2O3 Fe2O3 S etc....
  • 84564 <0.005 <0.005 2.65 77.21 0.052
  • 84107 <0.005 <0.005 2.64 77.18 0.051

....

the "<" less than indicated below detection and is handled else where.

my code is : data <- read_excel(file.path)

what I get in R is

  • sample Ni Co Al2O3 Fe2O3 S
  • 84564 <0.005 <0.005 2.6499999999999999 77.209999999999994 0.051999999999999998
  • 84107 <0.005 <0.005 2.6400000000000001 77.180000000000007 0.050999999999999997

Some fields need to be round 2 decimals places and some 3 or 4. My main concern is that numbers may get rounded or modified beyond the detection limit.

Spooked
  • 586
  • 1
  • 4
  • 16
  • "however, sometimes it is occurring on much smaller numbers" Then the only solution would be to import as character strings and turn the characters into arbitrary precision numbers in a subsequent step. I'm not an Excel expert but I'm sure that it also uses floating point numbers. Have you considered that what you get in R might actually be what you had in Excel? Also, why do you need to ensure that "the reported value in the excel sheet is what is getting imported" beyond floating point precision? – Roland Jul 28 '20 at 06:51
  • Yes, I did look at the values in excel out the 30 decimals places and it does not have the extra numbers that R is producing. I might try your idea of importing as characters and converting once in R. – Spooked Jul 28 '20 at 07:03
  • I usually import from a CSV and not directly from Excel files. That way, I'd never had any issues. I just need to remember that Excel exports only the digits shown to CSV. – Roland Jul 28 '20 at 07:06

3 Answers3

1

2.65 cannot be represented exactly with floating point numbers:

sprintf("%.16f", 2.65)
#[1] "2.6499999999999999"

If you need higher precision (I seriously doubt that), you need to use a package for arbitrary precision numbers:

library(Rmpfr)
mpfr("2.65", 32)
#1 'mpfr' number of precision  32   bits 
#[1] 2.6500000004

mpfr("2.65", 64)
#1 'mpfr' number of precision  64   bits 
#[1] 2.65000000000000000009

mpfr("2.65", 128)
#1 'mpfr' number of precision  128   bits 
#[1] 2.650000000000000000000000000000000000005

R uses 53 bits (see help("is.double")):

mpfr("2.65", 53)
#1 'mpfr' number of precision  53   bits 
#[1] 2.6499999999999999

However, you need to consider that Excel also uses floating point numbers internally and can display a rounded representation [1]. My Excel version (2019) claims (in its help) to use 64 bits precision.

I'm extremely skeptical that you need to care about this. Using higher precision then the default is expensive (in developer time and computing resources) and if your workflow involves Excel I would be very surprised if you do something needing this. I have never needed it myself.

[1] It actually rounds for display by default. I have just tested that.

Roland
  • 127,288
  • 10
  • 191
  • 288
  • This problem does not occur when I import from CSV. It appears to be unique to excel files. I have checked the values in excel they are not what is being shown in R. I have some 30 columns. My primary concern are values that are small such as 0.0005. – Spooked Jul 28 '20 at 07:40
  • Excel does not show the actual number, it rounds before printing the number (R does that too). Exporting from Excel to CSV also rounds the numbers. You still have not provided the reason why you care about this. – Roland Jul 28 '20 at 07:42
0

Here's my solution with dplyr Would be great to get the actual data to use, but since we did not I just made some random numbers in an excel file:

Code


# STEP 1: IMPORT LIBRARIES:
library(readxl) # Needed for 'read_excel'
library(dplyr)  # Needed for manipulation

# Step 2: Import data as dataframe:
data.unformatted = as.data.frame(read_excel("C:/Users/A/Documents/scratchboard/sample numbers.xlsx"))

# Step 3: round the numbers to 2 digits:
data.only.2.dig = data.unformatted %>% 
  mutate_if(is.numeric, round, digits=2)

# Alternatively, step 2+3 in one line:
data = as.data.frame(
  read_excel(
    "C:/Users/A/Documents/scratchboard/sample numbers.xlsx")
) %>% 
  mutate_if(is.numeric, round, digits=2)

#OUTPUT:
# Notice that it only affects the Num column, which is the 6th
head(data.unformatted) 
head(data.only.2.dig)

The output:

 #OUTPUT:
# Notice that it only affects the Num column, which is the 6th
> head(data.unformatted)
  number      city      dates classes     lt       Num
1      1     Boise 2020-01-01      52  solid 0.7973496
2      2     Boise 2020-02-01      36  solid 0.2830991
3      3     Boise 2020-03-01      69  solid 0.1539214
4      4     Boise 2020-04-01     100  solid 0.1153002
5      5     Boise 2020-05-01      72  solid 0.1745657
6      6 Pocatello 2020-01-01      82 dashed 0.2586195
> head(data.only.2.dig)
  number      city      dates classes     lt  Num
1      1     Boise 2020-01-01      52  solid 0.80
2      2     Boise 2020-02-01      36  solid 0.28
3      3     Boise 2020-03-01      69  solid 0.15
4      4     Boise 2020-04-01     100  solid 0.12
5      5     Boise 2020-05-01      72  solid 0.17
6      6 Pocatello 2020-01-01      82 dashed 0.26

Next time please share the actual data so we can reproduce your problem

Dharman
  • 30,962
  • 25
  • 85
  • 135
anakar
  • 316
  • 2
  • 13
  • It seems like this solution should work fine for you. you can play around with the ```round``` function and apply it differently for each column AFTER you imported the data. I don't see how a number can be rounded "wrongly" (under the detection limit), it is doing exactly what Excel is doing with the rounding – anakar Jul 28 '20 at 07:01
  • I would suggest importing all the data in at 4 decimal points and then running ```round``` on each column as appropriate (you only have 5-6. It makes sense to do it for each one) – anakar Jul 28 '20 at 07:03
0

I had the same issue after I added text values in data frame of Excel. After I removed such fields the problem gone. It is only first row of the table can be text. My solution was to use two excel files for import: One with only digits and second with text values.

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Feb 11 '22 at 00:16