0

I read into R a CSV file with large numbers formatted in scientific notation. I used a couple of statistical R functions (MSE and RMSPE) on the numbers and got an incorrect answer (I checked it in Excel). When I changed the format in the CSV file to ordinary number format, i.e. with lots of zeroes, the R functions calculated correctly. What was I doing wrong?

Thanks for any insights, Claire UPDATE: console output added. I am using R4.0.2. I have imported two CSV files, one called MPERRORS.csv with the original scientific notation format and the second called CBERRORS.csv saved in number format. I believe the issue is to do with the conversion in Excel of scientific notation format numbers. Code is below and I have also pasted in results. If you look at number 6.89E+11 it shows as 689000000000 in the formula bar but if you convert it to number format you get 689116020736. Apologies if this is wrong, I am a newbie with minimal R experience as you will have guessed.

CLAIRE platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.2
year 2020
month 06
day 22
svn rev 78730
language R
version.string R version 4.0.2 (2020-06-22)

MPERRORS RESULTS 3.359375e+20 MSE function for MPERRORS 3.359375e+20 mse function for MPERRORS 0.01878106 RMSPE function 0.991949 R2 function 0.9916312 gofR2 function

CBERRORS results 2.94363e+20 MSE 2.94363e+20 mse 0.01805762 RMSPE 0.9929211 R2

enter code here
version
library(ineq)
library(Metrics)
library(MLmetrics)
library(ehaGoF)
ERRORS1<-read.csv(file = 'MPerrors.csv')
ERRORS2<-read.csv(file = 'CBerrors.csv')
str(ERRORS1)
str(ERRORS2)
hist1<-ERRORS1[,2]
base1<-ERRORS1[,3]
print(hist1)
dput(head(ERRORS1,10))

MSE(base1,hist1)
mse(base1,hist1)
RMSPE(base1,hist1)
R2_Score(base1,hist1)
gofRSq(base1,hist1, dgt = 7)

hist2<-ERRORS2[,2]
base2<-ERRORS2[,3]
print(hist2)
dput(head(ERRORS2,10))

MSE(base2,hist2)
mse(base2,hist2)
RMSPE(base2,hist2)
R2_Score(base2,hist2)
gofRSq(base2,hist2, dgt = 7)

# MPERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices = 
c(6.89e+11, 7.51e+11, 7.27e+11, 7.55e+11, 7.85e+11, 7.99e+11, 8.53e+11, 
8.95e+11, 
9.67e+11, 1.02e+12), X..BusinessAsUsual = c(6.79e+11, 7.25e+11, 
7.31e+11, 7.66e+11, 7.76e+11, 7.86e+11, 8.26e+11, 8.84e+11, 9.56e+11, 
1.01e+12), Diff = c(9.93e+09, 2.54e+10, -4.32e+09, -1.05e+10, 
9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10, 1.13e+10, 1.49e+10)), row.names = 
c(NA, 
10L), class = "data.frame")
#CBERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices = 
c(689116020736, 
750739980288, 726938025984, 755445989376, 785442996224, 799333023744, 
852837007360, 894628003840, 966879019008, 1021999972352), X..BusinessAsUsual 
= c(679182532608, 
725334294528, 731261042688, 765934698496, 776039104512, 785780506624, 
825845153792, 884472348672, 955611414528, 1007061172224), Diff = c(9.93e+09, 
2.54e+10, -4.32e+09, -1.05e+10, 9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10, 
1.13e+10, 1.49e+10)), row.names = c(NA, 10L), class = "data.frame")
  • 3
    Can you please provide some sort of [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Normally R can read in numbers in scientific notation. Like the problem is there is a non-numeric value in the column which throws the type detection off. Also, are you using a version of R <4.0? That might be turning your values into a factor which assigns them different numeric values. – MrFlick Sep 15 '20 at 01:58
  • it looks like an excel csv issue. Google e.g. "excel csv scientific notation digits" and you find many issues. The main suggestion appears to be to save large numbers as text in csv files. You can then read the file and coerce the text to numeric in R. – Paul van Oppen Sep 15 '20 at 02:16
  • 1
    Hi @ClaireBrereton, Welcome to StackOverflow! Can you include your code in your question and also do ```dput(head(your_dataframe_name_here,10))``` to show the first ten rows of your data? You can paste those into questions using the "edit" option on the bottom left horizontal menu just under your question. You can also use ```version``` and paste the output to include that per @MrFlick's comment. – Russ Thomas Sep 15 '20 at 02:41

0 Answers0