I read into R a CSV file with large numbers formatted in scientific notation. I used a couple of statistical R functions (MSE and RMSPE) on the numbers and got an incorrect answer (I checked it in Excel). When I changed the format in the CSV file to ordinary number format, i.e. with lots of zeroes, the R functions calculated correctly. What was I doing wrong?
Thanks for any insights, Claire UPDATE: console output added. I am using R4.0.2. I have imported two CSV files, one called MPERRORS.csv with the original scientific notation format and the second called CBERRORS.csv saved in number format. I believe the issue is to do with the conversion in Excel of scientific notation format numbers. Code is below and I have also pasted in results. If you look at number 6.89E+11 it shows as 689000000000 in the formula bar but if you convert it to number format you get 689116020736. Apologies if this is wrong, I am a newbie with minimal R experience as you will have guessed.
CLAIRE
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 0.2
year 2020
month 06
day 22
svn rev 78730
language R
version.string R version 4.0.2 (2020-06-22)
MPERRORS RESULTS 3.359375e+20 MSE function for MPERRORS 3.359375e+20 mse function for MPERRORS 0.01878106 RMSPE function 0.991949 R2 function 0.9916312 gofR2 function
CBERRORS results 2.94363e+20 MSE 2.94363e+20 mse 0.01805762 RMSPE 0.9929211 R2
enter code here
version
library(ineq)
library(Metrics)
library(MLmetrics)
library(ehaGoF)
ERRORS1<-read.csv(file = 'MPerrors.csv')
ERRORS2<-read.csv(file = 'CBerrors.csv')
str(ERRORS1)
str(ERRORS2)
hist1<-ERRORS1[,2]
base1<-ERRORS1[,3]
print(hist1)
dput(head(ERRORS1,10))
MSE(base1,hist1)
mse(base1,hist1)
RMSPE(base1,hist1)
R2_Score(base1,hist1)
gofRSq(base1,hist1, dgt = 7)
hist2<-ERRORS2[,2]
base2<-ERRORS2[,3]
print(hist2)
dput(head(ERRORS2,10))
MSE(base2,hist2)
mse(base2,hist2)
RMSPE(base2,hist2)
R2_Score(base2,hist2)
gofRSq(base2,hist2, dgt = 7)
# MPERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices =
c(6.89e+11, 7.51e+11, 7.27e+11, 7.55e+11, 7.85e+11, 7.99e+11, 8.53e+11,
8.95e+11,
9.67e+11, 1.02e+12), X..BusinessAsUsual = c(6.79e+11, 7.25e+11,
7.31e+11, 7.66e+11, 7.76e+11, 7.86e+11, 8.26e+11, 8.84e+11, 9.56e+11,
1.01e+12), Diff = c(9.93e+09, 2.54e+10, -4.32e+09, -1.05e+10,
9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10, 1.13e+10, 1.49e+10)), row.names =
c(NA,
10L), class = "data.frame")
#CBERRORS FIRST 10 LINES
structure(list(Time..Year. = 1990:1999, real.gdp.at.market.prices =
c(689116020736,
750739980288, 726938025984, 755445989376, 785442996224, 799333023744,
852837007360, 894628003840, 966879019008, 1021999972352), X..BusinessAsUsual
= c(679182532608,
725334294528, 731261042688, 765934698496, 776039104512, 785780506624,
825845153792, 884472348672, 955611414528, 1007061172224), Diff = c(9.93e+09,
2.54e+10, -4.32e+09, -1.05e+10, 9.4e+09, 1.36e+10, 2.7e+10, 1.02e+10,
1.13e+10, 1.49e+10)), row.names = c(NA, 10L), class = "data.frame")