I have a data frame of observed and modelled data for which I am trying to calculate the RMSLE metric. The data are stored in one text file, which I can read in, and have a format similar to this:
Group | Time | Observed | Predicted |
---|---|---|---|
A | 2 | 190 | 312 |
A | 3 | 174 | 345 |
A | 6 | 150 | 290 |
A | 12 | 85 | 217 |
B | 4 | 300 | 725 |
B | 9 | 113 | 426 |
B | 13 | 120 | 393 |
B | 23 | 97 | 263 |
This is just some dummy data; in reality I have many more data points and many more groups for which to calculate the RMSLE. I have been using MLmetrics and dplyr. I have spent too long trying to get something to work: a for loop with a custom function, the split function, and saving each group to its own file to read back in and process individually. Some description and code are below. I have been able to use RMSLE successfully for one group (i.e. one group in its own file) as follows:
# Set up two vectors
y_pred <- Model_Comparison$Predicted
print(y_pred)
y_true <- Model_Comparison$Observed
print(y_true)
# Calculate RMSLE
RMSLE(y_pred = y_pred, y_true = y_true)
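For reference, the quantity being computed can be written out as below. This is transcribed from the expression used in the loop later in this post, taking $y_i$ as the observed value, $\hat{y}_i$ as the predicted value, and $n$ as the number of rows; the symbols are my own labels, not anything defined by the package.

$$
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(1 + y_i) - \log(1 + \hat{y}_i)\bigr)^2}
$$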
I have tried using a for loop, and have tried putting the RMSLE call in different places, including inside a user-defined function, but I run into trouble defining y_pred and y_true. I believe the problem is that by predefining the y's from the full data frame, each one is passed into the loop as a single vector covering every group, so only one value comes out.
ModFull <- read.table("C:/...")
spp.l <- split(ModFull, ModFull$Group)
# For loop to look at the first few lines of each Group in the list
for (Group in spp.l)
{
  print(head(Group))
}
# Define y's
y_pred <- ModFull$Predicted
y_true <- ModFull$Observed
# Now get statistic for each Group using a for loop
res <- list()
for (n in names(spp.l))
{
  # RMSLE written out by hand, using the y's defined above
  RMSLE <- sqrt(mean((log(1 + y_true) - log(1 + y_pred))^2))
  dat <- spp.l[[n]]
  res[[n]] <- data.frame(Group=n,
                         RMSLE,
                         n.samples=nrow(dat))
}
print(res)
The above code results in the same RMSLE value for all Groups, but the structure is fine.
Group | RMSLE | n.samples |
---|---|---|
A | 0.929 | 4 |
B | 0.929 | 4 |
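A minimal base-R sketch of the per-group calculation the loop above is aiming for. It assumes the column names from the dummy table; `rmsle` is a hypothetical hand-written helper, not the MLmetrics function. The only change from the loop above is that the observed and predicted values are taken from each group's own rows rather than from the full data frame.

```r
# Dummy data copied from the table at the top of the post
ModFull <- data.frame(
  Group     = rep(c("A", "B"), each = 4),
  Time      = c(2, 3, 6, 12, 4, 9, 13, 23),
  Observed  = c(190, 174, 150, 85, 300, 113, 120, 97),
  Predicted = c(312, 345, 290, 217, 725, 426, 393, 263)
)

# Hypothetical helper: RMSLE written out by hand (log1p(x) is log(1 + x))
rmsle <- function(y_pred, y_true) {
  sqrt(mean((log1p(y_true) - log1p(y_pred))^2))
}

# Split by group, then build one result row per group
# from that group's own Observed/Predicted columns
spp.l <- split(ModFull, ModFull$Group)
res <- do.call(rbind, lapply(names(spp.l), function(n) {
  dat <- spp.l[[n]]
  data.frame(Group     = n,
             RMSLE     = rmsle(dat$Predicted, dat$Observed),
             n.samples = nrow(dat))
}))
print(res)
```

On the dummy data this should give a different RMSLE for A and B. Swapping `rmsle` for `MLmetrics::RMSLE` ought to work the same way, though that has not been checked here.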
I have also tried a split approach, saving the individual groups to ".Rdata" files following these methods (R - split data frame and save to different files), but loading them back fails as if the files were corrupted:
Error in load(name, envir = .GlobalEnv) :
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
  truncating string with embedded nuls
2: file ‘Group1.Rdata’ has magic number 'X'
  Use of save versions prior to 2 is deprecated
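One possible reading of that error, offered as a guess rather than a diagnosis: `load()` only reads files written by `save()`, so a text file given an ".Rdata" extension would fail in this way. The sketch below shows a matched write/read pair using `saveRDS()` and `readRDS()`. The stand-in data, the temporary directory, and the `Group_<name>.rds` file names are all assumptions made for illustration.

```r
# Stand-in data; the real data frame would come from read.table()
ModFull <- data.frame(
  Group     = c("A", "A", "B", "B"),
  Observed  = c(190, 174, 300, 113),
  Predicted = c(312, 345, 725, 426)
)

out_dir <- tempdir()  # hypothetical output location
spp.l <- split(ModFull, ModFull$Group)

# Write each group with saveRDS() so it can be read back with readRDS()
for (n in names(spp.l)) {
  saveRDS(spp.l[[n]], file.path(out_dir, paste0("Group_", n, ".rds")))
}

# Read one group back in
back <- readRDS(file.path(out_dir, "Group_A.rds"))
print(back)
```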
Lastly, I have tried to follow some work on RMSE values (How to calculate RMSE for groups of data from csv), but I run into the same problem: defining the y_pred and y_true values for each group.
Any and all help would be appreciated.