RMSLE looping in R

Question

I have a data frame of observed and modelled data for which I am trying to calculate the RMSLE metric. The data are stored in one text file, which I can read in, and have a format similar to this:

Group	Time	Observed	Predicted
A	2	190	312
A	3	174	345
A	6	150	290
A	12	85	217
B	4	300	725
B	9	113	426
B	13	120	393
B	23	97	263

This is just some dummy data; in reality I have a lot more data points and many more groups to go through to calculate the RMSLE. I have been using MLmetrics and dplyr. I have spent too long trying to get something to work: a for loop with a custom function, the split function, and saving data to a file to read back in so that each group can be done individually. Some description and code is below. I have been able to use RMSLE successfully for one group (i.e. one group in its own file) as follows:

library(MLmetrics)

# Set up two vectors
y_pred <- Model_Comparison$Predicted
print(y_pred)
y_true <- Model_Comparison$Observed
print(y_true)

# Calculate RMSLE
RMSLE(y_pred = y_pred, y_true = y_true)
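
For reference, the quantity being computed, matching the base-R expression that appears in the loop further down in this question, can be written as follows. Here n is the number of rows, y_i an Observed value and \hat{y}_i the corresponding Predicted value; these symbols are notation chosen for this note, not taken from the MLmetrics documentation.

```latex
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(1 + y_i) - \log(1 + \hat{y}_i)\bigr)^{2}}
```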

I have tried using a for loop, and have tried putting the RMSLE call in different places, including inside a user-defined function, but I run into trouble defining y_pred and y_true. I believe this is because predefining the y's passes each of them into the loop as a single vector covering all of the data, so only one value results.

# Path elided; header = TRUE is assumed so that the column names are read
ModFull <- read.table("C:/...", header = TRUE)

spp.l <- split(ModFull, ModFull$Group)

# For loop to look at first few lines of each spp in list
for (Group in spp.l) {
  print(head(Group))
}

# Define y's (taken from the full data frame)
y_pred <- ModFull$Predicted
y_true <- ModFull$Observed

# Now get statistic for each Group using a for loop
res <- list()
for (n in names(spp.l)) {
  dat <- spp.l[[n]]
  # Uses the y_true and y_pred defined above
  rmsle <- sqrt(mean((log(1 + y_true) - log(1 + y_pred))^2))
  res[[n]] <- data.frame(Group = n,
                         RMSLE = rmsle,
                         n.samples = nrow(dat))
}

print(res)

The above code results in the same RMSLE value for all Groups, but the structure is fine.

Group	RMSLE	n.samples
A	0.929	4
B	0.929	4

I have also tried a split approach, saving the individual groups to ".Rdata" files following these methods (R - split data frame and save to different files), but these result in corrupted files:

Error in load(name, envir = .GlobalEnv) :
  bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
  truncating string with embedded nuls
2: file ‘Group1.Rdata’ has magic number 'X'
  Use of save versions prior to 2 is deprecated
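
One possible reading of this error, stated as an assumption because the saving code is not shown: a magic number of 'X' is what load() tends to report when the file was written with saveRDS() (or serialize()) rather than save(). load() only reads files written by save(), while readRDS() is the counterpart of saveRDS(). A minimal sketch of a matching write and read pair, using the dummy data from this question and hypothetical file names in a temporary directory:

```r
# Dummy data from the question, standing in for ModFull
ModFull <- data.frame(
  Group     = rep(c("A", "B"), each = 4),
  Time      = c(2, 3, 6, 12, 4, 9, 13, 23),
  Observed  = c(190, 174, 150, 85, 300, 113, 120, 97),
  Predicted = c(312, 345, 290, 217, 725, 426, 393, 263)
)
spp.l <- split(ModFull, ModFull$Group)

# Hypothetical output location and file names, for illustration only
out_dir <- tempdir()
for (n in names(spp.l)) {
  saveRDS(spp.l[[n]], file = file.path(out_dir, paste0("Group_", n, ".rds")))
}

# Read one group back with readRDS(), not load()
group_a <- readRDS(file.path(out_dir, "Group_A.rds"))
print(group_a)
```

If the files must stay in .Rdata format, the other consistent pairing would be save() for writing and load() for reading.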

Lastly, I have tried to follow some work on RMSE values (How to calculate RMSE for groups of data from csv) but ran into the same problem: defining the y_pred and y_true values.

Any and all help would be appreciated.

It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Jan 25 '21 at 03:11

score 0 · Answer 1 · answered Jan 25 '21 at 11:05

You are assigning y_pred and y_true from the full, unsplit data.frame. Assign them inside the for loop instead, using the split data.frame for each group.

# Now get statistic for each Group using a for loop
res <- list()
for (n in names(spp.l)) {
   dat <- spp.l[[n]]
   # Take the vectors from this group's subset, not from ModFull
   y_true <- dat$Observed
   y_pred <- dat$Predicted
   rmsle <- RMSLE(y_pred = y_pred, y_true = y_true)
   res[[n]] <- data.frame(
      Group = n,
      RMSLE = rmsle,
      n.samples = nrow(dat)
   )
}
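
As a follow-on sketch rather than part of the answer itself, the per-group rows can be stacked into a single data frame with do.call(rbind, ...), which gives the table layout shown in the question. The data frame below is the dummy data from the question, and rmsle() is a hypothetical base-R helper that assumes the usual log(1 + x) definition of RMSLE, so the block runs without MLmetrics installed.

```r
# Dummy data from the question
ModFull <- data.frame(
  Group     = rep(c("A", "B"), each = 4),
  Time      = c(2, 3, 6, 12, 4, 9, 13, 23),
  Observed  = c(190, 174, 150, 85, 300, 113, 120, 97),
  Predicted = c(312, 345, 290, 217, 725, 426, 393, 263)
)

# Hypothetical helper; assumed to match MLmetrics::RMSLE
rmsle <- function(y_pred, y_true) {
  sqrt(mean((log1p(y_true) - log1p(y_pred))^2))
}

spp.l <- split(ModFull, ModFull$Group)

# Build one row per group from that group's own subset
res <- lapply(names(spp.l), function(n) {
  dat <- spp.l[[n]]
  data.frame(Group = n,
             RMSLE = rmsle(y_pred = dat$Predicted, y_true = dat$Observed),
             n.samples = nrow(dat))
})

# Stack the rows into one data frame
res_df <- do.call(rbind, res)
print(res_df)
```

With this layout each group gets its own RMSLE value, because the vectors passed to rmsle() come from dat and not from the full data frame.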

Wawv


That worked great Wawv, thanks for your time on this. I had actually tried to put the y's inside before, but hadn't changed them to dat$ (they were ModFull$, and resulted in the same value for all groups)--that was the key thing to do. – Random Lee Jan 25 '21 at 12:33
