compare 3 model metrics with ggplot2

Question

When plotting 3 model metrics (RMSE, MAE, Rsquared) for 3 trained models against the corresponding test-set metrics, I'm trying to show that the Neural Network model is the best because:

the distance between its train and test metrics is the smallest
it also has low enough RMSE/MAE and high Rsquared

This is not obvious from the attached plot. Also, the metrics are on different scales, since Rsquared is in the [0,1] interval. Is there a way to plot this better, preferably on the same plot?

> trn
            model RMSE Rsquared  MAE dataType
1      Linear Reg 9.17     0.51 6.03    train
2      SVM Radial 7.86     0.64 4.86    train
3 Neural Networks 8.55     0.57 5.59    train
> tst
            model RMSE Rsquared  MAE dataType
1      Linear Reg 9.40     0.53 5.95     test
2      SVM Radial 9.16     0.55 5.50     test
3 Neural Networks 8.66     0.60 5.48     test
>

Reproducible code:

library(dplyr)    # for %>%
library(tidyr)    # for pivot_longer()
library(ggplot2)

trn <- structure(list(model = c("Linear Reg", "SVM Radial", "Neural Networks"),
                      RMSE = c(9.17, 7.86, 8.55), Rsquared = c(0.51, 0.64, 0.57),
                      MAE = c(6.03, 4.86, 5.59)),
                 row.names = c(NA, -3L), class = "data.frame")

tst <- structure(list(model = c("Linear Reg", "SVM Radial", "Neural Networks"),
                      RMSE = c(9.4, 9.16, 8.66), Rsquared = c(0.53, 0.55, 0.6),
                      MAE = c(5.95, 5.5, 5.48)),
                 row.names = c(NA, -3L), class = "data.frame")

trn['dataType'] = 'train'
tst['dataType'] = 'test'

long_tbl <- rbind(trn, tst) %>%
  pivot_longer(cols =!c('model', 'dataType'), names_to = 'metric', values_to='value')

ggplot(long_tbl, aes(x=model, y=value, shape = dataType, colour = metric )) + 
  geom_point()

score 1 · Accepted Answer · answered Jan 04 '21 at 17:36

The easiest way to do this would be + facet_wrap(~metric, scales="free") but I don't think that satisfies your "in one plot" requirement (it's in one plot statement but three sub-plots). If gg1 is your original plot then this is a pretty compressed format:

print(gg1 
    + facet_wrap(~metric,scales="free_y",ncol=1)  # one panel per metric, each with its own y scale
    + theme_bw() 
    + theme(panel.spacing=grid::unit(0,"lines"),  # no gap between panels
             strip.background=element_blank(),strip.text.x=element_blank())  # hide strip labels
)

Anything more compressed than this would require you to make some decisions about what information to throw away (e.g., are you willing to rescale all of your metrics to min=0, max=1, or does the magnitude of the differences convey information?).
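If rescaling is acceptable, one possible sketch is below. It assumes min-max rescaling within each metric (pooling train and test values), which throws away the absolute magnitudes. The names `rescale01`, `value01` and `gg_scaled` are made up for illustration, and the data are rebuilt from the question so the block runs on its own.

```r
library(ggplot2)

# Metrics from the question, already in long format
long_tbl <- data.frame(
  model    = rep(c("Linear Reg", "SVM Radial", "Neural Networks"), times = 6),
  dataType = rep(c("train", "test"), each = 9),
  metric   = rep(rep(c("RMSE", "Rsquared", "MAE"), each = 3), times = 2),
  value    = c(9.17, 7.86, 8.55, 0.51, 0.64, 0.57, 6.03, 4.86, 5.59,
               9.40, 9.16, 8.66, 0.53, 0.55, 0.60, 5.95, 5.50, 5.48)
)

# Hypothetical helper: min-max rescale a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Rescale within each metric, pooling train and test values
long_tbl$value01 <- ave(long_tbl$value, long_tbl$metric, FUN = rescale01)

# All three metrics now share one y axis in a single panel
gg_scaled <- ggplot(long_tbl,
                    aes(x = model, y = value01,
                        shape = dataType, colour = metric)) +
  geom_point(size = 3) +
  labs(y = "value rescaled to [0, 1] within metric")
print(gg_scaled)
```

Note that after rescaling, "better" still points in different directions: lower for RMSE and MAE, higher for Rsquared.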

Leaving the strip labels intact would probably make the graph easier to read (users wouldn't have to squint at the legend to figure out which metric was which); you could also try moving the strip labels to the right edge.
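A rough sketch of the right-edge variant, assuming the long-format data from the question (rebuilt here so the block is self-contained) and using the `strip.position` argument of `facet_wrap()`. The name `gg_right` is only for illustration, and dropping the colour legend is my own choice, since the strip labels already name each metric.

```r
library(ggplot2)

# Metrics from the question, already in long format
long_tbl <- data.frame(
  model    = rep(c("Linear Reg", "SVM Radial", "Neural Networks"), times = 6),
  dataType = rep(c("train", "test"), each = 9),
  metric   = rep(rep(c("RMSE", "Rsquared", "MAE"), each = 3), times = 2),
  value    = c(9.17, 7.86, 8.55, 0.51, 0.64, 0.57, 6.03, 4.86, 5.59,
               9.40, 9.16, 8.66, 0.53, 0.55, 0.60, 5.95, 5.50, 5.48)
)

# The original plot from the question
gg1 <- ggplot(long_tbl,
              aes(x = model, y = value, shape = dataType, colour = metric)) +
  geom_point()

# Same compressed layout, but keep the strip labels and put them on the right
gg_right <- gg1 +
  facet_wrap(~metric, scales = "free_y", ncol = 1, strip.position = "right") +
  theme_bw() +
  theme(panel.spacing = grid::unit(0, "lines"),
        strip.background = element_blank()) +
  guides(colour = "none")  # legend is redundant once the strips are labelled
print(gg_right)
```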

Great, I just added `axis.title = element_blank()`. I should have mentioned that it is going into an `rmarkdown` document; I just tested and it looks fine there. — gregV, Jan 04 '21 at 19:41
That works; you could use `+labs(x="",y="")` if you wanted to blank out both x and y axis labels (slightly cleaner than setting the theme). — Ben Bolker, Jan 04 '21 at 20:58
