Plotting model comparison statistics in R

Question

I combined several data-frames into a data-frame dfc with a fifth column called model specifying which model was used for imputation. I want to plot the distributions by grouping them by model.

dfc looks something like: (1000 rows, 5 columns)

X1       X2      X3     X4      model
1500000  400000  0.542  7.521   actual
250000   32000   2.623  11.423  missForest
...

I use the lines below to plot:

library(lattice)
densityplot(X1 + X2 + X3 + X4, group = dfc$model)

giving:

Note that X1 <- dfc$X1 (and likewise)

My questions are:

1. How can I add a legend to this plot? (The plot is useless if one can't tell which colour belongs to which model.)
2. Is there a more visually appealing way to plot this, perhaps using ggplot?
3. Is there a better way to compare these models? For example, I could plot each column separately.

... and to add a reproducible example, so that this posting has more value for the community... — lukeA, Jun 16 '16 at 08:21
I've specified every single variable being used, and the exact function that I can't get my head around. The question is as clear and specific as it could have been. I really cannot give away the code; it isn't mine to reproduce (or directly reference) on this community or anywhere else, for that matter. — , Jun 16 '16 at 08:28
You really don't have to give your actual code away. But you could just provide a reproducible example (e.g. as below in the answer) with some random data you just made up. — Alex, Jun 16 '16 at 08:34
@Aayush Is some random code to impute missing values on a data set like iris top secret? What I meant was: there are good questions on SO and there are not so good ones: [how do I ask a good question](http://stackoverflow.com/help/how-to-ask) and [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). — lukeA, Jun 16 '16 at 09:08

Alex · Accepted Answer · 2016-06-16T08:53:37.817

A fast density plot using ggplot.

library(ggplot2)
library(reshape2)
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)

data <- data.frame(rnorm = a, runif = b, rchisq = c)
data <- melt(data) #from reshape2 package

ggplot(data) + geom_density(aes(value, color = variable)) + 
               geom_jitter(aes(value, 0, color = variable), alpha = 0.5, height = 0.02 )

Remark: I added the reshape2 package because ggplot likes "long" data and I think yours are "wide".
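To make the wide-to-long step concrete for a data frame shaped like yours, here is a minimal sketch. The dfc below is made-up stand-in data (random numbers, only two of the model labels), so treat the column values as assumptions. The point is that melt() can keep model as an identifier column while stacking X1 to X4.

```r
library(reshape2)

set.seed(42)  # made-up data, seeded only for reproducibility
n <- 50

# Hypothetical stand-in for dfc: four numeric columns plus a model label
dfc <- data.frame(
  X1 = rnorm(2 * n, mean = 1e6, sd = 3e5),
  X2 = rnorm(2 * n, mean = 2e5, sd = 8e4),
  X3 = runif(2 * n, min = 0, max = 3),
  X4 = rnorm(2 * n, mean = 9, sd = 2),
  model = rep(c("actual", "missForest"), each = n)
)

# Keep `model` as an id column; stack X1..X4 into variable/value pairs
long <- melt(dfc, id.vars = "model")

str(long)  # columns: model, variable, value
```

Each original row becomes four rows in long, one per column, which is the shape ggplot works with most easily.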

Plotting each column separately would work like this:

ggplot(data) + geom_density(aes(value, color = variable)) +
               geom_point(aes(value, 0, color = variable)) +
               facet_grid(. ~ variable)

Here the color might be redundant but you can just remove the color argument.
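For the original setup (several columns, one model label), colour becomes useful again if it encodes the model while the facets encode the column. A rough sketch, again on made-up data standing in for dfc, so the values and the two model labels are assumptions:

```r
library(ggplot2)
library(reshape2)

set.seed(42)  # made-up data, seeded only for reproducibility
n <- 50

# Hypothetical stand-in for dfc
dfc <- data.frame(
  X1 = rnorm(2 * n, mean = 1e6, sd = 3e5),
  X2 = rnorm(2 * n, mean = 2e5, sd = 8e4),
  X3 = runif(2 * n, min = 0, max = 3),
  X4 = rnorm(2 * n, mean = 9, sd = 2),
  model = rep(c("actual", "missForest"), each = n)
)

# Long format: one row per (model, column, value)
long <- melt(dfc, id.vars = "model")

# One panel per column, one density curve per model.
# scales = "free" because the columns live on very different scales.
p <- ggplot(long, aes(x = value, color = model)) +
  geom_density() +
  facet_wrap(~ variable, scales = "free")

print(p)
```

ggplot draws the legend for model automatically here, which also covers the legend part of the question.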

score 0 · Answer 2 · answered Jun 17 '16 at 05:04

All I had to do was set an argument:

densityplot(X1 + X2 + X3 + X4, group = dfc$model, auto.key = TRUE)

gives the desired plot.

The problem was that I couldn't figure out which densityplot() R was using.

The other parts of the question remain open.
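For anyone with the same confusion, a small sketch of how one might check which densityplot() is in play. find() and methods() are standard R; the dfc is made-up stand-in data, not the real one. As far as I can tell, a call without `~` passes the numeric sum X1 + X2 + X3 + X4 to the numeric method, so it shows the density of the row sums. The formula version below keeps the columns apart instead.

```r
library(lattice)

# Which attached package provides densityplot()?
find("densityplot")

# densityplot() is an S3 generic; list the methods it can dispatch to
methods("densityplot")

set.seed(1)  # made-up data, seeded only for reproducibility
# Hypothetical stand-in for dfc, with two columns for brevity
dfc <- data.frame(
  X1 = rnorm(100),
  X2 = rnorm(100, mean = 5),
  model = rep(c("actual", "missForest"), each = 50)
)

# Formula input uses the formula method: with outer = TRUE each column
# gets its own panel, and groups = model draws one curve per model.
p_cols <- densityplot(~ X1 + X2, data = dfc, groups = model,
                      outer = TRUE, auto.key = TRUE,
                      scales = list(relation = "free"))
print(p_cols)
```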

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

Data copied from @alex

library(ggplot2)
library(reshape2)
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)

dat <- data.frame(Hmisc = a, MICE = b, missForest = c)
dat <- melt(dat)

library(lattice) # using lattice package 
densityplot(~ value, data = dat, groups = variable, auto.key = TRUE)

Individual plots:

densityplot(~ value | variable, data = dat, groups = variable, auto.key = TRUE, scales = list(relation = "free"))
