
I combined several data-frames into a data-frame dfc with a fifth column called model specifying which model was used for imputation. I want to plot the distributions by grouping them by model.

dfc looks something like this (1000 rows, 5 columns):

X1        X2        X3        X4      model
1500000 400000    0.542      7.521    actual
250000  32000     2.623     11.423   missForest
...

I use the lines below to plot:

library(lattice)
densityplot(X1 + X2 + X3 + X4, group = dfc$model)

giving:

[image: comparison]

Note that X1 <- dfc$X1 (and likewise for X2, X3 and X4).

My questions are:

  • How can I add a legend to this plot? (this plot is useless if one can't tell which colour belongs to which model)
  • Is there, perhaps, a more visually appealing way to plot this? Using ggplot, perhaps?
  • Is there a better way to compare these models? For example, I could plot for each column separately.
  • Please feel free to suggest a better question title –  Jun 16 '16 at 08:00
  • ... and to add a reproducible example, so that this posting has more value for the community... – lukeA Jun 16 '16 at 08:21
  • I've specified every single variable being used, and the exact function that I can't get my head around. The question is as clear and specific as it could be. I really cannot give away the code; it isn't mine to reproduce (or directly reference) on this community or anywhere else for that matter. –  Jun 16 '16 at 08:28
  • You really don't have to give your actual code away. But you could just provide a reproducible example (e.g. as below in the answer) with some random data you just made up. – Alex Jun 16 '16 at 08:34
  • @Aayush Is some random code to impute missing values on a data set like iris top secret? What I meant was: there are good questions on SO and there are not so good ones: [how do I ask a good question](http://stackoverflow.com/help/how-to-ask) and [how to provide a minimal reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). – lukeA Jun 16 '16 at 09:08

3 Answers


A fast density plot using ggplot.

library(ggplot2)
library(reshape2)

set.seed(1)  # make the random example reproducible
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)

data <- data.frame(rnorm = a, runif = b, rchisq = c)
data <- melt(data) # from reshape2: wide -> long, giving columns "variable" and "value"

# density curves plus the raw values jittered just above zero
ggplot(data) + geom_density(aes(value, color = variable)) +
               geom_jitter(aes(value, 0, color = variable), alpha = 0.5, height = 0.02)


Remark: I used the reshape2 package because ggplot2 works best with "long" data, and I think yours is "wide".

Plotting each column separately works like this:

# the "+" must end each line; a line starting with "+" is parsed as a new expression and fails
ggplot(data) + geom_density(aes(value, color = variable)) +
               geom_point(aes(value, 0, color = variable)) +
               facet_grid(. ~ variable)


Here the color is redundant, because each panel already shows a single variable; you can simply remove the color argument.
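Applied to the layout described in the question, the same idea might look like the sketch below. It assumes made-up random data in place of the real dfc; only the column names and the model labels "actual" and "missForest" come from the question. melt() is told to keep model as an identifier, so each row of the long data holds one value together with its column name and model. facet_wrap(scales = "free") gives each column its own axes, which matters here because X1 is in the hundreds of thousands while X3 is in the single digits.

```r
library(ggplot2)
library(reshape2)

# Made-up stand-in for the question's dfc (random data, NOT the real imputations).
set.seed(1)
dfc <- data.frame(X1 = rlnorm(400, 13), X2 = rlnorm(400, 11),
                  X3 = rgamma(400, 2), X4 = rnorm(400, 9, 2),
                  model = rep(c("actual", "missForest"), each = 200))

# Wide -> long: keep "model" as an identifier and stack X1..X4 into
# "variable" (which column the value came from) and "value" (the number).
long <- melt(dfc, id.vars = "model")

# One panel per column, one coloured density curve per model.
# scales = "free" lets each panel use its own x and y ranges.
p <- ggplot(long, aes(x = value, colour = model)) +
  geom_density() +
  facet_wrap(~ variable, scales = "free")
print(p)
```

ggplot2 draws the legend for model automatically, so this also covers the first question.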

Alex

All I had to do was set one more argument, auto.key:

densityplot(X1 + X2 + X3 + X4, group = dfc$model, auto.key = TRUE)

gives the desired plot.

This is essentially what I needed.

The problem was that I couldn't figure out which densityplot() R was using.

The other parts of the question remain open.
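A self-contained sketch of the same legend fix, assuming made-up random data in place of the real dfc. Writing lattice::densityplot() names the package explicitly, and passing data = dfc lets the formula and groups refer to columns directly instead of copies such as X1 <- dfc$X1. auto.key also accepts a list; here it places the legend on the right and draws lines instead of points in the key.

```r
library(lattice)

# Made-up stand-in for the question's dfc (random data, NOT the real imputations).
set.seed(1)
dfc <- data.frame(X1 = rlnorm(400, 13), X2 = rlnorm(400, 11),
                  X3 = rgamma(400, 2), X4 = rnorm(400, 9, 2),
                  model = rep(c("actual", "missForest"), each = 200))

# Explicit namespace: no doubt about which densityplot() is called.
# groups = model is looked up inside dfc because of data = dfc.
p <- lattice::densityplot(~ X1, data = dfc, groups = model,
                          plot.points = FALSE,          # hide the rug of raw points
                          auto.key = list(space = "right",
                                          lines = TRUE, points = FALSE))
print(p)
```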


Data copied from @Alex's answer:

library(ggplot2)
library(reshape2)
a <- rnorm(50)
b <- runif(50, min = -5, max = 5)
c <- rchisq(50, 2)

dat <- data.frame(Hmisc = a, MICE = b, missForest = c)
dat <- melt(dat)

library(lattice) # using the lattice package
densityplot(~ value, data = dat, groups = variable, auto.key = TRUE)


Individual plots (one panel per variable):

densityplot(~ value | variable, data = dat, groups = variable, auto.key = TRUE, scales = list(relation = "free"))

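Applied to the question's layout, with one panel per column and the models overlaid within each panel, the lattice approach might look like this. It is a sketch on made-up random data (only the column names and the labels "actual" and "missForest" come from the question), and it reshapes with base R's stack() instead of reshape2::melt() so that only lattice is needed. stack() places the columns end to end, so repeating model four times keeps every value aligned with its model.

```r
library(lattice)

# Made-up stand-in for the question's dfc (random data, NOT the real imputations).
set.seed(1)
dfc <- data.frame(X1 = rlnorm(400, 13), X2 = rlnorm(400, 11),
                  X3 = rgamma(400, 2), X4 = rnorm(400, 9, 2),
                  model = rep(c("actual", "missForest"), each = 200))

# Base-R wide -> long: stack() returns "values" and "ind" (source column name);
# model is repeated once per stacked column so the rows stay aligned.
long <- data.frame(model = rep(dfc$model, times = 4),
                   stack(dfc[, c("X1", "X2", "X3", "X4")]))

# Panels by column (| ind), curves by model (groups), free axes per panel.
p <- densityplot(~ values | ind, data = long, groups = model,
                 auto.key = TRUE, plot.points = FALSE,
                 scales = list(relation = "free"))
print(p)
```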

Arun kumar mahesh