I have a data frame, which after applying the melt function looks similar to:

  var       val
1   a 0.6133426
2   a 0.9736237
3   b 0.6201497
4   b 0.3482745
5   c 0.3693730
6   c 0.3564962

..................

The initial data frame had three columns named a, b and c, each holding its associated values. I need to plot the ECDF of each column (ecdf(a), ecdf(b), ecdf(c)) on the same graph using ggplot, but I can't get it to work. I tried:

p<-ggplot(melt_exp,aes(melt_exp$val,ecdf,colour=melt_exp$var))
pg<-p+geom_step()

But I get an error: arguments imply differing number of rows: 34415, 0.

Does anyone have an idea on how this can be done? The graph should look similar to the one returned by plot(ecdf(x)), not a step-like one.

Thank you!

agatha

3 Answers

My first thought was to try to use stat_function, but since ecdf returns a function, I couldn't get that working quickly. Instead, here's a solution that requires that you attach the computed values to the data frame first (using Ramnath's example data):

library(plyr) # function ddply()
mydf_m <- ddply(mydf_m, .(variable), transform, ecd = ecdf(value)(value))

ggplot(mydf_m,aes(x = value, y = ecd)) + 
    geom_line(aes(group = variable, colour = variable))
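If you would rather not depend on plyr, the same per-group computation can be sketched with base R's ave(). This is only an illustration: the data below is simulated in the shape that melt() produces (columns variable and value), not the asker's real data, and the seed is arbitrary.

```r
library(ggplot2)

# Simulated long-format data (an assumption for illustration only)
set.seed(1)
mydf_m <- data.frame(
  variable = rep(c("a", "b", "c"), each = 100),
  value    = c(rnorm(100, 0, 1), rnorm(100, 2, 1), rnorm(100, -2, 0.5))
)

# ave() applies FUN within each level of `variable` and returns the
# results in the original row order, so they can be attached directly.
# ecdf(v) builds the ECDF of one group; calling it on v evaluates it
# at that group's own observations.
mydf_m$ecd <- ave(mydf_m$value, mydf_m$variable,
                  FUN = function(v) ecdf(v)(v))

p <- ggplot(mydf_m, aes(x = value, y = ecd, colour = variable)) +
  geom_line()
print(p)
```

Each group's ecd values run from 1/n up to 1, which is what ddply() with transform produces as well.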


If you want a smooth estimate of the ECDF, you could also use geom_smooth together with the function ns() from the splines package:

library(splines) # function ns()
ggplot(mydf_m, aes(x = value, y = ecd, group = variable, colour = variable)) + 
    geom_smooth(se = FALSE, formula = y ~ ns(x, 3), method = "lm")


As noted in a comment above, as of version 0.9.2.1, ggplot2 has a specific stat for this purpose: stat_ecdf. Using that, we'd just do something like this:

ggplot(mydf_m,aes(x = value)) + stat_ecdf(aes(colour = variable))
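Since the question asks for something that does not look step-like, here is a hedged sketch of one way to get that from stat_ecdf: it draws with geom = "step" by default, but the geom argument can be switched to "line" so that consecutive points are joined directly. The data is simulated for illustration, and pad = FALSE is my choice to leave out the padding points at -Inf and Inf.

```r
library(ggplot2)

# Simulated long-format data (an assumption for illustration only)
set.seed(1)
mydf_m <- data.frame(
  variable = rep(c("a", "b", "c"), each = 100),
  value    = c(rnorm(100, 0, 1), rnorm(100, 2, 1), rnorm(100, -2, 0.5))
)

# stat_ecdf() computes the ECDF separately for each colour group.
# geom = "line" joins the points instead of drawing steps;
# pad = FALSE drops the extra points at -Inf and Inf.
p <- ggplot(mydf_m, aes(x = value, colour = variable)) +
  stat_ecdf(geom = "line", pad = FALSE) +
  labs(y = "ECDF")
print(p)
```

With many observations the two geoms look almost identical; the difference only shows on small samples.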
PatrickT
joran

Based on Ramnath's approach, you can get the ECDF directly from ggplot2 by doing the following:

library(ggplot2)
library(reshape2) # provides melt()
mydf = data.frame(
   a = rnorm(100, 0, 1),
   b = rnorm(100, 2, 1),
   c = rnorm(100, -2, 0.5)
)

mydf_m = melt(mydf)

p0 = ggplot(mydf_m, aes(x = value)) + 
   stat_ecdf(aes(group = variable, colour = variable)) 
print(p0)
vpicaver

Here is one approach:

library(ggplot2)
library(reshape2) # provides melt()
mydf = data.frame(
  a = rnorm(100, 0, 1),
  b = rnorm(100, 2, 1),
  c = rnorm(100, -2, 0.5)
)

mydf_m = melt(mydf)

p0 = ggplot(mydf_m, aes(x = value)) + 
  geom_density(aes(group = variable, colour = variable)) +
  theme(legend.position = c(0.85, 0.85)) # theme() replaces opts(), which was removed from ggplot2
Ramnath
  • This is very useful for plotting the density functions on the same plot. However, I was looking for something similar to http://mikelove.wordpress.com/category/visualization/page/2/ (the black curve, not the red one). I want the CDF functions plotted the way you plotted the density functions, and I did not manage to do that. – agatha Aug 08 '11 at 22:02