If I use the ecdfplot() function of the latticeExtra package, how do I get the actual values it calculates, i.e. the y-values that correspond to the ~x|g input?

I've been looking at ?ecdfplot but there's no description of this there. For the usual high-level function ecdf() it works with the argument plot=FALSE, but this does not work for ecdfplot().

The reason I want to use ecdfplot() rather than ecdf() is that I need to calculate the ecdf() values for a grouping variable. I know I could do this by hand too, but I'm quite convinced that there is a more direct way.

Here is a small example:

library(latticeExtra)  # provides ecdfplot()
u <- rnorm(100, 0, 1)
mygroup <- c(rep("group1", 50), rep("group2", 50))
ecdfplot(~u, groups = mygroup)

I would like to extract the y-values given each group for the corresponding x-values.

Jilber Urbina
Druss2k
  • Could you include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of `ecdfplot` and how you're using it? It would make your question much easier to answer. – David Robinson Aug 29 '12 at 01:07
  • ok ill edit one in just a second plz – Druss2k Aug 29 '12 at 01:12

2 Answers

If you stick with the ecdf() function from the stats package (part of base R), you can simply do as follows:

  1. Create ecdf function with your data:

    fun.ecdf <- ecdf(x) # x is a vector of your data
    
  2. Now use this "ecdf function" to generate the cumulative probabilities of any vector you feed it, including your original, sorted data:

    my.ecdf <- fun.ecdf(sort(x))
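
As a minimal sketch of the two steps above, assuming simulated data (the `set.seed()` call, the sample size of 10 and the `result` name are illustrative choices, not from the question), the x-values and their cumulative probabilities can be collected side by side:

```r
# Illustrative data; the seed and sample size are assumptions for this sketch
set.seed(1)
x <- rnorm(10)

# Step 1: build the ECDF, which is itself a function
fun.ecdf <- ecdf(x)

# Step 2: evaluate it at the sorted data
xs <- sort(x)
my.ecdf <- fun.ecdf(xs)

# Pair each x-value with its cumulative probability
result <- data.frame(x = xs, y = my.ecdf)
print(result)
```

When the data contain no ties, the `y` column simply steps from 1/n up to 1 in increments of 1/n.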
    
Joel S

I know you said you don't want to use ecdf, but in this case it is much easier to use it than to get the data out of the trellis object that ecdfplot returns. (After all, that's all that ecdfplot is doing; it's just doing it behind the scenes.)

In the case of your example, the following will get you a matrix of the y values for each group's ECDF, evaluated at every x in your entire input u (though you could choose a different set of x-values):

ecdfs = lapply(split(u, mygroup), ecdf)
ys = sapply(ecdfs, function(e) e(u))
# output:
#       group1 group2
#  [1,]   0.52   0.72
#  [2,]   0.68   0.78
#  [3,]   0.62   0.78
#  [4,]   0.66   0.78
#  [5,]   0.72   0.80
#  [6,]   0.86   0.94
#  [7,]   0.10   0.26
#  [8,]   0.90   0.94
# ...

ETA: If you just want each column to correspond to the 50 x-values in that column, you could do:

ys = sapply(split(u, mygroup), function(g) ecdf(g)(g))

(Note that if the number of values in each group isn't identical, this will end up as a list rather than a matrix with columns.)
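
If a single vector aligned with the original `u` is more convenient (each observation paired with the ECDF value from its own group), one possible sketch uses `ave()` from the stats package, which applies a function within groups and returns the results in the original order. This is an alternative arrangement rather than part of the approach above; the `set.seed()` call and the `result` name are assumptions for illustration.

```r
# Same setup as in the question; the seed is an assumption for reproducibility
set.seed(42)
u <- rnorm(100, 0, 1)
mygroup <- c(rep("group1", 50), rep("group2", 50))

# For each observation, evaluate the ECDF of its own group at that observation
y <- ave(u, mygroup, FUN = function(g) ecdf(g)(g))

# Long format: one row per observation, with its group and ECDF value
result <- data.frame(x = u, group = mygroup, y = y)
head(result)
```

Because `ave()` returns one value per observation, this arrangement also works when the groups have different sizes.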

David Robinson
  • Thx very much. I probably took the more difficult approach then :) – Druss2k Aug 29 '12 at 01:38
  • You're very welcome. By the way, if this answered your question you can [accept it as an answer](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work). – David Robinson Aug 29 '12 at 01:39
  • Not a problem. By the way (and this is not to pressure or scold you), I happened to notice there were a few other questions you asked that had excellent answers that hadn't been accepted. If you do accept them (though you are not obligated to do so), you'll get yourself a few reputation points, reward the people who helped, and ensure that people reading your question in the future know what worked. – David Robinson Aug 29 '12 at 01:45
  • oh ok thats a bad on my part. maybe i can still accept them :) ill have a look. im not so much into point hunting but if others benefit it surely matters. so i accepted all the others which provided me with a good answer :). thx again – Druss2k Aug 29 '12 at 01:48
  • I've a quick question about your answer: for each level of the grouping variable I get n y-values from that group's ecdf function, but in the example each group only has n1=n2=50 observations. Do I now take from the table you provided the y-value that corresponds to the grouping factor, i.e. if x1 is from group1 I take the left value, and if x2 is from group2 I take the right value? – Druss2k Aug 29 '12 at 02:41
  • See my edit above. There are other ways you could arrange this data- I'm sure you can figure them out based on these examples. – David Robinson Aug 29 '12 at 03:33