I have two sets of 100,000 observations that come from a simulation. Since one of the two cases is a 'baseline' case and the other is a 'treatment' case, I want to create a plot that highlights the difference in distribution between the two simulations.

I started with an ecdf() of each of the two populations. The result is in the picture: the two ecdf()s that I wish to combine into ONE 'difference' plot.

What I would like to do is to have a plot of the difference between the two ecdf curves.

A simple ecdf(baseline) - ecdf(treatment) does not work, since ecdf returns a function. Using Ecdf from the Hmisc package does not work either, since Ecdf returns a list and the difference operator '-' is again ill-defined in that case.

Running this code reproduces the scenario shown in the picture above:

a <- runif(10000)
b <- rnorm(10000,0.5,0.5)
plot(ecdf(a))
lines(ecdf(b), col='red')

Any hints would be more than welcome.

PaoloCrosetto
  • If you actually include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), it would be easier to offer specific coding suggestions. – MrFlick Sep 29 '14 at 16:56
  • @MrFlick thanks, I did add the reproducible example for the starting point. I cannot give more than that since the rest is exactly what I am asking for... – PaoloCrosetto Sep 29 '14 at 17:03
  • That's all that's necessary. If that's the case then Neal's answer should work. I've added a comment to his answer to show how it would work with your variable names (something he could have done had the example been included initially). – MrFlick Sep 29 '14 at 17:09

1 Answer

So evaluate the functions?

decdf <- function(x, baseline, treatment)  ecdf(baseline)(x) - ecdf(treatment)(x)
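A minimal sketch of how this function might be used with the sample data from the question. `ecdf()` returns a function, so calling it on a vector of points yields plain numbers that can be subtracted. The grid `xs`, its length of 500, and the fixed seed are assumptions made for illustration only.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
a <- runif(10000)            # stands in for the baseline sample
b <- rnorm(10000, 0.5, 0.5)  # stands in for the treatment sample

# Difference of the two empirical CDFs, evaluated at the points in x
decdf <- function(x, baseline, treatment) ecdf(baseline)(x) - ecdf(treatment)(x)

# Evaluate on a common grid spanning both samples (grid size is arbitrary)
xs <- seq(min(a, b), max(a, b), length.out = 500)
d <- decdf(xs, a, b)

# Plot the difference curve with a zero reference line
plot(xs, d, type = "l", xlab = "x", ylab = "ECDF(a) - ECDF(b)")
abline(h = 0, lty = 2)
```

Here `x` is simply the vector of points at which both ECDFs are evaluated; any grid that covers the range of the two samples would do.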
Neal Fultz
  • I tried, but it returns an error: it does not seem to be able to use the operator '-' with two ecdf()s. – PaoloCrosetto Sep 29 '14 at 17:05
  • @PaoloCrosetto With your sample data, did you try `curve(decdf(x,a,b), from=min(a,b), to=max(a,b))`? That seems to work for me – MrFlick Sep 29 '14 at 17:07
  • Thanks, it works. I am not very familiar with function declarations in R, and I did not know what the 'x' in the function stood for. I still do not know, but I should RTFM I suppose. Thanks! – PaoloCrosetto Sep 29 '14 at 17:15
  • @Neal Fultz I have an additional question. What if I tried to apply the same concept to pdf instead of cdf -> i.e., to density(a) and density(b)? I tried to change your code but it did not work as intended. Thanks! – PaoloCrosetto Sep 29 '14 at 18:16
  • You can use the fact that the pdf is the derivative of the CDF, so difference in pdfs is the derivative of diff of cdfs; check `?numericDeriv`. – Neal Fultz Sep 29 '14 at 18:29
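For the density follow-up, one possible alternative to differentiating the CDF difference (a different technique from the one suggested in the comment above) is to estimate both kernel densities on the same grid and subtract the `y` values. This is only a sketch: the shared `from`, `to`, and `n` arguments, the names `lo`, `hi`, `da`, `db`, and the fixed seed are assumptions for illustration, and each `density()` call still picks its own default bandwidth.

```r
set.seed(1)  # assumption: fixed seed so the sketch is repeatable
a <- runif(10000)
b <- rnorm(10000, 0.5, 0.5)

# Use one common range so both estimates share identical x points
lo <- min(a, b)
hi <- max(a, b)
da <- density(a, from = lo, to = hi, n = 512)
db <- density(b, from = lo, to = hi, n = 512)

# density() returns a list with components x and y; the grids must match
stopifnot(isTRUE(all.equal(da$x, db$x)))
ddens <- da$y - db$y

plot(da$x, ddens, type = "l", xlab = "x", ylab = "density(a) - density(b)")
abline(h = 0, lty = 2)
```

A plain `density(a) - density(b)` fails for the same reason as with `ecdf()`: the return value is an object, not a numeric vector, so the subtraction has to be done on the evaluated `y` values.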