
I have the following dataframe in R:

          2       05 2-ecdf 05-ecdf
1  0.497820 4.333335   0.72    0.92
2  0.515759 4.485323   0.80    0.96
3  0.475414 4.277717   0.44    0.84
4  0.477483 4.153601   0.52    0.68
5  0.460238 3.990691   0.08    0.28
6  0.551006 4.028190   0.96    0.44
7  0.516346 4.085519   0.84    0.48
8  0.551627 3.985772   1.00    0.24
9  0.485307 3.936114   0.60    0.08
10 0.470294 3.942935   0.16    0.12
11 0.475759 4.573081   0.48    1.00
12 0.478644 4.249859   0.56    0.76
13 0.540667 4.147566   0.92    0.64
14 0.473863 4.114151   0.40    0.52
15 0.501908 4.135026   0.76    0.60
16 0.523785 4.295687   0.88    0.88
17 0.471826 3.998803   0.28    0.32
18 0.471295 3.933579   0.20    0.04
19 0.493485 3.955566   0.68    0.20
20 0.492460 3.951763   0.64    0.16
21 0.458762 4.268477   0.04    0.80
22 0.472570 4.023689   0.36    0.40
23 0.471417 4.212704   0.24    0.72
24 0.463172 4.119871   0.12    0.56
25 0.472087 4.012814   0.32    0.36
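For anyone who wants to reproduce this, here is a sketch that rebuilds the data frame above from the printed values. It assumes the column names really are "2", "05", "2-ecdf" and "05-ecdf", as printed. `check.names = FALSE` is needed to keep them, because `data.frame()` (and `read.table()`) by default turn them into syntactic names such as `X2` and `X2.ecdf`. In that case a lookup like `timedf[["2"]]` returns `NULL`.

```r
# Rebuild 'timedf' from the printed output above (25 runs per criterion).
# check.names = FALSE keeps the non-syntactic column names as printed.
timedf <- data.frame(
  "2" = c(0.497820, 0.515759, 0.475414, 0.477483, 0.460238,
          0.551006, 0.516346, 0.551627, 0.485307, 0.470294,
          0.475759, 0.478644, 0.540667, 0.473863, 0.501908,
          0.523785, 0.471826, 0.471295, 0.493485, 0.492460,
          0.458762, 0.472570, 0.471417, 0.463172, 0.472087),
  "05" = c(4.333335, 4.485323, 4.277717, 4.153601, 3.990691,
           4.028190, 4.085519, 3.985772, 3.936114, 3.942935,
           4.573081, 4.249859, 4.147566, 4.114151, 4.135026,
           4.295687, 3.998803, 3.933579, 3.955566, 3.951763,
           4.268477, 4.023689, 4.212704, 4.119871, 4.012814),
  "2-ecdf" = c(0.72, 0.80, 0.44, 0.52, 0.08,
               0.96, 0.84, 1.00, 0.60, 0.16,
               0.48, 0.56, 0.92, 0.40, 0.76,
               0.88, 0.28, 0.20, 0.68, 0.64,
               0.04, 0.36, 0.24, 0.12, 0.32),
  "05-ecdf" = c(0.92, 0.96, 0.84, 0.68, 0.28,
                0.44, 0.48, 0.24, 0.08, 0.12,
                1.00, 0.76, 0.64, 0.52, 0.60,
                0.88, 0.32, 0.04, 0.20, 0.16,
                0.80, 0.40, 0.72, 0.56, 0.36),
  check.names = FALSE
)

print(names(timedf))              # "2" "05" "2-ecdf" "05-ecdf"
# What the default check.names = TRUE would have produced instead:
print(make.names(names(timedf)))  # "X2" "X05" "X2.ecdf" "X05.ecdf"
```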

Each pair of columns (e.g. x and x-ecdf) consists of data and the empirical cumulative distribution function (ECDF) of that data. What I would like to do is plot both ECDFs in a single figure. I was able to do it for a single line (see the figure below), but I don't know how to plot both lines together. An additional difficulty is that I first sort the dataframe by column x (which gives the same order as sorting by x-ecdf) before plotting it.
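To make the relationship between each data column and its -ecdf column concrete: the -ecdf values appear to be exactly what base R's `ecdf()` returns when evaluated at the data points, i.e. the fraction of runs whose runtime is less than or equal to each value (k/25 here). The sketch below checks this against values copied from the table. It assumes the columns were computed that way; the check supports this for this data, but the question does not state it.

```r
# Data columns and their precomputed ECDF columns, copied from the table.
x2  <- c(0.497820, 0.515759, 0.475414, 0.477483, 0.460238,
         0.551006, 0.516346, 0.551627, 0.485307, 0.470294,
         0.475759, 0.478644, 0.540667, 0.473863, 0.501908,
         0.523785, 0.471826, 0.471295, 0.493485, 0.492460,
         0.458762, 0.472570, 0.471417, 0.463172, 0.472087)
e2  <- c(0.72, 0.80, 0.44, 0.52, 0.08, 0.96, 0.84, 1.00, 0.60, 0.16,
         0.48, 0.56, 0.92, 0.40, 0.76, 0.88, 0.28, 0.20, 0.68, 0.64,
         0.04, 0.36, 0.24, 0.12, 0.32)
x05 <- c(4.333335, 4.485323, 4.277717, 4.153601, 3.990691,
         4.028190, 4.085519, 3.985772, 3.936114, 3.942935,
         4.573081, 4.249859, 4.147566, 4.114151, 4.135026,
         4.295687, 3.998803, 3.933579, 3.955566, 3.951763,
         4.268477, 4.023689, 4.212704, 4.119871, 4.012814)
e05 <- c(0.92, 0.96, 0.84, 0.68, 0.28, 0.44, 0.48, 0.24, 0.08, 0.12,
         1.00, 0.76, 0.64, 0.52, 0.60, 0.88, 0.32, 0.04, 0.20, 0.16,
         0.80, 0.40, 0.72, 0.56, 0.36)

# ecdf() builds the step function Fn(t) = (number of x <= t) / n.
Fn2  <- ecdf(x2)
Fn05 <- ecdf(x05)

# Evaluated at the data points, Fn reproduces the "-ecdf" columns.
# all.equal() compares with a small tolerance, because e.g. 18/25 and the
# literal 0.72 are doubles that need not be bit-identical.
print(all.equal(Fn2(x2), e2))      # TRUE
print(all.equal(Fn05(x05), e05))   # TRUE

# Fn is non-decreasing, so sorting by x or by x-ecdf gives the same row order.
print(identical(order(x2), order(e2)))  # TRUE
```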

Here is an example of how I create the plots now:

crit <- c("2", "05")          # data columns in 'timedf'
critnames <- c("2%", "0.5%")  # labels used in the plot titles

# the dataframe shown above is called 'timedf'
# (filepath, algorithm and inst are not defined in this snippet)

for (i in seq_along(crit)) {
    # sort rows by the data column so the points are joined left to right
    sorted <- timedf[order(timedf[[crit[i]]]), ]
    pdf(file.path(filepath, "output", "plots",
                  paste(algorithm, inst, crit[i], ".pdf", sep = "")))
    colname <- paste(crit[i], "ecdf", sep = "-")
    plot(sorted[[crit[i]]], sorted[[colname]],
         main = paste("Qualified Runtime Distribution for ", algorithm,
                      " on Instance ", inst, " [", critnames[i], "]", sep = ""),
         xlab = "Runtime (s)", ylab = "P(solve)", type = "o")
    dev.off()
}
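One way to get both curves onto one figure while keeping the sorting step is a sketch like the following. It opens the device once, fixes the x-axis limits from the range of both data columns, draws the first criterion with `plot()`, and adds the second with `lines()`. The output path from `tempfile()`, the colours and the title are placeholders, since the original `filepath`, `algorithm` and `inst` are not defined in the snippet.

```r
# Example data (the frame printed in the question).
timedf <- data.frame(
  "2" = c(0.497820, 0.515759, 0.475414, 0.477483, 0.460238,
          0.551006, 0.516346, 0.551627, 0.485307, 0.470294,
          0.475759, 0.478644, 0.540667, 0.473863, 0.501908,
          0.523785, 0.471826, 0.471295, 0.493485, 0.492460,
          0.458762, 0.472570, 0.471417, 0.463172, 0.472087),
  "05" = c(4.333335, 4.485323, 4.277717, 4.153601, 3.990691,
           4.028190, 4.085519, 3.985772, 3.936114, 3.942935,
           4.573081, 4.249859, 4.147566, 4.114151, 4.135026,
           4.295687, 3.998803, 3.933579, 3.955566, 3.951763,
           4.268477, 4.023689, 4.212704, 4.119871, 4.012814),
  "2-ecdf" = c(0.72, 0.80, 0.44, 0.52, 0.08, 0.96, 0.84, 1.00, 0.60, 0.16,
               0.48, 0.56, 0.92, 0.40, 0.76, 0.88, 0.28, 0.20, 0.68, 0.64,
               0.04, 0.36, 0.24, 0.12, 0.32),
  "05-ecdf" = c(0.92, 0.96, 0.84, 0.68, 0.28, 0.44, 0.48, 0.24, 0.08, 0.12,
                1.00, 0.76, 0.64, 0.52, 0.60, 0.88, 0.32, 0.04, 0.20, 0.16,
                0.80, 0.40, 0.72, 0.56, 0.36),
  check.names = FALSE
)

crit <- c("2", "05")
critnames <- c("2%", "0.5%")
cols <- c("red", "blue")   # placeholder colours, one per criterion

# Shared x-range over both data columns, so neither curve falls off the plot.
xr <- range(unlist(timedf[crit]))

outfile <- tempfile(fileext = ".pdf")   # placeholder output path
pdf(outfile)                            # one device for both curves
for (i in seq_along(crit)) {
  colname <- paste(crit[i], "ecdf", sep = "-")
  sorted <- timedf[order(timedf[[crit[i]]]), ]
  if (i == 1) {
    # The first criterion opens the plot and fixes the axes.
    plot(sorted[[crit[i]]], sorted[[colname]], type = "o", col = cols[i],
         xlim = xr, ylim = c(0, 1),
         xlab = "Runtime (s)", ylab = "P(solve)",
         main = "Qualified Runtime Distributions")
  } else {
    # Later criteria are drawn onto the existing axes.
    lines(sorted[[crit[i]]], sorted[[colname]], type = "o", col = cols[i])
  }
}
legend("bottomright", legend = critnames, col = cols, lty = 1, pch = 1)
invisible(dev.off())
```

Because `lines()` never rescales the axes, the shared `xlim` is what keeps the second curve visible. With this data the two ranges do not overlap, so the curves sit at opposite ends of the x-axis.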

So, how can I plot both of these ECDFs on a single plot?

JNevens
  • Use `lines` to add additional lines to your plot. – lmo May 04 '16 at 15:24
  • Your example does not work: the values in `crit` are not among the names of the object, so `timedf[[crit[i]]]` is always NULL. – Frank May 04 '16 at 15:24
  • @lmo When I do `plot(ecdf(timedf[['2']]), col='red'); lines(ecdf(timedf[['05']]), col='green')`, I only see the red one... – JNevens May 04 '16 at 15:27
  • @lmo Isn't that what I am doing in the previous comment? I do `ecdf` on the original values (`timedf[['2']]`) and plot this. – JNevens May 04 '16 at 15:43
  • My mistake. Let me take a second look. – lmo May 04 '16 at 15:45
  • My guess is that I need to provide an x-axis range that is large enough, since the values of `timedf[['2']]` range from about 0.46 to 0.55, while the values of `timedf[['05']]` range from about 3.93 to 4.57. I just can't figure out how to do this. – JNevens May 04 '16 at 15:46
  • That is the issue: these are on totally different scales (roughly an order of magnitude apart). You can get a graph if you want, `plot(ecdf(timedf[['2']]), xlim=c(0,5)); lines(ecdf(timedf[['05']]), col="red")`, but in my mind it doesn't tell you anything that a two-by-two table of the range of each vector wouldn't. – lmo May 04 '16 at 15:57
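Following the comments, here is a base-R sketch. It computes the two-by-two table of ranges that lmo mentions, then reuses it as `xlim` when overlaying the two `ecdf()` objects. `plot()` sizes the x-axis from the first ECDF only, so passing a limit that spans both vectors is what makes the second curve visible (the "only the red one" symptom above). The colours and the temporary output file are placeholders.

```r
# The two data columns from the question.
x2  <- c(0.497820, 0.515759, 0.475414, 0.477483, 0.460238,
         0.551006, 0.516346, 0.551627, 0.485307, 0.470294,
         0.475759, 0.478644, 0.540667, 0.473863, 0.501908,
         0.523785, 0.471826, 0.471295, 0.493485, 0.492460,
         0.458762, 0.472570, 0.471417, 0.463172, 0.472087)
x05 <- c(4.333335, 4.485323, 4.277717, 4.153601, 3.990691,
         4.028190, 4.085519, 3.985772, 3.936114, 3.942935,
         4.573081, 4.249859, 4.147566, 4.114151, 4.135026,
         4.295687, 3.998803, 3.933579, 3.955566, 3.951763,
         4.268477, 4.023689, 4.212704, 4.119871, 4.012814)

# Two-by-two table: one column per criterion, rows are min and max.
rng <- sapply(list("2" = x2, "05" = x05), range)
rownames(rng) <- c("min", "max")
print(rng)

# Overlay both ECDFs on one set of axes spanning both ranges.
outfile <- tempfile(fileext = ".pdf")   # placeholder output path
pdf(outfile)
plot(ecdf(x2), col = "red", xlim = range(rng),
     main = "ECDFs of both criteria", xlab = "Runtime (s)", ylab = "P(solve)")
lines(ecdf(x05), col = "blue")          # adds to the existing axes
legend("bottomright", legend = c("2%", "0.5%"),
       col = c("red", "blue"), lty = 1, pch = 19)
invisible(dev.off())
```

The table also makes lmo's point directly: the largest "2" runtime is below the smallest "05" runtime, so the ranges never overlap.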
