0

I plotted the CCDF as described in the question part of the post "maximum plot points in R?" and got a plot (Image1) with this code:

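library(ggplot2)

# Count-based CCDF: for each distinct value x, the number of observations >= x
ccdf <- function(duration, density = FALSE)
{
  freqs <- table(duration)
  X <- rev(as.numeric(names(freqs)))
  Y <- cumsum(rev(as.vector(freqs)))
  data.frame(x = X, count = Y)
}
qplot(x, count, data = ccdf(duration), log = 'xy')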

Then, based on teucer's answer to "Howto Plot “Reverse” Cumulative Frequency Graph With ECDF", I tried to plot a CCDF using the commands below:

f <- ecdf(duration)
plot(1-f(duration),duration)

This gave me a plot like Image2.
I also read in the comments on one of the answers to "Plotting CDF of a dataset in R?" that the CCDF is nothing but 1 - ECDF.
I am totally confused about how to get the CCDF of my data.

Image1 (image not reproduced here)

Image2 (image not reproduced here)

user744121

2 Answers

3

Generate some data and find the ecdf function.

x <- rlnorm(1e5, 5)
ecdf_x <- ecdf(x)
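
As a quick sanity check of the "CCDF is 1 - ECDF" idea, the sketch below compares 1 - ECDF with the directly computed fraction of observations above a threshold. The names `x_demo`, `ecdf_demo`, and `q_demo`, the seed, and the chosen check points are assumptions made for illustration only.

```r
set.seed(1)                                   # assumption: fixed seed for reproducibility
x_demo <- rlnorm(1e4, 5)                      # synthetic data, same family as above
ecdf_demo <- ecdf(x_demo)

# hypothetical check points: three quantiles of the data
q_demo <- quantile(x_demo, c(0.1, 0.5, 0.9), names = FALSE)

ccdf_from_ecdf <- 1 - ecdf_demo(q_demo)                        # 1 - P(X <= q)
ccdf_direct <- sapply(q_demo, function(t) mean(x_demo > t))    # P(X > q), computed directly

all.equal(ccdf_from_ecdf, ccdf_direct)
```

The two vectors should agree because `ecdf()` returns the proportion of observations less than or equal to its argument.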

Generate a vector of points at regular intervals over the range of x. (EDIT: you want them evenly spaced on a log scale in this case; if you have negative values (or zeros), sample on a linear scale instead.)

xx <- seq(min(x), max(x), length.out = 1e4)
# or, evenly spaced on a log scale:
log_x <- log(x)
xx <- exp(seq(min(log_x), max(log_x), length.out = 1e3))
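
If the data contains zeros, `log(0)` is `-Inf` and the log-spaced `seq` call cannot work. One possible workaround, which is an assumption and not part of the original answer, is to build the grid from the strictly positive values only. The values in `x_demo` are illustrative, and `pos` and `xx_demo` are hypothetical names.

```r
# illustrative data containing a zero (assumption: not the asker's full data set)
x_demo <- c(0, 132, 1337, 86783, 172949, 1179731, 2407606, 19269015)

# keep only strictly positive values when building the log-spaced grid
pos <- x_demo[x_demo > 0]
xx_demo <- exp(seq(log(min(pos)), log(max(pos)), length.out = 50))

# the ECDF itself is still computed from all observations, zeros included
ccdf_demo <- 1 - ecdf(x_demo)(xx_demo)
head(data.frame(x = xx_demo, ccdf = ccdf_demo))
```

Zeros cannot be shown on a log axis anyway, so dropping them from the grid loses nothing in the plot while they still count toward the proportions.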

Create a data frame with the x and y coordinates for the plot.

dfr <- data.frame(
  x = xx,
  ecdf = ecdf_x(xx),
  ccdf = 1 - ecdf_x(xx)
)

Draw plot.

library(ggplot2)

p_ccdf <- ggplot(dfr, aes(x, ccdf)) +
  geom_line() +
  scale_x_log10()
p_ccdf

(Also take a look at aes(x, ecdf).)
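
For comparison, the same kind of plot can be sketched in base R without ggplot2. This is only one possible approach; the `_demo` names, the seed, and the grid size are assumptions. Points where the CCDF is exactly zero are dropped because a log y-axis cannot display them.

```r
set.seed(1)                                   # assumption: fixed seed for reproducibility
x_demo <- rlnorm(1e4, 5)
ecdf_demo <- ecdf(x_demo)

# grid evenly spaced on a log scale over the range of the data
xx_demo <- exp(seq(log(min(x_demo)), log(max(x_demo)), length.out = 200))
ccdf_demo <- 1 - ecdf_demo(xx_demo)

# drop points with CCDF == 0 (this can happen at the maximum of the data)
keep <- ccdf_demo > 0
plot(xx_demo[keep], ccdf_demo[keep], log = "xy", type = "l",
     xlab = "x", ylab = "CCDF")
```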

Richie Cotton
  • At the fourth step, generating the interval vector `xx` gives NaN in all elements. This happens after executing `xx <- exp(seq(min(log_x), max(log_x), length.out = 1e3))`. Is it because of my data? A sample of my data stored in `x` is given below:
    [99988,] 0 [99989,] 132 [99990,] 19269015 [99991,] 724557277 [99992,] 86783 [99993,] 2407606 [99994,] 20955521 [99995,] 1337 [99996,] 172949 [99997,] 1179731
    – user744121 Jul 13 '11 at 18:36
  • Moreover, it says: NAs introduced by coercion – user744121 Jul 13 '11 at 18:44
  • If you have negative values, then taking logs makes no sense. In that case, use the simpler call to `seq`. It just made slightly more sense to define `xx` with even spacing on a log scale, since `x` came from a lognormal distribution. – Richie Cotton Jul 13 '11 at 21:34
  • I do not have negative values in my data, and my data `x` did not come from a lognormal distribution. Can I still use the log scale? The problem occurs when calculating `xx`; taking `log(x)` is fine. Any suggestions? – user744121 Jul 14 '11 at 06:34
  • @Richie: I did not generate the vector at regular intervals and just followed the other steps: `ecdf_x <- ecdf(x)`, `dfr <- data.frame(ecdf = ecdf_x(x), ccdf = 1 - ecdf_x(x))`, `p_ccdf <- ggplot(dfr, aes(x, ccdf)) + geom_line() + scale_x_log10()`, `p_ccdf`. I think it's okay. Now the only thing bothering me is displaying fine scale marks on the axes. – user744121 Jul 20 '11 at 11:19
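
For the finer axis marks mentioned in the last comment, one option in ggplot2 is `annotation_logticks()`, which draws minor tick marks on a log-scaled axis. The sketch below uses synthetic data; the `_demo` names, the seed, and the choice of breaks are assumptions and would need adapting to the real data range.

```r
library(ggplot2)

set.seed(1)                                   # assumption: synthetic data for illustration
x_demo <- rlnorm(1e3, 5)
dfr_demo <- data.frame(x = sort(x_demo))
dfr_demo$ccdf <- 1 - ecdf(x_demo)(dfr_demo$x)

p_ticks <- ggplot(dfr_demo, aes(x, ccdf)) +
  geom_line() +
  scale_x_log10(breaks = 10^(0:4)) +          # one labelled break per decade
  annotation_logticks(sides = "b")            # minor log ticks on the bottom axis only
p_ticks
```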
1

I used ggplot to get the desired CCDF plot of my data, as shown below:

library(ggplot2)

ecdf_x <- ecdf(x)
# include x in the data frame so aes(x, ccdf) does not rely on a global variable
dfr <- data.frame(x = x, ecdf = ecdf_x(x), ccdf = 1 - ecdf_x(x))
p_ccdf <- ggplot(dfr, aes(x, ccdf)) + geom_line() + scale_x_log10()
p_ccdf
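
As a cross-check, the count-based `ccdf()` from the question and 1 - ECDF differ only in normalization and in how ties are handled: the cumulative counts give the number of observations greater than or equal to each value, while 1 - ECDF gives the proportion strictly greater. A minimal sketch on a small hypothetical sample (`duration_demo` and the other names are assumptions for illustration):

```r
duration_demo <- c(1, 1, 2, 3, 3, 3, 10)     # hypothetical small sample
n <- length(duration_demo)

freqs <- table(duration_demo)
vals <- rev(as.numeric(names(freqs)))         # distinct values, largest first
freq_rev <- rev(as.vector(freqs))             # their frequencies, same order

count_ge <- cumsum(freq_rev)                  # number of observations >= each value
ccdf_gt <- 1 - ecdf(duration_demo)(vals)      # proportion of observations > each value

data.frame(x = vals, count_ge, prop_ge = count_ge / n, ccdf_gt)
```

Subtracting each value's own frequency from `count_ge` and dividing by `n` reproduces `ccdf_gt`.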

Sorry for posting it so late. Thank you all!

user744121