Questions tagged [ecdf]

Empirical Cumulative Distribution Function in statistics

For a definition, see its Wikipedia page.

In R, the built-in function ecdf takes a vector of samples and returns its ECDF. It is also easy to produce one ourselves, as shown in this example: How to derive an ecdf function?
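As a rough illustration of how little is needed to build one by hand, here is a minimal sketch in Python using only the standard library. The name make_ecdf is hypothetical (it is not taken from any library), and the sketch assumes the usual right-continuous definition, where F(x) is the fraction of samples less than or equal to x.

```python
from bisect import bisect_right


def make_ecdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x.

    make_ecdf is an illustrative name, not a library function.
    """
    sorted_samples = sorted(samples)
    n = len(sorted_samples)
    if n == 0:
        raise ValueError("samples must not be empty")

    def F(x):
        # bisect_right returns the number of sorted samples that are <= x
        return bisect_right(sorted_samples, x) / n

    return F


F = make_ecdf([20, 40, 60, 80, 100])
print(F(60))  # 0.6
print(F(10))  # 0.0
```

Sorting once and using a binary search keeps each evaluation at O(log n), which may matter if the function is evaluated at many points.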

162 questions
73 votes, 18 answers

How to plot empirical cdf (ecdf)

How can I plot the empirical CDF of an array of numbers in matplotlib in Python? I'm looking for the cdf analog of pylab's "hist" function. One thing I can think of is: from scipy.stats import cumfreq a = array([...]) # my array of numbers num_bins…
user248237
59 votes, 2 answers

Plot CDF + cumulative histogram using Seaborn

Is there a way to plot the CDF + cumulative histogram of a Pandas Series in Python using Seaborn only? I have the following: import numpy as np import pandas as pd import seaborn as sns s = pd.Series(np.random.normal(size=1000)) I know I can plot…
Michael
12 votes, 2 answers

Reliably retrieve the reverse of the quantile function

I have read other posts (such as here) on getting the "reverse" of quantile -- that is, to get the percentile that corresponds to a certain value in a series of values. However, the answers don't give me the same value as quantile for the same…
9 votes, 3 answers

R: Plotting one ECDF on top of another in different colors

I have a couple of cumulative empirical density functions which I would like to plot on top of each other in order to illustrate differences in the two curves. As was pointed out in a previous question, the function to draw the ECDF is simply…
JD Long
8 votes, 4 answers

How to plot multiple ECDF's on one plot in different colors in R

I am trying to plot 4 ecdf functions on one plot but can't seem to figure out the proper syntax. If I have 4 functions "A, B, C, D", what would be the proper syntax in R to plot them on the same chart in different colors?
Jason
7 votes, 1 answer

In R ggplot2, include stat_ecdf() endpoints (0,0) and (1,1)

I'm trying to use stat_ecdf() to plot cumulative successes as a function of a rank score created by a predictive model. #libraries require(ggplot2) require(scales) # fake data for reproducibility set.seed(123) n <- 200 df <- data.frame(model_score=…
C8H10N4O2
7 votes, 1 answer

Python Empirical distribution function (ecdf) implementation

I am aware of statsmodels.tools.tools.ECDF but since the calculation of an empirical cumulative distribution function (ECDF) is pretty straightforward and I want to minimise dependencies in my project, I want to code it manually. In a given list()…
Zhubarb
6 votes, 1 answer

quantile vs ecdf results

I am trying to use ecdf, but I am not sure if I am doing it right. My ultimate purpose is to find what quantile corresponds to a specific value. As an example: sample_set <- c(20, 40, 60, 80, 100) # Now I want to get the 0.75 quantile: quantile(x =…
Max_IT
6 votes, 2 answers

How to draw multiple CDF plots of vectors with different number of rows

I want to draw the CDF plot of multiple variables in the same graph. The variables have different lengths. To simplify the details, I use the following example code: library("ggplot2") a1 <- rnorm(1000, 0, 3) a2 <- rnorm(1000, 1, 4) a3 <-…
Excalibur
6 votes, 3 answers

What is the fastest way to obtain frequencies of integers in a vector?

Is there a simple and fast way to obtain the frequency of each integer that occurs in a vector of integers in R? Here are my attempts so far: x <- floor(runif(1000000)*1000) print('*** using TABLE:') system.time(as.data.frame(table(x))) print('***…
Museful
6 votes, 2 answers

how to specify color of lines and points in ecdf ggplot2

I have a set of data that is tough to visualize, but I think an ECDF with a couple of points and lines added to it will do the trick. I am able to plot things the way that I want; my problem is coloring things correctly. I have the following code,…
RyanStochastic
5 votes, 3 answers

How to plot reverse (complementary) ecdf using ggplot?

I currently use stat_ecdf to plot my cumulative frequency graph. Here is the code I used cumu_plot <- ggplot(house_total_year, aes(download_speed, colour = ISP)) + stat_ecdf(size=1) However I want the ecdf to be…
Tara Sutjarittham
5 votes, 1 answer

Turn off dotted lines in plot.ecdf()

Plotting an ecdf object in R produces a nice empirical distribution function. E.g: x = seq(1,10,1) ecdf1 = ecdf(x) plot(ecdf1,verticals=TRUE, do.points=FALSE) However, the default behavior produces a figure with horizontal dotted lines at 0 and 1.…
Devon
5 votes, 2 answers

Get data associated to ggplot + stat_ecdf()

I like the stat_ecdf() feature of the ggplot2 package, which I find quite useful for exploring a data series. However, this is only visual, and I wonder whether it is feasible, and if so how, to get the associated table? Please have a look at the…
cho7tom
5 votes, 2 answers

How do I extract ecdf values out of ecdfplot()

If I use the ecdfplot() function of the latticeExtra package, how do I get the actual values calculated, i.e. the y-values which correspond to the ~x|g input? I've been looking at ?ecdfplot but there's no description of it. For the usual high-level…
Druss2k