I'd like to plot a weighted CDF using ggplot. Some old non-SO discussions (e.g. this from 2012) suggest this is not possible, but I thought I'd raise it again.

For example, consider this data:

library(ggplot2)
df <- data.frame(x=sort(runif(100)), w=1:100)

I can show an unweighted CDF with

ggplot(df, aes(x)) + stat_ecdf()

[Plot: unweighted ECDF of x]

How would I weight this by w? For this example, I'd expect an x^2-looking function, since the larger numbers have higher weight.

Max Ghenis

1 Answer

There is a mistake in your answer.

This is the right code to compute the weighted ECDF:

df <- df[order(df$x), ]  # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(w) / sum(w))
ggplot(df, aes(x, cum.pct)) + geom_line()

The weighted ECDF is a function F(a) equal to the sum of the weights (probabilities) of the observations with x ≤ a, divided by the total sum of weights.
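As a rough illustration of that definition, here is a minimal base-R sketch that builds the weighted ECDF as a right-continuous step function. The helper name `weighted_ecdf` is hypothetical (it is not part of ggplot2 or of the linked repository), and the sketch assumes non-negative weights with a positive sum.

```r
# Hypothetical helper: weighted ECDF as a right-continuous step function.
# Assumes w >= 0 and sum(w) > 0; uses base R only.
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)
  # For tied x values, keep only the last (largest) cumulative share
  keep <- !duplicated(x, fromLast = TRUE)
  # stepfun() takes one more height than knots; F is 0 left of the smallest x
  stepfun(x[keep], c(0, cum[keep]), right = FALSE)
}

set.seed(1)
df <- data.frame(x = sort(runif(100)), w = 1:100)
Fw <- weighted_ecdf(df$x, df$w)
Fw(min(df$x))  # share of the smallest observation: 1/5050
Fw(0.5)        # weighted share of observations with x <= 0.5
```

Since an ECDF jumps at each observation, swapping `geom_line()` for `geom_step()` in the plot above would probably show the same cumulative shares more faithfully.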

But here is a more satisfying option that simply modifies the original code of the ggplot2 stat_ecdf: https://github.com/NicolasWoloszko/stat_ecdf_weighted

NicolasWoloszko
  • Hi, the code in the GitHub repo looks very interesting. Could you add some guidelines on how to safely "install/uninstall" it? Thanks. – vagvaf Sep 16 '20 at 10:11
  • @vagvaf, I'm not very experienced at this, but it just looks like an .R file to me, so I think you'll just have to copy the code after loading the `ggplot2` library (which will overwrite the `stat_ecdf` function). I think packages would be installed from GitHub with commands such as `library(devtools)`, then `install_github("NicolasWoloszko/stat_ecdf_weighted")`. – ahorn Oct 06 '20 at 10:06
  • The simplest way is to source the raw URL. Simply type `source("https://raw.githubusercontent.com/NicolasWoloszko/stat_ecdf_weighted/master/stat_ecdf_weighted.R")`. You can type `stat_ecdf` (without the parentheses) to see how the function is defined *before* and after you call Nicolas's script. – ahorn Oct 06 '20 at 10:12