
I'm trying to plot a distribution's CDF using R and ggplot2. However, I am having difficulty plotting the CDF function after transforming the Y axis so that the CDF becomes a straight line. This kind of plot is frequently used in Gumbel paper plots, but here I'll use the normal distribution as an example.

I generate the data and plot the empirical cumulative distribution function of the data along with the theoretical function. They fit well. However, when I apply a Y-axis transformation, they no longer match.

sim <- rnorm(100) #Simulate some data
sim <- sort(sim)  #Sort it

cdf <- seq(0,1,length.out=length(sim)) #Compute data CDF

df <- data.frame(x=sim, y=cdf) #Build data.frame

library(scales)
library(ggplot2)

#Now plot!
gg <- ggplot(df, aes(x=x, y=y)) +
        geom_point() +
        stat_function(fun = pnorm, colour="red")
gg

The output is as expected, with the points following the red curve. Good!

Now I try to transform the Y axis according to the distribution used.

#Apply transformation
gg + scale_y_continuous(trans=probability_trans("norm"))

In the result, the points are transformed correctly (they lie on a straight line), but the function is not!
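For reference, here is a minimal base-R sketch of what a straight line means here. It assumes that `probability_trans("norm")` places a probability `p` at height `qnorm(p)` on the axis; the variable names are only illustrative.

```r
# Assumption: the transformed axis draws each probability p at height qnorm(p).
x <- seq(-3, 3, by = 0.5)
p <- pnorm(x)              # theoretical normal CDF values
height <- qnorm(p)         # where those values land on the transformed axis

# qnorm() undoes pnorm(), so the CDF curve should become the line y = x
all.equal(height, x)

# The endpoints produced by seq(0, 1, ...) have no finite position on this axis
qnorm(c(0, 1))
```

So a correctly transformed normal CDF should be straight, and the first and last of the manually computed points (at exactly 0 and 1) cannot be drawn on this axis.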

However, everything seems to work fine if I let ggplot calculate the CDF instead:

ggplot(data.frame(x=sim), aes(x=x)) +
  stat_ecdf(geom = "point") +
  stat_function(fun="pnorm", colour="red") +
  scale_y_continuous(trans=probability_trans("norm"))

The result is OK.
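To check whether the two sets of points themselves differ much, here is a small base-R comparison. It assumes `stat_ecdf` evaluates the same step function as base R's `ecdf()`; the seed and variable names are hypothetical.

```r
set.seed(1)                           # hypothetical seed, only for reproducibility
sim <- sort(rnorm(100))
n <- length(sim)

manual <- seq(0, 1, length.out = n)   # (i - 1) / (n - 1): runs from 0 to 1
stepfn <- ecdf(sim)(sim)              # i / n: runs from 1/n to 1

range(manual)
range(stepfn)
max(abs(manual - stepfn))             # largest gap between the two versions
```

The two versions differ by at most 1/n, so the CDF values alone are unlikely to explain why the red curve behaves differently in the two plots.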

Why is this happening? Why doesn't calculating the CDF manually work with scale transformations?

AF7

1 Answer


This works:

gg <- ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  stat_function(fun ="pnorm", colour="red", inherit.aes = FALSE) +
  scale_y_continuous(trans=probability_trans("norm"))
gg


Possible explanation:

The documentation states: inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.

My guess: since scale_y_continuous changes the aesthetics of the main plot, we need to turn off the default inherit.aes=TRUE. It seems that with inherit.aes=TRUE, stat_function picks up its aesthetics from the first layer of the plot, so the scale transformation has no effect on it unless this is explicitly overridden.
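A different route, which is not the fix above but a sketch that sidesteps `stat_function` altogether, is to evaluate `pnorm` on a grid and draw it with `geom_line`. The curve is then ordinary layer data and should pass through the scale transformation the same way the points do. The names `grid` and `curve_df`, the seed, and the use of `ppoints()` in place of `seq(0, 1, ...)` are assumptions introduced here.

```r
library(ggplot2)
library(scales)

set.seed(1)                           # hypothetical seed, only for reproducibility
sim <- sort(rnorm(100))

# ppoints() is swapped in for seq(0, 1, ...) so that no point sits at exactly
# 0 or 1, which qnorm() would map to -Inf / Inf
df <- data.frame(x = sim, y = ppoints(length(sim)))

# Theoretical CDF evaluated on a grid spanning the data
grid <- seq(min(sim), max(sim), length.out = 200)
curve_df <- data.frame(x = grid, y = pnorm(grid))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = curve_df, colour = "red") +   # inherits aes(x = x, y = y)
  scale_y_continuous(trans = probability_trans("norm"))
p
```

Because both layers supply plain `x` and `y` columns, neither depends on how `stat_function` interacts with inherited aesthetics.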

Divi
  • Thank you. Do you have a hypothesis on why using `stat_ecdf()` works even without `inherit.aes`? – AF7 May 18 '16 at 12:37
  • `stat_ecdf` has no aesthetics inheritance structure; the only option is to override layer aesthetics by overriding that very layer. `stat_function`, on the other hand, _superimposes_ a function on the plot layer, and `inherit.aes=TRUE` (the default) picks aesthetic mappings from the top layer of the plot. What gave away the actual problem to me was the _superimpose_ in `stat_function`. It seems to me that `stat_function` was designed to follow the mappings of the actual plot you construct (top layer) without being affected by any lower-layer changes to the aesthetic mappings. – Divi May 18 '16 at 13:54