Why do the following plots look different? Both methods appear to use Gaussian kernels.

How does ggplot2 compute a density?

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, n=2000)
ggplot(NULL, aes(x=d$x, y=d$y)) + geom_line() + scale_x_log10()

ggplot(vehicles, aes(x=cty)) + geom_density() + scale_x_log10()

UPDATE:

A solution to this question already appears on SO here; however, the specific parameters ggplot2 passes to the R stats density function remain unclear.

An alternative solution is to extract the density data straight from the ggplot2 plot, as shown here.

Megatron
  • Thanks for the reference. However, the solution doesn't appear to identify the explicit parameter differences. I'm wondering how I can generate/extract the precise density data from the ggplot density. – Megatron Apr 21 '16 at 23:16
  • This seems to extract the exact values geom_density plots: http://stackoverflow.com/questions/12394321/r-what-algorithm-does-geom-density-use-and-how-to-extract-points-equation-of – fanli Apr 21 '16 at 23:18
  • I don't think this is to do with the density but with how you are applying the log transform – user20650 Apr 22 '16 at 01:20
  • Is there an alternate log transformation that I can apply to render them identical? – Megatron Apr 22 '16 at 01:22
  • e.g. try `d2 <- density(log10(vehicles$cty), from=min(log10(vehicles$cty)), to=max(log10(vehicles$cty))) ; ggplot(data.frame(x=d2$x, y=d2$y), aes(x=x, y=y)) + geom_line()` (but you'll need to tweak the axis labels), and `ggplot(vehicles, aes(x=cty)) + stat_density(geom="line") + scale_x_log10()` – user20650 Apr 22 '16 at 01:24
  • Looks good! Care to turn it into an answer for posterity? – Megatron Apr 22 '16 at 01:26
  • consider switching to `ggalt::geom_bkde()` for better density estimates. – hrbrmstr Apr 22 '16 at 01:49

1 Answer

In this case, it is not the density calculation that is different but how the log10 transform is applied.

First, check that the densities are similar without the transform:

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, from=min(vehicles$cty), to=max(vehicles$cty))
ggplot(data.frame(x=d$x, y=d$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line")

So the issue is the transform. In the stat_density call below, the log10 transform is applied to the x variable before the density is calculated (ggplot2 applies scale transformations before it computes statistics). To reproduce the result manually, you therefore have to transform the variable before calculating the density, e.g.:

d2 <- density(log10(vehicles$cty), from=min(log10(vehicles$cty)),
              to=max(log10(vehicles$cty)))
ggplot(data.frame(x=d2$x, y=d2$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line") + scale_x_log10()
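
The manual plot above has its x axis in log10 units, so its labels differ from the stat_density version even though the curve has the same shape. One possible way to tweak the axis labels, offered as a sketch rather than the only option, is to back-transform the density grid to the original units and let scale_x_log10() place the points. The names `lx` and `manual` are introduced here purely for illustration.

```r
library(ggplot2)
library(fueleconomy)

# Density estimated on the log10 scale, as in the manual approach above
lx <- log10(vehicles$cty)
d2 <- density(lx, from = min(lx), to = max(lx))

# Back-transform the grid to the original units. scale_x_log10() re-applies
# log10 when positioning points, so the curve keeps its shape while the
# axis is labelled in the original units.
manual <- data.frame(x = 10^d2$x, y = d2$y)

ggplot(manual, aes(x = x, y = y)) + geom_line() + scale_x_log10()
```

Because the input here is a precomputed line, no density is estimated inside ggplot2, so the scale only changes where the points are drawn.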

PS: To see how ggplot2 prepares the data for the density, you can follow the code from as.list(StatDensity) to StatDensity$compute_group to ggplot2:::compute_density.
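
Besides reading the source, the values ggplot2 actually computed can be pulled out of a plot with layer_data() and compared with a manual density() call. The following is a minimal sketch under assumptions: the data are simulated (the vector `x` is hypothetical, and vehicles$cty could be substituted), and it assumes the installed ggplot2 version evaluates the density over the range of the transformed data with the default bandwidth, which may vary between versions.

```r
library(ggplot2)

# Hypothetical positive, right-skewed data standing in for vehicles$cty
set.seed(1)
x <- rlnorm(500, meanlog = 3, sdlog = 0.4)

p <- ggplot(data.frame(x = x), aes(x = x)) +
  stat_density(geom = "line") +
  scale_x_log10()

# Values computed by the stat; the x column is in log10 units because the
# scale transformation is applied before the statistic is computed
built <- layer_data(p)

# Manual equivalent: estimate the density of the log10-transformed data
lx <- log10(x)
manual <- density(lx, from = min(lx), to = max(lx))

# Largest pointwise difference between the two estimates
max_diff <- max(abs(built$density - manual$y))
max_diff
```

If the assumptions hold, max_diff should be close to zero, which would confirm that the difference in the original question comes from where the log10 transform is applied rather than from the kernel or bandwidth.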

user20650