
I have a data frame with 1e7 observations, each a point with an x and a y coordinate. That is far too many to visualize with geom_point, so I'm trying to use geom_density_2d instead. But the computation fails with these warnings:

Warning messages:
1: Computation failed in `stat_density2d()`:
cannot allocate vector of size 2.6 Gb 
2: Computation failed in `stat_density2d()`:
cannot allocate vector of size 2.6 Gb 

What are my options? I can group overlapping points and count them, resulting in a data frame on the order of 1e5 observations, but then I lose a lot of the information for density (I have not been able to find a way to make it recognize the counts for each overlapping point).

How can I use geom_density2d on a data frame of this size?

EDIT: I am trying to avoid the hex and bin_2d geometries.

Matt
[See this post](https://stackoverflow.com/questions/1395229/increasing-or-decreasing-the-memory-available-to-r-processes) if you're running on Windows or 32-bit R. – edavidaja Aug 20 '18 at 18:01

1 Answer


You could use hexagonal binning:

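library(ggplot2)  # stat_binhex() also needs the hexbin package installed

# Simulate 1e7 points: a linear trend in x plus uniform noise
n <- 1e7
e <- runif(n, -10, 10)
x <- rnorm(n, 0, 10)
y <- 1 + 0.2 * x + e
dat <- data.frame(y, x)
ggplot(dat, aes(x = x, y = y)) + stat_binhex()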


Or a smoothed plot:

smoothScatter(x=dat$x,y=dat$y)

enter image description here
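If you want contour lines like those from geom_density_2d rather than bins, one possible route is to compute the density yourself with a binned kernel estimator and hand the resulting grid to geom_contour. The sketch below uses KernSmooth::bkde2D for that. The bandwidths, the grid size, and the reduced sample size are assumptions chosen for illustration, not tuned values, and I have not benchmarked this on the full 1e7 rows.

```r
library(KernSmooth)  # recommended package, normally installed with R
library(ggplot2)

set.seed(1)
n <- 1e6  # smaller than the 1e7 in the question so the demo runs quickly
x <- rnorm(n, 0, 10)
y <- 1 + 0.2 * x + runif(n, -10, 10)

# Binned 2D kernel density estimate on a 101 x 101 grid.
# bandwidth = c(2, 2) is an illustrative guess; adjust it for your data.
est <- bkde2D(cbind(x, y), bandwidth = c(2, 2), gridsize = c(101L, 101L))

# Reshape the grid to long format. expand.grid() varies x fastest, which
# matches the column-major layout of est$fhat (rows index est$x1).
grid <- expand.grid(x = est$x1, y = est$x2)
grid$density <- as.vector(est$fhat)

p <- ggplot(grid, aes(x = x, y = y, z = density)) + geom_contour()
p
```

Because the points are first binned onto the grid, the data passed to ggplot has one row per grid cell (here 101 * 101), whatever the number of observations.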

000andy8484