I am working with two very large data sets, u10 and u10s, each consisting of 30016950 values.

I am trying to make a contour plot to help compare the two (a scatter plot is just too messy).

I've looked around and found code such as:

    x <- u10
    y <- u10s
    zfunc <- function(x, y) {
      1/2/pi*exp(-(x^2+y^2)/2)
    }
    z <- outer(x, y, zfunc)

    contour(x, y, z, nlevels=10)

However, this fails with:

    Error: cannot allocate vector of size 6835453.9 Gb,
    Reached total allocation of 4025Mb: see help(memory.size)

This works with a small data set but unfortunately not a large one like mine.
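
A rough back-of-the-envelope estimate shows why the size matters. This is only a sketch: it assumes both vectors have the length quoted above and that each value is stored as an 8-byte double, so it gives the order of magnitude rather than the exact figure in the error message.

```r
# Number of elements in each input vector (figure quoted in the question).
n <- 30016950

# outer(x, y, f) produces an n x n result, i.e. one value per (x, y) pair.
cells <- n^2

# Assumption: each value is an 8-byte double.
bytes <- cells * 8

cells            # on the order of 9e14 values
bytes / 1024^5   # size in pebibytes, far beyond a 4025 Mb limit
```

The cost grows with the square of the vector length, which is why the same code is fine for a few thousand points and hopeless for tens of millions.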

I have also tried `kde2d`, which again works with a small data set but fails on mine with the same error.
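
One possible workaround is to run `kde2d` on a random subsample instead of the full data. The sketch below is illustrative only: the simulated `u10` and `u10s`, the sample size of 1e4 and the 50 x 50 grid are all assumptions to be tuned, and whether a subsample is representative would need checking on the real data.

```r
library(MASS)  # provides kde2d()

# Simulated stand-ins for the real vectors (assumption: numeric, equal length).
set.seed(1)
n    <- 1e6
u10  <- rnorm(n)
u10s <- u10 + rnorm(n, sd = 0.5)

# Draw a random subsample of row indices; the size here is an arbitrary choice.
idx <- sample.int(n, 1e4)

# Kernel density estimate on a 50 x 50 grid, using only the subsample.
dens <- kde2d(u10[idx], u10s[idx], n = 50)

# kde2d() returns a list with components x, y and z, which contour() accepts.
contour(dens, nlevels = 10)
```

Repeating this with a few different seeds gives a quick check on how much the contours depend on the particular subsample.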

Any hints on how I would go about producing such a contour/filled.contour plot successfully in R/splus?

  • maybe use a high-resolution `hexbin` plot and then base the contours on the hexbinned data? – Ben Bolker Jan 23 '14 at 02:38
  • Perhaps using summary statistics could help? When dealing with such a dataset, one option is to aggregate (converting your numeric variables into factors). Another option is to subset your data and plot a few subsets to see whether the sample is representative. A 10% sample of your data should do. [Check out the ggplot documentation for this](http://docs.ggplot2.org/current/) – marbel Jan 23 '14 at 04:08
  • Do you understand why `outer` wanted to use 6.8Pb - yes Petabytes - of memory? It was trying to create a vector for each x for each y - about 9x10^14 values. – Spacedman Jan 23 '14 at 09:12
  • @Spacedman `outer` actually needs to create two vectors of that size and pass them to `zfunc`. – Roland Jan 23 '14 at 10:01
  • @Roland I suspect it failed the first time :) – Spacedman Jan 23 '14 at 10:18
  • @Spacedman buy more RAM? – Simon O'Hanlon Jan 23 '14 at 15:08
  • @BenBolker Thanks! I didn't know there was such a plot and that suits my need quite nicely. – PhillipPhillipson Jan 25 '14 at 14:12
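
The binning idea from the comments can be sketched in base R as follows. Note that this swaps the hexagonal bins of `hexbin` for a plain rectangular grid so that no extra package is needed; the simulated `u10` and `u10s`, the vector length and the choice of 100 bins per axis are assumptions for illustration, not values from the question.

```r
# Simulated stand-ins for the real vectors (assumption: numeric, equal length).
set.seed(1)
n    <- 1e6
u10  <- rnorm(n)
u10s <- u10 + rnorm(n, sd = 0.5)

# Bin edges for an nbins x nbins rectangular grid covering the data range.
nbins   <- 100
xbreaks <- seq(min(u10),  max(u10),  length.out = nbins + 1)
ybreaks <- seq(min(u10s), max(u10s), length.out = nbins + 1)

# findInterval() maps each value to a bin index in 1..nbins; all.inside = TRUE
# puts the maximum value into the last bin instead of an extra one.
ix <- findInterval(u10,  xbreaks, all.inside = TRUE)
iy <- findInterval(u10s, ybreaks, all.inside = TRUE)

# Count points per cell via a combined index; rows follow x, columns follow y.
counts <- matrix(tabulate(ix + (iy - 1L) * nbins, nbins = nbins^2),
                 nrow = nbins, ncol = nbins)

# Bin midpoints serve as the coordinates for contour().
xmid <- (xbreaks[-1] + xbreaks[-(nbins + 1)]) / 2
ymid <- (ybreaks[-1] + ybreaks[-(nbins + 1)]) / 2

contour(xmid, ymid, counts, nlevels = 10)
```

The memory needed here scales with the number of points plus the number of grid cells rather than with the square of the number of points, so it avoids the allocation that `outer` attempts. The same `counts` matrix can also be passed to `filled.contour(xmid, ymid, counts)`.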

0 Answers