
How does R (base, lattice or whatever) create a graph from a 100000-element vector (or from a function that outputs that many values)? Does it plot some points and reject others? Does it plot them all on top of each other? How can I change this behaviour?

How could I create a graph where, for every interval, I see the max and min values, as in trading "bar" charts? (Or any other idea to visualize that much information without having to calculate the intervals, mins and maxes myself beforehand, and without using financial packages.)

How could I create a large "horizontally scrollable" plot?

For example I want to plot the first 100000 iterations

zz <- (zz^2+1) %% nn     

starting at zz = 1, with nn = 10^7 + 1. The x axis would be just the iteration number.
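A minimal sketch of how that sequence could be generated before plotting, assuming each iterate is stored in a vector (the names `n_iter` and `out` are illustrative, not part of the original code). The largest intermediate value is below 1e14, so double precision represents every step exactly:

```r
nn     <- 1e7 + 1          # modulus from the question
n_iter <- 100000           # number of iterations to keep (illustrative name)
out    <- numeric(n_iter)  # storage for the iterates (illustrative name)

zz <- 1                    # starting value
for (i in seq_len(n_iter)) {
  zz <- (zz^2 + 1) %% nn   # the iteration from the question
  out[i] <- zz
}

# x axis is the iteration number; pch = "." keeps 100000 points cheap to draw
plot(seq_len(n_iter), out, pch = ".", xlab = "iteration", ylab = "zz")
```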

In summary, I want to plot the output of a function that is sometimes smooth but sometimes very spiky, over a very large interval. Those spikes are very important.

regards

skan
  • http://stackoverflow.com/questions/7714677/r-scatterplot-with-too-many-points/ ; http://stackoverflow.com/questions/10945707/speed-up-plot-function-for-large-dataset – Ben Bolker Apr 09 '13 at 12:28

4 Answers


You mention that you sometimes have spikes which are very important.

See below how I plot ping results, where the vast majority of data is in the milliseconds, but the spikes are important for me as well:

[Image: ping]

Basically, I hexbin all data points with a response time of at most 500 ms, and plot individual points for all longer response times. A response time of 5 s is additionally marked as a timeout:

ggplot (df, aes (x = date, y = t5)) + 
        stat_binhex (data = df [df$t5 <= 0.5,], bins = nrow (df) / 250) +
        geom_point (data = df [df$t5 > 0.5,], aes (col = type), shape = 3) +
        ylim (c (0, 5)) +
        scale_fill_gradient (low = "#AAAAFF", high = "#000080") +
        scale_colour_manual ("response type", 
                             values = c (normal = "black", timeout = "red")) + 
        ylab ("t / s") 
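The call above needs a data frame `df` with the columns `date`, `t5` (response time in seconds) and `type`. The real ping data is not shown, so the following is only a sketch that builds a synthetic stand-in; every number in it (sample size, rates, spike count) is an assumption chosen for illustration:

```r
set.seed(42)
n <- 20000                                    # hypothetical number of pings

t5 <- rexp(n, rate = 50)                      # bulk of responses: around 20 ms
spikes <- sample(n, 200)                      # a few slow responses (assumed)
t5[spikes] <- runif(200, min = 0.5, max = 6)
t5 <- pmin(t5, 5)                             # anything slower counts as a 5 s timeout

df <- data.frame(
  date = as.POSIXct("2013-04-01", tz = "UTC") + seq_len(n) * 60,  # one ping per minute
  t5   = t5,
  type = ifelse(t5 >= 5, "timeout", "normal")
)

str(df)
```

With `ggplot2` and `hexbin` installed, this `df` can be passed to the plotting code above unchanged.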

I think I already posted this as a solution to a similar question, but I couldn't find it.

cbeleites unhappy with SX

If R can produce the plot, it will simply plot the points, even if they are on top of each other. In general, such a large number of points is not really useful to plot, and not necessary. Some strategies to deal with this are:

  • Subsample, say, 2% of the data and plot it. Repeat this several times to see if the outcome changes.
  • Don't plot the raw data, but aggregate first. Think of calculating a temporal mean, binning data first, etc.
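The aggregation strategy can be sketched in base R without any financial package: split the vector into fixed-size chunks, take the minimum and maximum of each chunk, and draw one vertical bar per chunk with `segments()`. The data, the chunk size of 300 and the variable names are assumptions for illustration only:

```r
set.seed(1)
x <- cumsum(rnorm(100000))             # placeholder data; substitute the real vector

chunk <- 300                           # assumed chunk size
grp <- ceiling(seq_along(x) / chunk)   # chunk id for every element

lo  <- tapply(x, grp, min)             # per-chunk minimum
hi  <- tapply(x, grp, max)             # per-chunk maximum
mid <- tapply(seq_along(x), grp, mean) # x position: centre of each chunk

# empty plot with the right ranges, then one min-to-max bar per chunk
plot(range(mid), range(lo, hi), type = "n", xlab = "index", ylab = "value")
segments(mid, lo, mid, hi)
```

Because every chunk keeps its extreme values, spikes survive the reduction, which a plain mean or a random subsample would not guarantee.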
Paul Hiemstra
  • Yes. I was looking for a plot package or function that could do it automatically. Anyway, if I aggregate the data by calculating the maximum and minimum of every chunk of 300 values, what plot function would you use to plot those maxima and minima without using financial packages (and without converting the vector to a time series)? – skan Apr 09 '13 at 16:52

R will plot all the points and things might look cluttered.

This is a new package, but check out Hadley's bigvis package

Nishanth
  • I've just seen it on the Revolution Analytics blog. Thanks, I'll have to try it. It seems it can handle and manipulate large quantities of data, but not create a scrollable graph. – skan Apr 09 '13 at 16:48
  • I've tried to install bigvis, but the installation produces the error "Error: command failed (1)", and the path doesn't contain any spaces. – skan Apr 10 '13 at 23:03

curve might be a nice way to go here:

f <- function(x){(x^2+1)%%(1+1e7)}
curve(f, from=1, to=1e5)
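One caveat worth noting: by default `curve` evaluates the function at only `n = 101` equally spaced points, so most of the range is skipped and narrow spikes can be missed. A sketch that evaluates `f` at every integer from 1 to 1e5 by setting `n` explicitly (the name `res` is illustrative):

```r
f <- function(x) {(x^2 + 1) %% (1 + 1e7)}

# n = 1e5 points between 1 and 1e5 means one evaluation per integer
res <- curve(f, from = 1, to = 1e5, n = 1e5)

# curve() invisibly returns the evaluated coordinates
length(res$x)
```

This plots `f` applied once to each x, which is not the same as the iterated sequence zz described in the question.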


plannapus