My dataset has millions of points, so plotting all of them is not a good idea.

t1 <- runif(10000)
t3 <- runif(10000)
t4 <- data.frame(t1, t3)
plot(t4[, 1], t4[, 2])

How can I plot just a sample of the points? I know I could sample from both columns, but the first column holds the x values, so I need to keep each x together with its y. In other words, I need to sample the same indices from both columns, not call sample(t4[,1]) and then sample(t4[,2]) separately.

Is there an easy way to get the 95% ranges of the values plotted into the figure as well? I think a solution with predict would not work well for me, because the dataset is large and it takes quite long to get through it. It would just need to be the 95% value within a window of 0.1 or so, plotted at the bottom and the top.

heinheo

1 Answer

You can sample the row indices once and use them for both columns, so each x stays paired with its y and only a subset of the points is plotted:

N <- 10000
samplesize <- 1000
t4 <- data.frame(t1 = runif(N), t3 = runif(N))
# draw row indices without replacement, then index both columns with them
sampleindices <- sample(seq_len(N), samplesize, replace = FALSE)
plot(t4[sampleindices, 1], t4[sampleindices, 2])

I am not sure whether the second part of your question means that you want to plot the 95% quantile as a line. If so:

quantile_t1 <- quantile(t4$t1[sampleindices], probs = 0.95)
quantile_t3 <- quantile(t4$t3[sampleindices], probs = 0.95)
abline(v = quantile_t1)  # vertical line at the 95% quantile of x
abline(h = quantile_t3)  # horizontal line at the 95% quantile of y

You should also look here: R: Scatterplot with too many points. In my experience these problems arise when a plot contains so many points that an individual point no longer adds any information, while the size of the plot grows and R takes forever to draw it. 10000 data points should not be a problem at all.
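If what you want is the 95% range within windows along x, a rough sketch could look like the following. It rests on assumptions of mine that are not in your question: the windows are fixed, non-overlapping bins of width 0.1 (not a sliding window), x lies in [0, 1] as in your runif example, and "95% range" means the 2.5% and 97.5% quantiles of y. The quantiles are computed on all points, which is cheap compared with predict, and only the sample is drawn.

```r
set.seed(1)  # only to make this sketch reproducible
N  <- 10000
t4 <- data.frame(t1 = runif(N), t3 = runif(N))

# plot a random subset of the rows, as above
idx <- sample(N, 1000)
plot(t4$t1[idx], t4$t3[idx], pch = 16, col = "grey60",
     xlab = "t1", ylab = "t3")

# assumption: fixed, non-overlapping windows of width 0.1 along x in [0, 1]
width  <- 0.1
breaks <- seq(0, 1, by = width)
bins   <- cut(t4$t1, breaks = breaks, include.lowest = TRUE)

# assumption: "95% range" = 2.5% and 97.5% quantiles of y per window,
# computed on ALL points, not only the plotted sample
lower <- tapply(t4$t3, bins, quantile, probs = 0.025)
upper <- tapply(t4$t3, bins, quantile, probs = 0.975)
mids  <- head(breaks, -1) + width / 2  # window centres for drawing

lines(mids, lower, col = "red", lwd = 2)
lines(mids, upper, col = "red", lwd = 2)
```

With real data you would build breaks from range(t4$t1) instead of 0 and 1, and windows with very few points will give noisy quantiles.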

HOSS_JFL
  • It is more that I want to plot a running average of the quantile in a 0.1 window, so it would look like geom_smooth from ggplot2... – heinheo Jun 20 '15 at 08:12