-2

I have an R matrix which is very data dense. It has 500,000 rows. If I plot 1:500000 (x axis) to the third column of the matrix mat[, 3] it takes too long to plot, and sometimes even crashes. I've tried plot, matplot, and ggplot and all of them take very long.

I am looking to group the data by 10 or 20. ie, take the first 10 elements from the vector, average that, and use that as a data point.

Is there a fast and efficient way to do this?

masfenix
  • 7,736
  • 11
  • 45
  • 60
  • You should show some of the code you tried rather than just saying "I tried and it crashes". This shouldn't crash - it shouldn't be a problem at all. – Gregor Thomas Aug 16 '16 at 18:12

1 Answers1

1

We can use cut and aggregate to reduce the number of points plotted:

generate some data

set.seed(123)
xmat <- data.frame(x = 1:5e5, y = runif(5e5))

use cut and aggregate

xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
xmat.agg <- aggregate(y ~ cutx, data = xmat, mean)

make plot

plot(xmat.agg, pch = ".")

enter image description here

more than 1 column solution:

Here, we use the data.table package to group and summarize:

generate some more data

set.seed(123)
xmat <- data.frame(x = 1:5e5, 
                   u = runif(5e5),
                   z = rnorm(5e5),
                   p = rpois(5e5, lambda = 5),
                   g = rbinom(n = 5e5, size = 1, prob = 0.5))

use data.table

library(data.table)
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
setDT(xmat) #convert to data.table
#for each level of cutx, take the mean of each column
xmat[,lapply(.SD, mean), by = cutx] -> xmat.agg

#  xmat.agg
#         cutx        x         u            z   p   g
#     1:     1      5.5 0.5782475  0.372984058 4.5 0.6
#     2:     2     15.5 0.5233693  0.032501186 4.6 0.8
#     3:     3     25.5 0.6155837 -0.258803746 4.6 0.4
#     4:     4     35.5 0.5378580  0.269690334 4.4 0.8
#     5:     5     45.5 0.3453964  0.312308395 4.8 0.4
#    ---                                              
# 49996: 49996 499955.5 0.4872596  0.006631221 5.6 0.4
# 49997: 49997 499965.5 0.5974486  0.022103345 4.6 0.6
# 49998: 49998 499975.5 0.5056578 -0.104263093 4.7 0.6
# 49999: 49999 499985.5 0.3083803  0.386846148 6.8 0.6
# 50000: 50000 499995.5 0.4377497  0.109197095 5.7 0.6

plot it all

par(mfrow = c(2,2))
for(i in 3:6) plot(xmat.agg[,c(1,i), with = F], pch = ".")

enter image description here

bouncyball
  • 10,631
  • 19
  • 31