
I have a 1000 × 1000 matrix (containing only the integers 0 and 1), but when I try to make a heatmap of it, an error occurs because it is too large.

How can I create a heatmap with such a large matrix?

  • Plenty of answers about heatmap. http://stackoverflow.com/questions/3789549/display-a-matrix-including-the-values-as-a-heatmap http://stackoverflow.com/questions/5035491/how-to-put-black-borders-in-heatmap-in-r Try searching `[r] heatmap`. – Roman Luštrik Apr 14 '11 at 17:28
  • Please copy and paste the exact error. 1000x1000 shouldn't produce a distance matrix too large for R. – Vince Apr 14 '11 at 17:31
  • 1
    @Roman but this is neither of those questions... – Vince Apr 14 '11 at 17:32
  • Try `image(m)` after doing whatever re-ordering on rows and cols needed ? – Martin Morgan Apr 14 '11 at 18:21
  • Posted a solution using heatmap3, which is more memory efficient, especially through its use of the fastcluster package to do the hierarchical clustering; adding the argument useRaster=TRUE also helps – Tom Wenseleers Aug 20 '15 at 12:30
  • There is advice in [this SO question](https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session) about R memory management. If you can't allocate a 1000 by 1000 image, then you should probably stop trying to do stats on your mobile phone. – Richie Cotton Apr 14 '11 at 17:34

5 Answers


I can believe that heatmap is, at the very least, taking a long time, because heatmap does a lot of fancy stuff that costs extra time and memory. Using dat from @bill_080's example:

## basic command: 66 seconds
t0 <- system.time(heatmap(dat))
## don't reorder rows & columns: 43 seconds
t1 <- system.time(heatmap(dat,Rowv=NA))
## remove most fancy stuff (from ?heatmap): 14 seconds
t2 <- system.time( heatmap(dat, Rowv = NA, Colv = NA, scale="column",
             main = "heatmap(*, NA, NA) ~= image(t(x))"))
## image only: 13 seconds
t3  <- system.time(image(dat))
## image using raster capability in R 2.13.0: 1.2 seconds
t4 <- system.time(image(dat,useRaster=TRUE))

You might want to consider what you really want out of the heatmap -- i.e., do you need the fancy dendrogram/reordering stuff?

– Ben Bolker

No errors when I try it. Here's the code:

library(lattice)

# Build the data
nrowcol <- 1000
dat <- matrix(ifelse(runif(nrowcol*nrowcol) > 0.5, 1, 0), nrow=nrowcol)

# Build the palette and plot it
pal <- colorRampPalette(c("red", "yellow"), space = "rgb")
levelplot(dat, main="1000 X 1000 Levelplot", xlab="", ylab="",
          col.regions=pal(4), cuts=3, at=seq(0,1,0.5))

[plot: 1000 × 1000 levelplot of the 0/1 matrix]

– bill_080
  • I was able to go up to about a 2300X2300 plot. A 2400X2400 plot gave "Error using packet 1 cannot allocate vector of size 22.0 Mb" at the levelplot() statement. – bill_080 Apr 14 '11 at 19:04
  • For me this only worked for larger matrices with the option useRaster=TRUE, i.e. levelplot(dat, main="1000 X 1000 Levelplot", xlab="", ylab="", col.regions=pal(4), cuts=3, at=seq(0,1,0.5), useRaster=TRUE); otherwise even with a 5000 x 5000 matrix it would end up allocating about 6 GB of memory - not good!! – Tom Wenseleers Aug 20 '15 at 12:06
  • Plus this is a levelplot, not a heatmap with row and/or column hierarchical clustering included, which makes a big difference... – Tom Wenseleers Aug 20 '15 at 12:33

Try the raster package; it can handle huge raster files.
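
The original answer gives no code; here is a minimal sketch of that approach, reusing the random 0/1 matrix from @bill_080's answer. raster() has a method for plain matrices, so the data can be wrapped in a RasterLayer and plotted without any clustering or distance-matrix overhead:

library(raster)

nrowcol <- 1000
dat <- matrix(ifelse(runif(nrowcol*nrowcol) > 0.5, 1, 0), nrow=nrowcol)

r <- raster(dat)   # wrap the 0/1 matrix in a RasterLayer
plot(r)            # renders the grid directly; no distance matrix is built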

– dickoa

Using heatmap3, which is more memory efficient than the default heatmap function and faster through its use of the fastcluster package for the hierarchical clustering, works fine for me. Adding the argument useRaster=TRUE also helps:

library(heatmap3)
nrowcol <- 1000
dat <- matrix(ifelse(runif(nrowcol*nrowcol) > 0.5, 1, 0), nrow=nrowcol)
heatmap3(dat,useRaster=TRUE)

[plot: heatmap3 output for the 1000 × 1000 0/1 matrix]

The useRaster=TRUE argument seems quite important for keeping memory use within limits; you can use the same argument in heatmap.2. Calculating the distance matrix for the hierarchical clustering is the main overhead, but heatmap3 uses the more efficient fastcluster package for that with large matrices. With very large matrices, though, distance-based hierarchical clustering will unavoidably get you into trouble, since the distance matrix grows quadratically with the number of rows. In that case you can still pass the arguments Rowv=NA and Colv=NA to suppress the row and column dendrograms and use some other logic to sort your rows and columns, e.g.

nrowcol <- 5000
dat <- matrix(ifelse(runif(nrowcol*nrowcol) > 0.5, 1, 0), nrow=nrowcol)
heatmap3(dat,useRaster=TRUE,Rowv=NA,Colv=NA)

still runs without problems on my laptop with 8 GB of memory, whereas with the dendrograms included it already starts to struggle.
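
As a hypothetical illustration of such "other logic" (this snippet is not from the original answer), you could order rows and columns by their means before plotting; this never builds an n × n distance matrix, so memory use stays linear in the matrix size:

ord.r <- order(rowMeans(dat))   # order rows by their mean value
ord.c <- order(colMeans(dat))   # order columns by their mean value
heatmap3(dat[ord.r, ord.c], useRaster=TRUE, Rowv=NA, Colv=NA)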

– Tom Wenseleers

You can also use heatmap.2 from the gplots package and simply turn off the dendrograms, as these normally take up most of the computation time (in my experience).
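
A minimal sketch of that, assuming the same 0/1 matrix dat as in the other answers; dendrogram="none" suppresses drawing the dendrograms, while Rowv=FALSE and Colv=FALSE skip the clustering itself (and, per the comment above, useRaster=TRUE works in heatmap.2 too):

library(gplots)

heatmap.2(dat,
          dendrogram="none",        # draw no dendrograms
          Rowv=FALSE, Colv=FALSE,   # skip the row/column clustering entirely
          trace="none",             # trace lines are slow on large matrices
          useRaster=TRUE)           # rasterize the image, as in the answers above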

Also, have you considered directly printing your heatmap to a file via pdf(), png() or jpeg()?
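
For instance (a sketch; the file name and dimensions are arbitrary):

png("heatmap.png", width=2000, height=2000, res=150)   # open a PNG file device
heatmap.2(dat, dendrogram="none", Rowv=FALSE, Colv=FALSE,
          trace="none", useRaster=TRUE)
dev.off()   # close the device so the file is written to disk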

– bontus