
Why does my matrix double in size when I replace values in it? Can I prevent R from doing so? Example:

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
object.size(a)/1024/1024
# 0.038 Mb
# I want to have a mean smaller than 1 in every column
# Thus, swap 0s and 2s in every column where the mean is bigger than 1
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
tracemem(a)
# [1] "<0x7fe9d2f16f50>"
a[,swapcol][swapmat==2] <- 0
# tracemem[0x7fe9d2f16f50 -> 0x7fe9c2d98b90]: 
# tracemem[0x7fe9c2d98b90 -> 0x7fe9c2d2bf70]: 
a[,swapcol][swapmat==0] <- 2
# tracemem[0x7fe9c2d2bf70 -> 0x7fe9c2e1b460]: 
object.size(a)/1024/1024
# 0.076 Mb, memory occupation doubled

I understand that the matrix may get copied in order to replace the values, but why does it get bigger? (replace() results in the same behaviour.) I read the chapter on memory usage in Hadley's book and the R documentation on this question, but I am still wondering why this happens. I thought maybe R requests a bit more space from the OS in case I want to enlarge the matrix, but why twice the space? The same factor applies to big matrices, which makes my system swap memory (defeating any potential time saving).

Thanks for any hints!

Andarin
  • `0` and `2` are floats (i.e. doubles). Your matrix contains integers. Use `0L` and `2L` to force R to treat them as integers. – joran May 19 '14 at 17:47
  • Thanks! It worked. Yet kinda unexpected... A lot to learn I still have. If you formulate it in a very short answer, I'll accept it. – Andarin May 19 '14 at 17:53
  • You also might be interested in `typeof()`, as in `typeof(swapmat)` and `typeof(4); typeof(4L); typeof(1:4)`, etc. – Josh O'Brien May 19 '14 at 17:57
  • ...or `storage.mode()` – Gavin Simpson May 19 '14 at 17:59
  • @GavinSimpson -- Yup. Posted that before I saw your answer. Seeing the two side-by-side (and wondering again what the difference between the two of them is) did inspire me to go revisit [this fairly interesting answer](http://stackoverflow.com/questions/8855589/a-comprehensive-survey-of-the-types-of-things-in-r-mode-and-class-and-type/8857411#8857411) from a while back. – Josh O'Brien May 19 '14 at 18:01
  • @Andarin R has to assume something about the values because we don't type objects in R when we create them, unlike some other languages. The convention is to assume doubles, probably because for statistics one is likely to be doing computations that require the data to be doubles in the various routines. – Gavin Simpson May 19 '14 at 18:01
  • @JoshO'Brien Oh no! I'm being old-fashioned! ;-) – Gavin Simpson May 19 '14 at 18:02
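
The `typeof()` checks mentioned in the comments can be collected into one short sketch. The vector name `types` and its element labels are illustrative only and do not appear in the thread; the point is simply that an unsuffixed numeric literal is a double, while `L`-suffixed literals, `1:4`, and `rbinom()` results are integers.

```r
# Collect the storage type of a few common expressions.
# `types` and its labels are illustrative names, not from the thread.
types <- c(
  plain_literal  = typeof(2),                 # unsuffixed literal
  suffix_literal = typeof(2L),                # L suffix requests an integer
  colon_seq      = typeof(1:4),               # colon operator on whole numbers
  rbinom_draws   = typeof(rbinom(3, 1, 0.5)), # binomial counts
  rep_plain      = typeof(rep(1, 3)),         # repeats the double literal 1
  rep_suffix     = typeof(rep(1L, 3))         # repeats the integer literal 1L
)
print(types)
```

Only `plain_literal` and `rep_plain` should come back as "double"; the rest are "integer", which is why the matrix built from `rbinom()` starts out as integer storage.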

2 Answers

The problem is that literals such as 0 and 2 are doubles, not integers, as far as R is concerned. When you assign them to elements of the matrix a, R coerces the whole of a to double storage, which increases the object's memory size. The original a was stored as integers, which take 4 bytes each rather than the 8 bytes of a double, hence the doubling. You can see this via storage.mode():

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)

> storage.mode(a)
[1] "integer"

swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0
a[,swapcol][swapmat==0] <- 2

> storage.mode(a)
[1] "double"
> format(object.size(a), units = "Kb")
[1] "78.3 Kb"
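
To see the coercion in isolation, here is a minimal sketch on a plain vector rather than the matrix from the question. The names `x`, `size_before` and `size_after` are illustrative assumptions, not part of the original code. Assigning a single double into an integer vector is enough to convert every element.

```r
x <- rep(1L, 1000)               # integer vector: 4 bytes per element
size_before <- object.size(x)

x[1] <- 0                        # 0 is a double, so all of x is coerced
size_after <- object.size(x)

typeof(x)                        # now "double": 8 bytes per element
print(size_before)
print(size_after)                # roughly twice size_before
```

The fixed per-object overhead is the same in both cases, so the growth comes almost entirely from the wider element type.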

To fix this, append L to the values you assign to a; the L suffix is R's notation for an integer literal.

set.seed(42)
a <- matrix(rbinom(10000,2,0.45),ncol=10)
swapcol <- colMeans(a)>1
swapmat <- a[,swapcol]
a[,swapcol][swapmat==2] <- 0L
a[,swapcol][swapmat==0] <- 2L

> storage.mode(a)
[1] "integer"
> format(object.size(a), units = "Kb")
[1] "39.3 Kb"
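
As a possible alternative (not part of the original answer), when the matrix holds only 0, 1 and 2, the swap can be sketched as a single integer subtraction, which avoids the intermediate `swapmat` and the two-step assignment. The small hand-built matrix below is an assumption chosen so the result is easy to check by eye.

```r
# Hand-built integer matrix (illustrative data, not from the question)
a <- matrix(c(2L, 2L, 1L, 0L,    # column 1: mean 1.25, should be swapped
              0L, 1L, 0L, 2L),   # column 2: mean 0.75, left alone
            ncol = 2)

swapcol <- colMeans(a) > 1

# 2L - x maps 0 -> 2, 1 -> 1, 2 -> 0; using 2L keeps the result integer
a[, swapcol] <- 2L - a[, swapcol]

storage.mode(a)                  # "integer"
colMeans(a)                      # both columns now have mean 0.75
```

Writing `2 - a[, swapcol]` instead would reintroduce the problem, because the double literal 2 makes the right-hand side a double.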
Gavin Simpson
  • Thank you for the very detailed answer (and the hint about `format(object.size)`)! I thought R would behave more like C and cast to `double` only if necessary (with 2 being an `int`), though I see the point of R treating everything as `double`. Still, it caught me by surprise that my binomial matrix is created as `int` while a vector of plain numbers is not (e.g. `typeof(rbinom(3,1,1))` vs `typeof(rep(1,3))`). – Andarin May 19 '14 at 18:27

Converting comment to answer:

0 and 2 are floats (i.e. doubles). Your matrix contains integers. Use 0L and 2L to force R to treat them as integers:

> set.seed(42)
> a <- matrix(rbinom(10000,2,0.45),ncol=10)
> object.size(a)/1024/1024
0.0383377075195312 bytes

> swapcol <- colMeans(a)>1
> swapmat <- a[,swapcol]
> tracemem(a)
[1] "<0x7fc50ec45e00>"
> a[,swapcol][swapmat==2] <- 0L
tracemem[0x7fc50ec45e00 -> 0x7fc50d839e00]: 

> a[,swapcol][swapmat==0] <- 2L
> object.size(a)/1024/1024
0.0383377075195312 bytes

Same size!
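
If a matrix has already been coerced to double by an earlier assignment, one way to get back to integer storage is `storage.mode<-`, which keeps attributes such as `dim`. The sketch below uses a small illustrative matrix (not the one from the question) and guards the conversion with a whole-number check, since converting fractional values to integer would truncate them.

```r
a <- matrix(c(0L, 1L, 2L, 1L), ncol = 2)

a[a == 2L] <- 0                  # double literal: a becomes a double matrix
typeof(a)                        # "double"

# Convert back only if every value is a whole number
if (all(a == round(a))) {
  storage.mode(a) <- "integer"
}

typeof(a)                        # "integer"
dim(a)                           # the 2 x 2 shape is preserved
```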

Josiah Yoder
joran