
I have a big multidimensional array and I want it to occupy as little memory as possible. In Python, it occupies 66 MB:

import numpy as np
import sys

m = np.zeros([1000, 70, 1, 1000], dtype='bool')
size = sys.getsizeof(m) / 1024 / 1024
print("Size: %s MB" % size)

However, in R, the same array occupies four times as much memory (267 MB):

m <- array(FALSE, dim = c(1000, 70, 1, 1000))
format(object.size(m), units = "auto")

Any idea on how to reduce the array size in R?


EDIT: This array will be used as the X input to an external API. The function accepts either an array or an internal iterator called mx.io.arrayiter as its argument.

  • I'm not familiar with python, but [this post](http://stackoverflow.com/questions/11784329/python-memory-usage-of-numpy-arrays) seems to suggest different methods to measure memory usage instead of `getsizeof` – alexis_laz Aug 29 '16 at 10:15
  • @alexis_laz they are the same, I got the exact same size in my example – hoaphumanoid Aug 29 '16 at 10:18
  • 1
    Can you explain what you intend to do with the array? Maybe there are other data structures you could use. The most memory efficient representation of this object needs five values: `FALSE` and the dimensions. – Roland Aug 29 '16 at 10:19
  • E.g., [package slam](https://cran.r-project.org/web/packages/slam/slam.pdf) provides a sparse array class that might be useful. – Roland Aug 29 '16 at 10:22
  • @Roland I updated the question, I have to use the array as an input for an external library. – hoaphumanoid Aug 29 '16 at 10:54
  • Well, then you are out of luck unless you rewrite that package to work with sparse matrices/arrays. – Roland Aug 29 '16 at 11:01
  • :-( Thank you @Roland, I think I will try another solution – hoaphumanoid Aug 29 '16 at 11:04
  • Well, RAM is cheap and available for rent. – Roland Aug 29 '16 at 11:22
  • If I could get more memory that would be awesome, but I'm already using a VM with 200 GB .... – hoaphumanoid Aug 29 '16 at 11:26

1 Answer


Your assertion that these arrays are the same is clearly wrong. If they were the same arrays, they would require the same memory allocation in R as in any other language.

From the help for ?as.integer:

Note that current implementations of R use 32-bit integers for integer vectors

So the 4x memory usage arises because R stores each logical value as a 32-bit integer, whereas NumPy's bool dtype uses a single 8-bit byte per element.
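
For comparison, here is a quick check of the per-element storage (a sketch; the exact figures from object.size() also include a small fixed per-object overhead):

n <- 1e6
object.size(logical(n))  # roughly 4 MB: each logical is stored as a 32-bit integer
object.size(raw(n))      # roughly 1 MB: each raw value occupies a single byte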

To use 8-bit objects in R, you can use raw vectors. From the help for ?as.raw:

The raw type is intended to hold raw bytes

Try this:

m3 <- array(raw(0), dim = c(1000, 70, 1, 1000))
format(object.size(m3), units = "auto")

[1] "66.8 Mb"

This matches the size you report for the Python array.
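
If your downstream code ultimately needs logical values, note that coercing the whole array back with as.logical() would again allocate 4 bytes per element; individual elements, however, can be set and tested directly on the raw array. A minimal sketch, reusing m3 from above:

m3[1, 1, 1, 1] <- as.raw(1)    # TRUE is represented by the byte 01
m3[1, 1, 1, 1] == as.raw(1)    # comparisons on raw values return a logical: TRUE
as.logical(m3[1, 1, 1, 1])     # a single element converted back: TRUE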
