9

I'm using numpy to create a cube array with sides of length 100, thus containing 1 million entries in total. For each of those million entries, I am inserting a 100x100 matrix whose entries consist of randomly generated numbers. I am using the following code to do so:

import random
from numpy import *

cube = arange(1000000).reshape(100,100,100)

for element in cube.flat:
    matrix = arange(10000).reshape(100,100)
    for entry in matrix.flat:
        entry = random.random()*100
    element = matrix

I was expecting this to take a while, but with 10 billion random numbers being generated, I'm not sure my computer can even handle it. How much memory would such an array take up? Would RAM be a limiting factor, i.e. if my computer doesn't have enough RAM, could it fail to actually generate the array?

Also, if there is a more efficient way to implement this code, I would appreciate tips :)

aensm
  • Assuming `double` precision, at 8 bytes each, if you really are trying to store 10 billion of them, that's 80GB. If you have to ask, your computer doesn't have enough memory. That said, it looks like you're creating them all but not storing them, so you should be fine. – Gabe Jun 28 '12 at 21:56
  • Does this answer your question? [Python memory usage of numpy arrays](https://stackoverflow.com/questions/11784329/python-memory-usage-of-numpy-arrays) – YaOzI Jan 13 '23 at 04:36

2 Answers

23

A couple points:

  • The size in memory of a numpy array is easy to calculate: it's simply the number of elements times the item size, plus a small constant overhead. For example, if your `cube.dtype` is int64 and it has 1,000,000 elements, it will require 1,000,000 * 64 / 8 = 8,000,000 bytes (~8 MB); see the short `nbytes` sketch after this list.
  • However, as @Gabe notes, 100 * 100 * 1,000,000 doubles will require about 80 GB.
  • This will not cause anything to "break", per se, but operations will be ridiculously slow because of all the swapping your computer will need to do.
  • Your loops will not do what you expect. Instead of replacing the elements of cube, `element = matrix` simply rebinds the loop variable `element`, leaving the cube unchanged. The same goes for `entry = random.random() * 100`.
  • Instead, see the documentation on modifying array values with `nditer`: http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values (a small sketch follows this list).
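For example, a minimal sketch (not from the original answer) of checking this arithmetic with numpy's own `nbytes` attribute; note that the dtype `arange` picks can vary by platform, so the int64/8-byte figure is an assumption:

import numpy as np

cube = np.arange(1000000).reshape(100, 100, 100)  # the cube from the question
print(cube.dtype, cube.itemsize)  # typically int64, 8 bytes per element
print(cube.nbytes)                # 1,000,000 * 8 = 8,000,000 bytes (~8 MB)

# the structure the question actually wants: 100*100*100 cells, each a 100x100 float64 matrix
print(100**3 * 100**2 * 8 / 1e9)  # 80.0, i.e. roughly 80 GB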
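And following that link, a rough sketch of in-place modification with `np.nditer` (the `x[...] = ...` assignment writes back into the array, unlike the plain `entry = ...` rebinding in the question); in practice the vectorized call in the answer below is simpler and far faster:

import numpy as np

matrix = np.zeros((100, 100))

# op_flags=['readwrite'] makes each element writable through the iterator;
# x[...] = value stores the result back into `matrix`
with np.nditer(matrix, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = np.random.random() * 100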
David Wolever
  • Things will indeed 'break' if you exceed the total amount of available virtual memory, which seems very likely in this case since not many people have >80GB of swap space – ali_m Jul 30 '15 at 19:15
  • This is not entirely the case when dealing with functions such as `np.zeros()`. Lazy loading is used (at least in Linux versions), which will avoid using large amounts of memory until certain elements are accessed. For example, you can make a matrix with `np.zeros((24000,24000))`, and it does not take up much memory, but if you do `np.random.random((24000,24000))`, it takes up a bit over 4 GB. Better explanation: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space – Max Candocia Aug 29 '17 at 18:04
2

for the "inner" part of your function, look at the numpy.random module

import numpy as np
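# uniform floats drawn from [0, 1), scaled to [0, 100); result has shape (100, 100)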
matrix = np.random.random((100,100))*100
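
For the "outer" part, one way to extend this (again just a sketch, not from the original answer) is a single 5-D array; the full 100-per-side version would need the ~80 GB discussed above, so this uses a smaller, arbitrary edge length n that actually fits in memory:

import numpy as np

n = 10  # the question uses 100, which would make this single float64 array roughly 80 GB
cube = np.random.random((n, n, n, n, n)) * 100

print(cube.shape)           # (10, 10, 10, 10, 10)
print(cube.nbytes / 1e6)    # ~0.8 MB here; about 80,000 MB if n were 100
print(cube[3, 4, 5].shape)  # (10, 10): the random "matrix" stored at cube cell (3, 4, 5)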
Phil Cooper