5

I was doing a foolish thing like:

from itertools import *
rows = combinations(range(0, 1140), 17)
all_rows = []
for row in rows:
    all_rows.append(row)

No surprise; I run out of memory address space (32 bit python 3.1) My question is: how do I calculate how much memory address space I will need for a large list? In this case the list is on the order of 2.3X10^37. Is there a function in Python that returns the information I am looking for, or actually the size of a smaller but similar list? What are those tools?

Nathaniel Ford
  • 20,545
  • 20
  • 91
  • 102
Vincent
  • 1,579
  • 4
  • 23
  • 38
  • Is not is just 1140 choose 17 ? – Hamish Grubijan Dec 31 '09 at 16:19
  • 3
    10^37? Really? That's about 125 bits of addressing space to simply count the elements. You have to rethink this from the fundamentals. Storage isn't really the issue here. You need to do something other than exhaustively enumerate all combinations. – S.Lott Dec 31 '09 at 16:21
  • by the way, try all_rows = list(combinations(range(0, 1140), 17)) - should cut memory usagea bit. – Hamish Grubijan Dec 31 '09 at 16:21
  • 1
    BTW. 64 bits only allows you to count to 10^19. – S.Lott Dec 31 '09 at 16:23
  • 4
    Vincent's question makes it clear that he knows this is "foolish", and that his real interest is in how to calculate memory consumption, specifically what tools are available. That's a perfectly valid, well-stated question. – Peter Hansen Dec 31 '09 at 16:45
  • @Peter Hansen: Interesting read of the question. I take it as a failure to appreciate the magnitude of what's going on. The "out of memory" is only a symptom of a very real problem. The underlying problem is a failure to consider the vast scale of this approach. The memory use answer is irrelevant because this approach is unsound even if each item occupied only one bit. – S.Lott Dec 31 '09 at 19:40
  • @ Peter Hanson, It is true I would like to be able to do this but more importantly I would like to know the limit of such thing within the context of memory. @S.Lott, I am sure I am running out out, or more correctly that is the error I get. Rather that finding the combination of the integers 0-1140, I actually have 1140 lists of length 17. But I get the same error. – Vincent Dec 31 '09 at 22:42
  • @Vincent: "I actually have 1140 lists of length 17. But I get the same error." 1140 lists of length 17 does not exceed memory. Even my little MacBook easily handles this. You should probably accept the answer to this question and then move on to ask your new question with new code. – S.Lott Dec 31 '09 at 23:42
  • @Vincent, not yet clear, if you actually want the list of combinations, is why you would need the full list, in memory, all at once, instead of simply iterating over it as it is generated. There's a good reason `combinations()` resides in the itertools module... – Peter Hansen Jan 01 '10 at 15:48
  • @S.Lott, "1140 lists of length 17 does not exceed memory" What I meant was I have 1140 list of length 17 that I would like all combination of. – Vincent Jan 01 '10 at 18:05
  • @Peter Hansen, "why you would need the full list, in memory" Great Question. In the for loop I would actually perform several test to determine if I wan to keep the combination. I still might/will end up with a large list. What are my option if I don't keep it in memory? The only option I know of it to write them to a file or database? – Vincent Jan 01 '10 at 18:16
  • I ask a new question based on the comments, Thanks for the useful Comments http://stackoverflow.com/questions/1989251/alternatives-to-keeping-large-lists-in-memory-python – Vincent Jan 01 '10 at 18:34

4 Answers4

11

There's a handy function sys.getsizeof() (since Python 2.6) that helps with this:

>>> import sys
>>> sys.getsizeof(1)  # integer
12
>>> sys.getsizeof([]) # empty list
36
>>> sys.getsizeof(()) # empty tuple
28
>>> sys.getsizeof((1,))  # tuple with one element
32

From that you can see that each integer takes up 12 bytes, and the memory for each reference in a list or tuple is 4 bytes (on a 32-bit machine) plus the overhead (36 or 28 bytes respectively).

If your result has tuples of length 17 with integers, then you'd have 17*(12+4)+28 or 300 bytes per tuple. The result itself is a list, so 36 bytes plus 4 per reference. Find out how long the list will be (call it N) and you have 36+N*(4+300) as the total bytes required.

Edit: There's one other thing that could significantly affect that result. Python creates new integer objects as required for most integer values, but for small ones (empirically determined to be the range [-5, 256] on Python 2.6.4 on Windows) it pre-creates them all and re-uses them. If a large portion of your values are less than 257 this would significantly reduce the memory consumption. (On Python 257 is not 257+0 ;-)).

Peter Hansen
  • 21,046
  • 5
  • 50
  • 72
  • Note also a newly open-sourced package from Slide Inc: http://github.com/slideinc/meminfo which is a "C extension for finding precise in-memory sizes of python objects". – Peter Hansen Jun 17 '10 at 22:38
  • Note that `sys.getsizeof()` is not supported by PyPy. – amcnabb Nov 08 '12 at 17:44
4

Well first rather than writing:

all_rows = []
for row in rows:
    all_rows.append(row)

You can simply write:

all_rows = list(rows)

which will be quite a bit more efficient.

Then, there are two things to consider for memory consumption of a list:

  • memory consumption of the objects comprising the list; this obviously depends on these objects, their type, and whether there is a lot of sharing or not
  • memory consumption of the list itself; each object in the list is referenced by a pointer, which will take 4 bytes in 32-bit mode and 8 bytes in 64-bit mode; so, roughly, the size of the list itself is (4 or 8 bytes) times the number of objects in the list (that's ignoring the fixed list header overhead and the moderate amount of over-allocation that Python lists do)

By the way, in recent Python versions you can use sys.getsizeof() to get the size of an object:

>>> import sys
>>> sys.getsizeof([None] * 100)
872
Antoine P.
  • 4,181
  • 1
  • 24
  • 17
3

Addendum: Since you're dealing with lists of integers and worry about memory usage --- there is also the array-module:

[array] defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time [...].

Alex Brasetvik
  • 11,218
  • 2
  • 35
  • 36
1

You are asking for

http://en.wikipedia.org/wiki/Binomial_coefficient

http://www.brpreiss.com/books/opus7/programs/pgm14_10.txt

Anyhow, sounds like you are trying to solve an NP-complete problem by brute force ;)

Hamish Grubijan
  • 10,562
  • 23
  • 99
  • 147
  • It is an NP complete problem but mostly I am trying to learn a little and maybe stumble upon an answer. The question is stated here http://blog.computationalcomplexity.org/2009/11/17x17-challenge-worth-28900-this-is-not.html – Vincent Dec 31 '09 at 22:45
  • I would like the list of actual combination, I know how many there are. – Vincent Dec 31 '09 at 22:57