1

I need to generate every combination of 6 numbers from a set of 55. I believe there are 28,989,675 indexes in that set of combinations. I guess I'm running out of memory, because I can generate combinations with 4 numbers but nothing larger than that. How can I fix this problem?

I'm using a modification of some code I borrowed from a tutorial here: https://www.youtube.com/watch?v=VyXDQxuIwPU

import itertools

text_file = open("comb3.txt", "w")

harmonics = [28, 33, 36, 38, 40, 43, 45, 47, 48, 50, 52, 55, 55.86, 57, 59, 60, 60.86, 61.69, 62, 63.86, 64, 65.86, 66, 66.69, 67, 69, 69.69, 70.86, 71, 71.69, 72, 74, 75.86, 76, 76.69, 77.86, 79, 81, 81.69, 82.86, 83.69, 84, 84.86, 86, 88, 88.69, 89.86, 90.69, 91, 93, 95, 95.69, 96.86, 98, 100]

combos = itertools.combinations(harmonics, 4)

usable_combos = []
for e in combos:
    usable_combos.append(e)

print usable_combos

s = str(usable_combos)

text_file.write(s)
text_file.close()

Thanks!

  • 1
    I wonder why you need `combos` and `usable_combos`. Aren't they the same? – Havenard Aug 18 '14 at 04:49
  • 1
    Even if you get your memory use down by printing as you go (instead of accumulating a list), it will take forever to print (i/o is the slowest part of the process). Surely, you don't really want 29 million lines printed on your screen. What do you actually want to do the the output (that is where the interesting part takes place). – Raymond Hettinger Aug 18 '14 at 06:29
  • I want to create an array from the resulting combos. I need to sort the results based on simple arithmetic relationships between the members of the set at each index. The numbers I'm using represent MIDI note numbers (music notes), and I need to find all sets with a certain "interval class vector" for instance (http://en.wikipedia.org/wiki/Interval_class_vector) The brute-force method seems difficult but not impossible, and I feel that there would be advantages in taking that approach. Also I'm pretty new to programing and to Python, so this seems easiest. What do you think? Doable? –  Aug 18 '14 at 07:08
  • Any plan that involves sorting 29 million things using a computed sort key in Python deserves at least one second thought. If ICV is computed for each 6-tuple individually and you are looking for a specific value, that sounds more like filtering which is more feasible and which you should do on the initial combinations before any output. Not familiar with ICV though. – Jason S Aug 18 '14 at 07:40
  • Is the problem partly that Python is the wrong tool for this project? Would a different language make more sense if I continue with this method? –  Aug 18 '14 at 17:35

2 Answers2

4

Iterators like itertools.combinations only generate a piece of data at a time which is relatively memory efficient. But when you put all of the values into a list you need memory to store all of them at once (btw, usable_combos = list(combos) would replace your for loop, not that you should do that).

Since you are writing them to a file, you can write each combo to the file as you get it from the iterator, not create a list. Now, do you need it to be formatted like the repr of a Python list? Because if not, this would make more sense:

for combo in combos:
    text_file.write(str(combo) + "\n")

Note: changed from using "{}\n".format(combo) due to profiling.

If you want it like the repr of the list, you'll need to write the [ and ] yourself, and commas instead of newlines.

-more-

Based on the updates in the comments - if you're looking for specific combinations, the best place to look for them is probably before writing them to the file, since otherwise you just have to load them back from the file and look at them all again. If you will be selecting a small fraction of the available combinations based on some criteria, selecting them up front will cut down your work later.

In general, you could also look into Cython for some more speed without having to learn actual C, and if you really want to brute-force something with memory requirements beyond your own computer's, appropriately sized EC2 instances are in the vicinity of 20 cents an hour.

Jason S
  • 13,538
  • 2
  • 37
  • 42
  • Thanks Jason. So will this method still work even given the very large size of the resulting file? I implemented this but haven't had any luck actually writing that file. (Lines 1-5 unchanged.) for combo in combos: text_file.write(str(combo) + "\n") text_file.close() –  Aug 18 '14 at 06:58
  • Well, it will still take a very long time – Jason S Aug 18 '14 at 07:01
  • Are we talking hours or days? :) –  Aug 18 '14 at 07:17
3

One reason why you're running out of memory is due to the fact that (as you quite rightly stated): 55 choose 6 = 28,989,675

Now, think about how much memory that is, exactly. We can perform a quick back-of-the-envelope calculation to estimate how much memory this will take:

Since your list is using floats and integers, we can deduce the upper bound for the memory consumption as being:

sys.getsizeof(float())

Which on a 64 bit machine is 24 bytes, and on a 32 bit machine is 16 bytes

And, since tuples take up: 56 + 8 * len(t) bytes (64 bit)

Hence, the upper bound of your calculation would take:

  1. 28,989,675 * 6 * 24 + 28,989,675 * (56 + 8 * 6) bytes ~ 6,856.39 MiB of memory (64 bit)
  2. 28,989,675 * 6 * 16 + 28,989,675 * (56 + 8 * 6) bytes ~ 5,529.34 MiB of memory (32 bit)

Recalling that Python lists are implemented contiguously (for O(1) lookup time), this is the probable reason why it crashes, as you also have to consider the memory occupied by the OS and other programs in RAM.

Compare this to the other example you cited: 55 choose 4 = 341,055 => ~ 59.85 MiB (64 bit) or ~49.44 MiB (32 bit) of contiguous memory. Since this is a very reasonable amount of memory that can be contiguous, it runs without problem.

EDIT

Original link (dead): http://deeplearning.net/software/theano/tutorial/python-memory-management.html

jrd1
  • 10,358
  • 4
  • 34
  • 51