
I am writing a program that takes 54 (num1) numbers and puts them in a list. It then takes 16 (num2) of those numbers and forms a list of lists, each inner list holding 16 numbers, covering all num1Cnum2 possible combinations. It then turns each of those lists into a 4x4 array.

The code I have works, but running all 54 numbers to get the arrays I want will take a long time. I know this because I have tested and timed the code with inputs from 20 up to 40 numbers:

20 numbers =  0.000055 minutes
30 numbers =  0.045088 minutes
40 numbers =  17.46944 minutes

Using all 20 of the test data points I collected, I built a mathematical model to predict how long the 54-number run would take, and I am getting 1740 minutes = 29 hours. This is already an improvement over a v1 of this code that was predicting 38 hours, and over a v0 that was actually crashing my machine.

I am reaching out to you to try to make this run even faster. The program is not even RAM intensive: I have 8 GB of RAM and a Core i7 processor, and it does not slow down my machine at all. It actually runs very smoothly compared to previous versions, where my computer even crashed a few times.

Do you guys think there is a way? I am currently sampling to reduce processing time, but I would prefer not to sample at all, if that's possible. I am also not printing the arrays, to reduce processing time further; I am just printing a counter to see how many combinations I generated.

This is the code:

import numpy as np
import itertools
from itertools import islice
from random import sample

num1 = 30  # ideally this is 54, just using 30 now so you can test it
num2 = 16
steps = 1454226  # represents 1% of num1Cnum2, just to reduce processing time for testing

nums = list()
for i in range(1, num1 + 1):
    nums.append(i)
# print("nums:", nums)  # just to ensure that I am indeed using numbers from 1 to num1

vun = list()
tabl = list()
counter = 0
combin = islice(itertools.combinations(nums, num2), 0, None, steps)
for i in set(combin):
    vun.append(sample(i, num2))
    counter = counter + 1
    p1=i[0];p2=i[1];p3=i[2];p4=i[3];p5=i[4];p6=i[5];p7=i[6];p8=i[7];p9=i[8]
    p10=i[9];p11=i[10];p12=i[11];p13=i[12];p14=i[13];p15=i[14];p16=i[15]
    tun = np.array([(p1, p2, p3, p4), (p5, p6, p7, p8), (p9, p10, p11, p12), (p13, p14, p15, p16)])
    tabl.append(tun)
    # print("TABL:", tabl)
# print("vun:", vun)
print("combinations:", counter)

The output I get with this code is:

combinations: 101

Ideally, this number would be 2.109492366 × 10¹³, or at least 1% of that, as long as it runs the 54C16 case and does not take 29 hours.

CeDe
  • What goal are you trying to achieve by putting 54 numbers in a list and then taking 16 of those numbers, etc.? – mkrieger1 Sep 18 '21 at 15:55
  • For a fixed `k`, the value of `n choose k` grows exponentially with `n`. You can't speed this up. – chepner Sep 18 '21 at 15:57
  • You say `steps = 1454226` "represents 10%", but 1/1454226 is far less than 10%... – no comment Sep 18 '21 at 15:57
  • @chepner Not exponentially. Just O(n^k). – no comment Sep 18 '21 at 16:05
  • Stores an enormous amount of data in RAM. Says *"The program is not even RAM intensive."* – no comment Sep 18 '21 at 16:14
  • @don'ttalkjustcode Yeah, looked too quickly at the definition and forgot the factorials mostly cancel each other. But given that `k` here is 16, the extremely high-degree polynomial is still impractical. – chepner Sep 18 '21 at 16:28
  • 21 trillion lists of length 16 involve over 250 trillion numbers. If you are trying to process that much data on an ordinary computer, doing it in just over a day isn't bad. – John Coleman Sep 18 '21 at 16:29
  • As far as improving your code, creating 16 intermediate variables just to turn a tuple into a numpy matrix seems seriously inefficient. You can turn a list into an array and reshape the array into a 4x4 matrix in 2 lines of code with no intermediate variables (see the sketch after these comments). – John Coleman Sep 18 '21 at 16:38
  • @JohnColeman Yeah, tried on TIO, which is already fairly fast, there even just running through the combinations would take [~3 days](https://tio.run/##TY7BDoIwDIbvfYre2MyMKEq8@AQmvMPEKYusI1sP8vS4QTBc1v3t/7X/MHLnqboOYZpewTu0bAJ730e0bvCBsfXuYUmz9RQV2tjb1sBsdZq7rWvpsnVm7eY/QKPwjje8nBUeayCFn6SqclaQwZj09owImt5GkExWCVymcd4kJJD5slgyiJlUMyjy0uRuPJm1AP8x3COXMARLLBh3C5JTSTxs@CTqcn1OKW3x1GMs5DT9AA). – no comment Sep 18 '21 at 16:41
  • Still wondering about the purpose of the code. I'm afraid to be disappointed by learning the real use case... – mozway Sep 18 '21 at 16:49
  • I am such a noob here; I don't even know how to answer each one of you individually. The purpose of the code is just to get 4x4 arrays with different numbers from 1 to 54. Think of it as if it were a bingo card, but 4x4 and only with numbers that can go from 1 to 54. – CeDe Sep 18 '21 at 17:11
  • Yes, the processing time grows exponentially; that is how I am predicting 29 hours for 54c16, based on the data points I already tested. – CeDe Sep 18 '21 at 17:12
  • It is not really storing the data, I think; if you run the code and monitor your RAM, you will see it doesn't really increase as the code is running. – CeDe Sep 18 '21 at 17:13
  • This is one of my first programs ever...haha. I will try to figure out how to do the reshape thing John Coleman mentioned... – CeDe Sep 18 '21 at 17:14
  • @CeDe The code as posted doesn't take much RAM because you only store 101 combinations. But you say you want to do this with trillions. Probably that RAM usage is what "crashed" your machine. – no comment Sep 18 '21 at 17:16
  • btw, eventually I will print the arrays, because I will use them to run an elimination process and only keep a few. Even printing the arrays does not increase the processing time that much. The main issue I have is with running the combinations; that is what is taking the most time, and that is what I need to improve. I would really appreciate your help. I will definitely look up the array reshape that was mentioned. – CeDe Sep 18 '21 at 17:18
  • @don'ttalkjustcode No, even if you change num1 to 54 and steps to 210949236593, the machine doesn't crash. My machine is not crashing with this code. It is running, but it will take more than 1 day to solve it and give me results. I am trying to reduce the run time to less than 29 hours. I am very new at coding and don't know all the tools available yet (will do more research on itertools and numpy). I am sure there must be a more efficient or at least much faster way of doing what I am doing. – CeDe Sep 18 '21 at 17:37
  • You keep using large steps values.... I do not think it means what you think it means. – no comment Sep 18 '21 at 17:40
  • @don'ttalkjustcode I might not. I think I do. So, it is a larger step value and will also generate 101 combinations at the end. But it will run the full 54c16 to get to those final 101. I agree, reducing the step value will increase my processing time. But running the code with 1 for the step value or a larger one does not really affect the processing time. It is num1 and num2 that really affect it, and I am sure it is because of how the code is written. – CeDe Sep 18 '21 at 17:52
  • btw I said 10% before but it is 1% – CeDe Sep 18 '21 at 17:59
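
For reference, a minimal sketch of the reshape approach John Coleman describes above, assuming `i` is one 16-number tuple as in the question's loop (the values here are stand-ins):

import numpy as np

i = tuple(range(1, 17))  # stand-in for one 16-number combination
tun = np.array(i).reshape(4, 4)  # same 4x4 array, no p1..p16 intermediates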

2 Answers


The main inefficiency comes from generating all the combinations (`itertools.combinations(nums, num2)`), only to throw most of them away.

Another approach would be to generate combinations at random, ensuring there are no duplicates.

import random

def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n), r))
    return tuple(pool[i] for i in indices)

items = list(range(1, 55))

samples = set()

# keep drawing until we have 100,000 distinct combinations;
# the set silently discards any duplicates
while len(samples) < 100_000:
    sample = random_combination(items, 16)
    samples.add(sample)

for sample in samples:
    board = list(sample)
    random.shuffle(board)  # undo the sorted order before building the board
    board = [board[0:4], board[4:8], board[8:12], board[12:16]]

print("done")

This uses the `random_combination` function from answers to this question, which in turn comes from the itertools documentation.

The code generates 100,000 unique 4x4 samples in about ten seconds, at least on my machine.

A few notes:

  • Each sample is a tuple and the entries are sorted; this means we can store them in a set and avoid duplicates.
  • Because of the first point, we shuffle each sample before creating a 4x4 board from it; the later code doesn't do anything with these boards, but I wanted to include them to get a sense of the timing.
  • It's possible that there would be lots of hash collisions if you were to sample a large proportion of the space, but that's not feasible anyway because of the amount of data that would involve (see below).

I think there's been some confusion about what you are trying to achieve here.

54C16 = 2.1 × 10¹³ ... to store 16 8-bit integers for all of these points would take 2.7 × 10¹⁵ bits, which is 337.5 terabytes. That's beyond what could be stored on a local disk.

So, to cover even 1% of the space would take over 3TB ... maybe possible to store on disk at a push. You hinted in the question that you'd like to cover this proportion of the space. Clearly that's not going to happen in 8GB of RAM.
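
For anyone who wants to check that arithmetic, here is a quick back-of-the-envelope calculation (using `math.comb`, available in Python 3.8+) that reproduces those figures:

import math

total = math.comb(54, 16)          # 21_094_923_659_355 combinations
bytes_needed = total * 16          # 16 numbers per combination, 1 byte each
print(bytes_needed / 1e12)         # ~337.5 (terabytes for the full space)
print(bytes_needed / 100 / 1e12)   # ~3.4 (terabytes for 1% of it)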

Matthew Strawbridge
  • After writing this I realised that it's really just an expansion of the second part of @pi-marillion's answer, though. – Matthew Strawbridge Sep 18 '21 at 22:06
  • Sir. This is definitely the right answer. I really appreciate your help. It is amazing how you achieved it. Or maybe not that amazing to other experts, but being a total noob at coding, this looks amazing to me. BRAVO! I am also glad about what I achieved and learned. I was going to try something similar to what you wrote, inspired by what @pi-marillion wrote as well. Of course I was never going to write it as well as you did. Thanks! – CeDe Sep 19 '21 at 03:16

Calculating the number of combinations is trivial, since it's just a formula:

import math

math.comb(30, 16)
# 145422675

math.comb(54, 16)
# 21094923659355

The trouble is that storing the results of the 16 of 30 case requires about 64 GB of RAM on my machine. You might, but probably don't, have that much RAM just sitting around like I do. The 16 of 54 case requires about 9.3 PB of RAM, which no modern architecture supports.

You're going to need to take one of two approaches:

  1. Limit to the 16 in 30 case, and don't store any results into `vun` or `tabl`.

    Pros: Can be made to work in < 5 minutes in my testing.

    Cons: Doesn't work for the 16 in 54 case at all, and no additional processing is practical.

  2. Do a Monte Carlo simulation instead: generate randomized combinations up to some large but reachable sample count and do your math on those.

    Pros: Fast, and supports both 16 of 30 and 16 of 54, potentially with the same time performance.

    Cons: Results will have some random variation depending on the random seed, and should be treated statistically to get confidence intervals for validity.

    Note: The formulas used for confidence intervals depend on which actual math you intend to do with these numbers; the average is a good place to start, though, if you're only looking for an estimate.

I strongly suggest option (2), the Monte Carlo simulation.
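
A minimal sketch of what that Monte Carlo approach could look like, assuming a hypothetical `stat` function that stands in for whatever per-combination math you actually need (the confidence interval uses a plain normal approximation, per the note above):

import math
import random

def stat(combo):
    # placeholder statistic: the mean of one 16-number combination
    return sum(combo) / len(combo)

random.seed(42)  # fix the seed so runs are reproducible
n_samples = 100_000
values = []
for _ in range(n_samples):
    combo = random.sample(range(1, 55), 16)  # one random 16-of-54 draw
    values.append(stat(combo))

mean = sum(values) / n_samples
var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
half_width = 1.96 * math.sqrt(var / n_samples)  # ~95% normal-approximation CI
print(f"estimate: {mean:.4f} +/- {half_width:.4f}")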

Pi Marillion
  • @don'ttalkjustcode Thanks for the suggestion. I've been on RHEL, which has Python 3.6 instead of 3.8, so no `math.comb` on it. My home box has 3.8 though, so `math.comb` FTW (a pre-3.8 fallback is sketched below) – Pi Marillion Sep 18 '21 at 18:12
  • I appreciate your response. I think there is a misunderstanding in the whole thing. My machine is not crashing; it is not trying to use 64GB of RAM, not even 9.3PB of RAM (this was fun to see). At the moment it is running the num1=54 case, and if my prediction is right, it should end tomorrow, giving me the same 101 combinations for the step size I used. Running this code with num1=30 and steps of 1% of 30c16 takes .045 minutes (27 seconds). But running it with num1=54 and steps of 1% of 54c16 will take way longer WITHOUT CRASHING MY MACHINE or yours or anyone who tries this. – CeDe Sep 18 '21 at 18:18
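
For completeness, a minimal pre-3.8 fallback for `math.comb`, using the factorial formula (an assumed equivalent, not code from the answer):

import math

def comb(n, k):
    # equivalent to math.comb(n, k) on Python >= 3.8
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(comb(54, 16))  # 21094923659355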