4

Given a 2d matrix such as [[a,b,c],[d,e,f]...]], I want to perform a Cartesian product of the matrix so I can determine all the possible combinations.

For this particular constraint, when I am using a 2d matrix with 12 different subsets, it uses more than the 16 megabytes of allotted memory I have. There are three values in each subset, so I would have 312 different combinations.

The cartesian product function that I am using is:

def cartesian_iterative(pools):
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    return result

I would like to know how I could reduce memory consumption without using any external libraries. An example 2d array I would working with is [['G', 'H', 'I'], ['M', 'N', 'O'], ['D', 'E', 'F'], ['D', 'E', 'F'], ['P', 'R', 'S'], ['D', 'E', 'F'], ['M', 'N', 'O'], ['D', 'E', 'F'], ['D', 'E', 'F'], ['M', 'N', 'O'], ['A', 'B', 'C'], ['D', 'E', 'F']]

EDIT: For reference, a link to the problem statement can be found here Problem Statement. Here is the link to the file of possible names Acceptable Names.

The final code:

with open('namenum.in','r') as fin:
    num = str(fin.readline().strip()) #the number being used to determine all combinations

numCount = []
for i in range(len(num)):
    numCount.append(dicti[num[i]]) #creates a 2d array where each number in the initial 'num' has a group of three letters


def cartesian_iterative(pools): #returns the product of a 2d array
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    return result

pos = set() #set of possible names
if len(num) == 12: #only uses more than the allocated memory when the num is 12 digits long.
    '''
    This optimization allows the product to only calculate 2 * 3^6 values, instead of 3**12. This saves a lot of memory
    '''
    rights = cartesian_iterative(numCount[6:])
    for left in cartesian_iterative(numCount[:6]):
        for right in rights:
            a = ''.join(left+right)
            if a in names:
                pos.add(a) #adding name to set
else: #if len(num) < 12, you do not need any other optimizations and can just return normal product 
    for i in cartesian_iterative(numCount):
        a = ''.join(i)
        if a in names:
            pos.add(a)
pos = sorted(pos)


with open('namenum.out','w') as fout: #outputting all possible names
    if len(pos) > 0:
        for i in pos:
            fout.write(i)
            fout.write('\n')
    else:
        fout.write('NONE\n')
Krish
  • 415
  • 2
  • 11
  • 3
    Seems like `itertools.product` is the best shot at solving your problems. It returns a generator, so it's memory-friendly. Are you aware of it/have you tried it? – ggorlen Mar 14 '20 at 16:30
  • I am aware of the module/function, but I was hoping for an optimization that I could use without using a library. – Krish Mar 14 '20 at 16:31
  • 4
    `itertools` is a builtin library. It is verified, fast, and well-known(readable). Why don't you want to use it? – Boseong Choi Mar 14 '20 at 16:31
  • Yes, I think the answer to this question is "use `itertools.product` and if you don't want to use it for some reason, copy its C source code and write an extension". – ggorlen Mar 14 '20 at 16:36
  • @ggorlen I guess `itertools.product` is where that code [came from](https://docs.python.org/3/library/itertools.html#itertools.product)... – Kelly Bundy Mar 14 '20 at 16:49
  • Yes, but for it to behave anything like `itertools.product`, it'd need to be implemented in C and return a generator. The provided code in the docs is just to communicate to Python folks the flavor of the algorithm, but [this is the actual implementation](https://github.com/python/cpython/blob/master/Modules/itertoolsmodule.c#L2091). – ggorlen Mar 14 '20 at 16:56
  • @ggorlen Why would it need to be implemented in C? What aspect of it can't be done in a Python generator? – Kelly Bundy Mar 14 '20 at 17:12
  • It can be, logically, but because Python is slow, there's simply no contest or comparison to be made. – ggorlen Mar 14 '20 at 17:24
  • Does this answer your question? [How can I compute a Cartesian product iteratively?](https://stackoverflow.com/questions/2419370/how-can-i-compute-a-cartesian-product-iteratively) – norok2 Mar 14 '20 at 18:09

1 Answers1

3

You could use that function on left and right half separately. Then you'd only have 2×36 combinations instead of 312. And they're half as long, somewhat even canceling that factor 2.

for left in cartesian_iterative(pools[:6]):
    for right in cartesian_iterative(pools[6:]):
        print(left + right)

Output:

['G', 'M', 'D', 'D', 'P', 'D', 'M', 'D', 'D', 'M', 'A', 'D']
['G', 'M', 'D', 'D', 'P', 'D', 'M', 'D', 'D', 'M', 'A', 'E']
['G', 'M', 'D', 'D', 'P', 'D', 'M', 'D', 'D', 'M', 'A', 'F']
['G', 'M', 'D', 'D', 'P', 'D', 'M', 'D', 'D', 'M', 'B', 'D']
...

To be faster, compute the right combinations only once:

rights = cartesian_iterative(pools[6:])
for left in cartesian_iterative(pools[:6]):
    for right in rights:
        print(left + right)
Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • Interesting solution, but it still uses more than the allocated 16mb of memory. I tested this solution using the USACO grader. – Krish Mar 14 '20 at 17:16
  • @Krish This should take at most about 140 kb. What's the link to the USACO problem? – Kelly Bundy Mar 14 '20 at 17:20
  • So the 3 and 12 are indeed limits. Then I can only guess you *collected* the combinations again instead of processing them. – Kelly Bundy Mar 14 '20 at 17:28
  • The test case used was 463373633623. The link to the dictionary is https://pastebin.com/rTZAxwAN – Krish Mar 14 '20 at 17:28
  • I did collect all the combinations and then I checked whether it was in the dictionary set. Maybe if i just immediately check instead of adding it to a container it will reduce memory consumption. – Krish Mar 14 '20 at 17:30
  • After implementing your algorithm, and checking during each product, the program passes the USACO grader without memory problems. Bisecting the two arrays into arrays of 6 values each + checking each value when it is calculated (no adding to a container) significantly reduces the memory consumption for all of the test cases. – Krish Mar 14 '20 at 17:39
  • @ggorlen While I agree that their approach is far from ideal for their actual problem, I wouldn't encourage radically changing perfectly valid and answered questions. See https://meta.stackexchange.com/q/286803/691385 – Kelly Bundy Mar 14 '20 at 17:48
  • Definitely. Now that they've accepted an answer, it's too late. For future notice to OP, please provide context to the problem you're really trying to solve rather than questions about the attempted solution without context. – ggorlen Mar 14 '20 at 18:02