0

I am using a binomial tree as a list with 2^T - 1 'nodes' and want to create a set of subsets that work within some given criteria (outlined below) on the elements of the list. Right now, I use the following code to generate a tree

def gen_nodes(T):
    nodes = []
    for t in range(T):
        for i in range(2**t):
            nodes += [[t,1 + i]]
    return nodes

For instance, for T = 1, we get the root

gen_nodes(1) = [[0,1]],

but for T = 2 and T = 3, we get

gen_nodes(2) = [[0,1], [1,1], [1,2]]
gen_nodes(3) = [[0,1], [1,1], [1,2], [2,1], [2,2], [2,3], [2,4]],

et cetera. Right now, I'm using a powerset function courtesy of this wonderful contributor,

def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1 << x):
        yield [ss for mask, ss in zip(masks, s) if i & mask]

This has worked great for me, but as I'm sure you've caught at this point, the length of the powerset gets entirely too large with the time complexity of something like O(2^(2^T)). Initially, I was going to just create the entire set by brute force and then apply constraints on valid subsets after creating the larger set of subsets, but it seem's like I'm going to run into some overflow problems if I don't modify the powerset function with those constraints.

Basically, I only want the lists e within the output of ls = list(powerset(gen_nodes(T))) such that for all i from 0:len(e), e[i] in e implies [e[i] - 1, e[i]] OR [e[i] - 1, e[i] - 1] are in e.

Returning to the binary tree analog, this is basically saying for all [t,i] in [0,t] x [1,t+1], [t,i] in e only if [t-1,i] OR [t-1,i-1] in e, basically if [t,i] is in e, there there must be at least one "path" from [0,1] to [t,i] where each node in the path is also in e. I suspect this will condense the size of subsets output immensely, but I'm unsure of how to implement it. I think I might have to forgo using the powerset function, but I'm not sure how to code it in that case, and would therefore appreciate any help I can get.

EDIT: I should include desired output as commented. Additionally, I've included the function that has been 'working' for me thus far, but it's horribly inefficient currently. First, let pset be the function that solves this problem. and pg(i) = pset(gen_nodes(i)) for brevity. Then

pg(1) = [[0,1]],
pg(2) = [[[0, 1]], [[0, 1], [1, 1]], [[0, 1],
[1, 2]], [[0, 1], [1, 1], [1, 2]]]

Unfortunately, this set still grows very fast (pg(3) is 17 lists of length up to 6 pairs, pg(4) is 97 lists of length up to 10 pairs, etc), so I can't share much more on this post. However, I did develop a function that works, but seems to be horribly inefficient (pg(6) takes half a second, and then pg(7) takes 4 minutes to complete). It is attached below:

import time
def pset(lst):
    pw_set = [[]]
    start_time = time.time()
    for i in range(0,len(lst)):
        for j in range(0,len(pw_set)):
            ele = pw_set[j].copy()
            if lst[i] == [0,1]:
                ele = ele + [lst[i]]
                pw_set = pw_set + [ele]
            else:
                if [lst[i][0] - 1,lst[i][1]] in ele or [lst[i][0] - 1,lst[i][1] - 1] in ele:
                    ele = ele + [lst[i]]
                    pw_set = pw_set + [ele]
    print("--- %s seconds ---" % (time.time() - start_time))
    return pw_set[1:]

Here, I just checked if the 'node' being added had at least one of the nodes preceding it in the set: if not, it was skipped. I checked up to pg(3) and the output is as desired, so I'm thinking it's working, just inefficient. Thus, I've (seemingly) solved the memory overflow problem, now I just need to make this efficient.

jodawg
  • 1
  • 1

0 Answers0