0

I need to find all the different intersections between two partitions of the same set. For example, if we have the following two partitions of the same set

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

the required result is

[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].

In detail, we calculate the cartesian product between every subset of x and y, and for each of these products, we classify the elements in new subsets accordingly if they belong to the intersection of their associated subsets or not.

What is the optimal / more pythonic way to do it? Thanks in advance!


PERFORMANCE COMPARISON OF THE CURRENT ANSWERS:

import numpy as np

def partitioning(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))

a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))

def partition_decomposition_product_1(x, y):
    out = []
    for sublist1 in x:
        d = {}
        for val in sublist1:
            for i, sublist2 in enumerate(y):
                if val in sublist2:
                    d.setdefault(i, []).append(val)
        out.extend(d.values())
    return out

def partition_decomposition_product_2(x, y):
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

def partition_decomposition_product_3(x, y):
    return [np.intersect1d(i,j) for i in x for j in y]

And measuring execution time with %timeit

%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)

we find

2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

thus the second solution is the fastest one.

Jorge
  • 83
  • 10
  • I wonder if working with arrays helps at all! You have lists of different length, and expect results that also differ. Those `numpy` `set` functions depend heavily on sorting, where as python's own `set` class uses hashing (as done with `dict` keys). Speed/efficiency in `numpy` is the result of doing whole-array calculations with compiled methods. That doesn't seem to be an option here. – hpaulj May 06 '20 at 23:42
  • Thanks for the comment, @hpaulj. It happens that as is hinted in the decomposition function, the rows come directly from numpy arrays, that's why I used those numpy sets implementation. – Jorge May 06 '20 at 23:58

3 Answers3

2

I'm not sure if I understand you correctly, but this script produces the result you have in your question:

x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]

out = []
for sublist1 in x:
    d = {}
    for val in sublist1:
        for i, sublist2 in enumerate(y):
            if val in sublist2:
                d.setdefault(i, []).append(val)
    out.extend(d.values())

print(out)

Prints:

[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]]
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
  • 1
    Hi Andrej. It reproduces the result, indeed :) It is WAY simpler than mine hahaha. I just need to check your suggestion in my general case, and I will come back to accept the answer if it works :) Thanks a lot! – Jorge May 06 '20 at 23:46
2

I may miss some details, but it seems a bit too easy:

[np.intersect1d(a,b) for a in x for b in y]

Output:

[array([1]),
 array([2]),
 array([3]),
 array([4, 5]),
 array([6, 7]),
 array([ 8,  9, 10])]

The above includes duplicates, for example x=[[1,2,3],[1,4,5]] and y=[[1,6,7]] would gives [[1],[1]].


If you want to find the unique intersections:

[list(i) for i in {tuple(np.intersect1d(a,b)) for a in x for b in y}]

Output:

[[8, 9, 10], [6, 7], [1], [4, 5], [2], [3]]
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • OMG this is embarrassing to me hahaha, only one line! I think @tiberius' solution is a bit faster though. Thanks, Quang. – Jorge May 07 '20 at 14:35
1

The fact that the two lists are partitions of the same set is not relevant to the algorithm choice. This boils down to iterating through two lists of lists and getting the intersection between each combination (you can add that assertion at the beginning of the function to ensure they are partitions of the same set, using this answer to flatten the lists efficiently). With this in mind, this function accomplishes the task, using this answer to calculate list intersection:

def func2(x, y):
    # check that they partition the same set 
    checkx = sorted([item for sublist in x for item in sublist])
    checky = sorted([item for sublist in y for item in sublist])
    assert checkx == checky

    # get all intersections
    all_s = []
    for sx in x:
        for sy in y:
            ss = list(filter(lambda x:x in sx, sy))
            if ss:
                all_s.append(ss)
    return all_s

Then using this time comparison method, we can see that this new function is ~100x faster than your original implementation.

Jorge
  • 83
  • 10
tiberius
  • 430
  • 2
  • 14
  • Wow! that one line defining ss it's quite impressive, it makes all the work :) I will check your answer carefully and time compare it with the other ones. Thanks a lot, @tiberius – Jorge May 07 '20 at 00:20
  • I suggested an edition to your answer where I included the existence condition for the ss list to be appended to the final list all_s – Jorge May 07 '20 at 00:50
  • 1
    I also edited checky definition including y, and not x – Jorge May 07 '20 at 00:59