I need to find all the different intersections between two partitions of the same set. For example, if we have the following two partitions of the same set
x = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]
y = [[1, 3, 6, 7], [2, 4, 5, 8, 9, 10]]
the required result is
[[1], [2], [3], [4, 5], [6, 7], [8, 9, 10]].
In detail, we calculate the cartesian product between every subset of x and y, and for each of these products, we classify the elements in new subsets accordingly if they belong to the intersection of their associated subsets or not.
What is the optimal / more pythonic way to do it? Thanks in advance!
PERFORMANCE COMPARISON OF THE CURRENT ANSWERS:
import numpy as np
def partitioning(alist, indices):
return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]
total = 1000
sample1 = np.sort(np.random.choice(total, int(total/10), replace=False))
sample2 = np.sort(np.random.choice(total, int(total/2), replace=False))
a = partitioning(np.arange(total), list(sample1))
b = partitioning(np.arange(total), list(sample2))
def partition_decomposition_product_1(x, y):
out = []
for sublist1 in x:
d = {}
for val in sublist1:
for i, sublist2 in enumerate(y):
if val in sublist2:
d.setdefault(i, []).append(val)
out.extend(d.values())
return out
def partition_decomposition_product_2(x, y):
all_s = []
for sx in x:
for sy in y:
ss = list(filter(lambda x:x in sx, sy))
if ss:
all_s.append(ss)
return all_s
def partition_decomposition_product_3(x, y):
return [np.intersect1d(i,j) for i in x for j in y]
And measuring execution time with %timeit
%timeit partition_decomposition_product_1(a, b)
%timeit partition_decomposition_product_2(a, b)
%timeit partition_decomposition_product_3(a, b)
we find
2.16 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
620 ms ± 84.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 111 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
thus the second solution is the fastest one.