In the following problem I have a nested list wp. The N lists in wp contain integers. I want to compute, as fast as possible, the average number of pairwise different elements between the lists (that is, the size of the symmetric difference of each pair of lists, averaged over all pairs), e.g.
N = 3
wp[0] = [0,1,2]
wp[1] = [0,1]
wp[2] = [3]
--> Different elements (wp[0],wp[1])=1
--> Different elements (wp[0],wp[2])=4
--> Different elements (wp[1],wp[2])=3
---> Avg. pairwise different elements = (1+4+3)/3 = 8/3 ≈ 2.667
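The numbers in this example can be reproduced with a few lines of plain Python. This is only a minimal sketch to pin down the definition (no numba); the helper name `sym_diff_size` is made up for illustration.

```python
from itertools import combinations

# Example data from the question.
wp = [[0, 1, 2], [0, 1], [3]]

def sym_diff_size(a, b):
    """Number of values that occur in exactly one of the two lists."""
    return len(set(a) ^ set(b))

# combinations(wp, 2) yields the pairs (0,1), (0,2), (1,2) in that order.
pair_counts = [sym_diff_size(a, b) for a, b in combinations(wp, 2)]
avg = sum(pair_counts) / len(pair_counts)
print(pair_counts, avg)
```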
Currently my code is the following:
avg_distance = 0.0
for x1 in range(N):
    for x2 in range(x1 + 1, N):
        distance_temp = 1.0 * len(list(set(wp[x1]) ^ set(wp[x2])))
        avg_distance += distance_temp
avg_distance = 1.0 * avg_distance / (1.0 * N * (N - 1.0) / 2.0)
This is by far the most significant bottleneck in my code. Can it be done faster? Note that this part runs inside a function that has a numba decorator. Thanks!
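For reference, one possible reformulation avoids the double loop over pairs. A value that occurs in c of the N lists lands in the symmetric difference of exactly c * (N - c) pairs, so the sum over all pairs can be built from per-value counts. The sketch below is plain Python and assumes duplicates within a single list should count once, as they do with set(); the function name `avg_pairwise_sym_diff` is hypothetical. It has not been checked under numba, and `collections.Counter` would probably have to be replaced (for example by an integer array of counts) to run in a jitted function.

```python
from collections import Counter

def avg_pairwise_sym_diff(wp):
    """Average symmetric-difference size over all pairs of lists in wp."""
    n = len(wp)
    if n < 2:
        raise ValueError("need at least two lists")
    # Count in how many lists each value occurs; set() collapses
    # duplicates inside a single list, matching set(a) ^ set(b).
    counts = Counter()
    for lst in wp:
        counts.update(set(lst))
    # A value present in c of the n lists differs in c * (n - c) pairs.
    total = sum(c * (n - c) for c in counts.values())
    return total / (n * (n - 1) / 2)

print(avg_pairwise_sym_diff([[0, 1, 2], [0, 1], [3]]))
```

In the example, the values 0 and 1 each occur in two lists and 2 and 3 in one list each, giving 2 + 2 + 2 + 2 = 8 over 3 pairs, the same result as the pairwise loop.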