-1

I'm building a similarity matrix of a list of items. The naive approach is to iterate the list twice, but this needlessly will compare A:B and B:A when they're the same.

for A in items:
   for B in items:
      if A==B: continue
      sim[A][B] = calc_sim(A, B)

is there a simple way to only calculate half of the values? I could put a skip in there like

if sim[B][A]: continue # already calculated in other direction

But still the iteration is happening. Effectively I just want to iterate through the top or bottom half of the grid:

sim matrix

There are some similar Qs, but nothing with a canonical answer. This seems like a basic CS algo question!

dcsan
  • 11,333
  • 15
  • 77
  • 118

3 Answers3

3

You could use itertools.combinations.

import itertools

for a, b in itertools.combinations(items, 2):
    sim[a][b] = sim[b][a] = calc_sim(a, b)
abc
  • 11,579
  • 2
  • 26
  • 51
  • oh thanks for pointing this out. it's a bit python specific, but itertools is a great toolbox. – dcsan Jan 02 '21 at 22:57
0

Assuming calc_sim(A, B) == calc_sim(B, A), you could try this:

for A in range(0, len(items)):
   for B in range(A, len(items)): # Replace with A+1 if you don't want the case A == B
      # Remember A and B are indexes, so change code accordingly
      result = calc_sim(items[A], items[B])
      sim[A][B] = result # Copy result to both A,B and B,A as they are equal
      sim[B][A] = result

However actually both algorithms are O(n) n2

mandulaj
  • 733
  • 3
  • 10
0

If you need just a general algorithm to reduce number iterations, you can limit the range of the inner loop

for i, A in enumerate(items):
   for B in items[:i]:
      sim[A][B] = calc_sim(A, B)

But if you are looking for Python-specific optimization, it would be much better to use numpy vectorization. For example, if calc_sim(a, b) computes squared difference between a and b, then it can be vectorized the following way:

import numpy as np

list = [1, 2, 3]
array = np.array(list)
sim = np.square(array[:,np.newaxis] - array)
[[0 1 4]
 [1 0 1]
 [4 1 0]]
fdermishin
  • 3,519
  • 3
  • 24
  • 45
  • it's actually an array of word vectors I'm using to do the comparison. Can `np` be made to take any comparison function and apply it to a grid in that way, or just np built-ins like `np.square` ? – dcsan Jan 02 '21 at 22:59
  • @dcsan It can be done if the vectors have the same length. I think that this question may help: https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array But if the vectors have different lengths, than numpy is unlikely to be able to handle them, as long as they are not preprocessed in some way – fdermishin Jan 02 '21 at 23:06