I am appending similarity scores of all pairs in a list.

data = []

for i1, i2 in list: 
    data.append([i1, i2, cosine_similarity([X[df.index.get_loc(i1)]],[X[df.index.get_loc(i2)]]).ravel()[0]])

However, I need it to only append scores that are non-zero.

I added an if statement, but it raises an error because the value being compared is not of int type.

for i1, i2 in list:
    if [cosine_similarity([X[df.index.get_loc(i1)]], [X[df.index.get_loc(i2)]])] > 0:
        data.append([i1, i2, cosine_similarity([X[df.index.get_loc(i1)]], [X[df.index.get_loc(i2)]]).ravel()[0]])

Is there a way to append only the non-zero scores as part of the iteration?

jpp
user6453877
  • What does "produces an error" mean? Do you get an exception? If so, show us the whole exception. – abarnert Apr 01 '18 at 20:48
  • If it's [this error](https://stackoverflow.com/questions/34472814/use-a-any-or-a-all), you'll also need to explain more of what you're trying to do, because the answer _might_ be exactly what's in the error message, but it might be something else like using a mask, and there's no way we can know which one you want without sample input and desired output and why you want that output. – abarnert Apr 01 '18 at 20:50
  • I don't see anything called "score". You have something called `df` which is a ... what? Are i1 and i2 indices? Are you wanting to skip ones that are zero? How about a running example? And how about trimming it down to just what's useful for the question. Does `cosine_similarity` make any difference to the problem? – tdelaney Apr 01 '18 at 20:57
  • `[cosine_similarity([X[df.index.get_loc(i1)]], [X[df.index.get_loc(i2)]])] > 0` should likely not have the result of the call wrapped in a list `cosine_similarity([X[df.index.get_loc(i1)]], [X[df.index.get_loc(i2)]]) > 0`. – Dan D. Apr 01 '18 at 21:02
  • @DanD. That was it! I copied it from the for-loop to the if-statement. Thanks! – user6453877 Apr 01 '18 at 21:28

1 Answer

The general pattern for conditional iteration is `(a for a in b if a)`. Pulling your calculation into a helper function for readability, this should work:

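from sklearn.metrics.pairwise import cosine_similarity

def calc_sim(X, df, i1, i2):
    # cosine_similarity returns a 1x1 array here; ravel()[0] extracts the scalar
    return cosine_similarity([X[df.index.get_loc(i1)]],
        [X[df.index.get_loc(i2)]]).ravel()[0]

# The inner generator computes each score once; the outer filter keeps scores > 0
data = [(i1, i2, sim)
    for (i1, i2, sim) in ((i1, i2, calc_sim(X, df, i1, i2))
                          for i1, i2 in list)
    if sim > 0]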
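
If the nested comprehension is hard to read, the same filter can be written as a plain loop that computes each score once and appends only when it is positive. The sketch below is an illustration under assumptions, not the original setup: `cosine` is a hypothetical plain-Python stand-in for sklearn's `cosine_similarity`, and `vectors` and `pairs` are made-up toy data replacing `X`, `df`, and `list`.

```python
import math

def cosine(u, v):
    """Hypothetical stand-in for cosine_similarity on two 1-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: similarity with a zero vector counts as 0
    return dot / (norm_u * norm_v)

# Made-up toy data: label -> feature vector
vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
pairs = [("a", "b"), ("a", "c"), ("b", "c")]

data = []
for i1, i2 in pairs:
    sim = cosine(vectors[i1], vectors[i2])  # computed once per pair
    if sim > 0:  # compare the scalar, not a list wrapping it
        data.append([i1, i2, sim])
```

Storing the score in `sim` avoids calling the similarity function twice per pair, which the original if-statement version did. Here `("a", "b")` is skipped because the two vectors are orthogonal.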
tdelaney