I have the following two sentences:

  1. I want to go home.
  2. I would like to leave.

My goal is to quantify the similarity between the two sentences using a kernel suggested in this paper. I extract all the dependency triplets for each sentence. These are 3-item tuples, one per relation between words in the sentence, of the form (tail, relationship, head).

To calculate similarity, I need to loop through every possible pair of triplets across the two sentences and add a particular number to the similarity score, based on how many nodes match and whether the relationship matches.

I attempted to use list comprehensions inside a for loop, since I figured that would be more efficient than another nested for loop, but I am getting a syntax error. Here's my code:

sim = 0
theta = 2.5

for d1 in deps1:
    [sim += theta for d2 in deps2 if ((d1[0]==d2[0] or d1[2]==d2[2]) and d1[1]==d2[1])]
    [sim += 1 for d2 in deps2 if ((d1[0]==d2[0] or d1[2]==d2[2]) and d1[1]!=d2[1])]

For reference, here's what deps1 and deps2 look like when printed:

[('I', 'nsubj', 'want'), ('want', 'ROOT', 'want'), ('to', 'aux', 'go'), ('go', 'xcomp', 'want'), ('home', 'advmod', 'go')]
[('I', 'nsubj', 'like'), ('would', 'aux', 'like'), ('like', 'ROOT', 'like'), ('to', 'aux', 'leave'), ('leave', 'xcomp', 'like')]

Questions:

  1. What's the correct syntax to do this with a list comprehension?
  2. Is there a more efficient way, maybe using numpy(?), to do this computation?
Mazdak

2 Answers

What you seem to want is a cumulative result, but you can't get it that way: sim += theta is a statement, not an expression, so it doesn't produce a value that could become an item of the resulting list. What you can do instead is multiply theta by a counter. Alternatively, create a list of thetas and then build a cumulative version with np.cumsum() or itertools.accumulate(), though that is not recommended unless you want to keep both the original result and the cumulative one.

Also, instead of using two loops you can use itertools.product to create all the pairs of triplets, and itertools.count as the counter.

In [36]: from itertools import product, count

In [37]: c = count(1)

In [38]: [2.5*next(c) for d1, d2 in product(deps1,deps2) if ((d1[0]==d2[0] or d1[2]==d2[2]) and d1[1]==d2[1])]
Out[38]: [2.5, 5.0]

And to do both conditions in one list comprehension you can do the following:

[(d1[1]!=d2[1] or 2.5)*next(c) for d1, d2 in product(deps1,deps2) if d1[0]==d2[0] or d1[2]==d2[2]]
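
If only the final score is needed rather than a running list, the same filter can be passed to sum. The following is a minimal sketch, not part of the original answer: pair_score is a hypothetical helper name, and the scoring rule (theta when the relationship matches, 1 otherwise, 0 when no node matches) is taken from the two loops in the question.

```python
from itertools import product

# Triplets copied from the question: (tail, relationship, head)
deps1 = [('I', 'nsubj', 'want'), ('want', 'ROOT', 'want'), ('to', 'aux', 'go'),
         ('go', 'xcomp', 'want'), ('home', 'advmod', 'go')]
deps2 = [('I', 'nsubj', 'like'), ('would', 'aux', 'like'), ('like', 'ROOT', 'like'),
         ('to', 'aux', 'leave'), ('leave', 'xcomp', 'like')]
theta = 2.5


def pair_score(d1, d2, theta=theta):
    """Hypothetical helper: score one pair of triplets as the question's loops intend."""
    if d1[0] == d2[0] or d1[2] == d2[2]:       # at least one node matches
        return theta if d1[1] == d2[1] else 1  # a matching relationship weighs more
    return 0                                   # no shared node, no contribution


# A generator expression feeds every pair from the Cartesian product into sum
sim = sum(pair_score(d1, d2) for d1, d2 in product(deps1, deps2))
print(sim)  # 5.0
```

For the two example sentences, only the pairs sharing the tails 'I' and 'to' contribute, and both have matching relationships, so the total is 2 * 2.5.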
Mazdak
  • Works for matching relationship between tuples! What's the best way to extend it to compute ((# of matching nodes) * (1 if the relationship doesn't match, 2.5 if it does))? The nodes are the first and third elements in the tuples, the relationship between them is the second. – Prratek Ramchandani Jun 12 '18 at 10:52

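In Python, a list comprehension can contain expressions but not statements. An augmented assignment such as sim += theta is a statement, which is why your code raises a syntax error. You may want to look at the difference between expressions and statements in Python.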
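
As a small illustration of that distinction (a sketch with made-up values, not code from the question), the snippet below asks Python to compile a comprehension containing an augmented assignment, which fails, and then computes the same total with an expression:

```python
theta = 2.5
values = [1, 2, 3]  # illustrative data, not from the question

# "sim += theta" is a statement, so it cannot appear where a comprehension
# expects an expression; compiling such source raises SyntaxError.
try:
    compile("[sim += theta for v in values]", "<example>", "eval")
    is_valid = True
except SyntaxError:
    is_valid = False

# Expression-based equivalent: add theta once per item
sim = sum(theta for v in values)

print(is_valid)  # False
print(sim)       # 7.5
```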

As for your question in the comment about how to compute ((# of matching nodes) * (1 if the relationship doesn't match, 2.5 if it does)), which is the numerator of the SABK similarity function from the paper in your question, you can do it with a generator expression and the sum function:

from itertools import product

theta = 2.5
sim = sum(((d1[0] == d2[0]) + (d1[2] == d2[2])) * (theta if d1[1] == d2[1] else 1) for d1, d2 in product(deps1, deps2))

Or, to improve readability, you can move the similarity of a single pair of triplets into its own function:

def sim_per_sentence(d1, d2):
    # Count matching nodes: tail (index 0) and head (index 2)
    matching_nodes = (d1[0] == d2[0]) + (d1[2] == d2[2])
    # Weight by theta when the relationship (index 1) also matches
    relation_sim = theta if d1[1] == d2[1] else 1
    return matching_nodes * relation_sim

sim = sum(sim_per_sentence(d1, d2) for d1, d2 in product(deps1, deps2))

Note that a generator expression can be considerably more memory-efficient than a list comprehension when deps1 and deps2 contain many elements, because the individual results of each iteration do not need to be stored in memory.
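
A rough way to see the difference is to compare container sizes. This sketch uses illustrative data and sys.getsizeof, which reports only the size of the container object itself (in CPython), not of the items it refers to, so treat the numbers as indicative only:

```python
import sys

n = 100_000  # illustrative size, not from the question

as_list = [i * 2 for i in range(n)]  # all results stored at once
as_gen = (i * 2 for i in range(n))   # results produced one at a time

list_size = sys.getsizeof(as_list)   # grows with n
gen_size = sys.getsizeof(as_gen)     # small and independent of n

# Both forms give the same total when passed to sum
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)

print(gen_size < list_size)               # True
print(total_from_list == total_from_gen)  # True
```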

Jundiaius