I have the following two sentences:
- I want to go home.
- I would like to leave.
My goal is to quantify the similarity between the two sentences using a kernel suggested in this paper. I extract all the dependency triplets for each sentence. These are 3-item tuples of the form (tail, relationship, head) that together describe all the relations between words in the sentence.
To calculate similarity, I need to loop through every possible pair of triplets across the two sentences and add a particular number to the similarity score, based on how many nodes match and whether the relationship matches.
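To make the scoring concrete, here is roughly what I mean, written as a plain nested loop. This is only a sketch: the function name triplet_similarity is made up for illustration, and the weights (theta when a node and the relationship both match, 1 when a node matches but the relationship differs) are the ones from my own attempt, not necessarily the exact kernel from the paper.

```python
def triplet_similarity(deps1, deps2, theta=2.5):
    """Score two lists of (tail, relationship, head) triplets.

    Hypothetical helper: the weights mirror my attempt, not a
    verified implementation of the kernel in the paper.
    """
    sim = 0.0
    for tail1, rel1, head1 in deps1:
        for tail2, rel2, head2 in deps2:
            # A pair only counts if at least one node (tail or head) matches.
            if tail1 != tail2 and head1 != head2:
                continue
            # Same relationship scores theta, a different one scores 1.
            sim += theta if rel1 == rel2 else 1
    return sim
```

Called as triplet_similarity(deps1, deps2) on the two triplet lists printed further down, this should give 5.0, since only the ('I', 'nsubj', ...) and ('to', 'aux', ...) pairs match.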
I attempted to use list comprehensions inside a for loop, since I figured that would be more efficient than another nested for loop, but I am getting a syntax error. Here's my code:
sim = 0
theta = 2.5
for d1 in deps1:
    [sim += theta for d2 in deps2 if ((d1[0]==d2[0] or d1[2]==d2[2]) and d1[1]==d2[1])]
    [sim += 1 for d2 in deps2 if ((d1[0]==d2[0] or d1[2]==d2[2]) and d1[1]!=d2[1])]
For reference, here's what deps1 and deps2 look like when printed:
[('I', 'nsubj', 'want'), ('want', 'ROOT', 'want'), ('to', 'aux', 'go'), ('go', 'xcomp', 'want'), ('home', 'advmod', 'go')]
[('I', 'nsubj', 'like'), ('would', 'aux', 'like'), ('like', 'ROOT', 'like'), ('to', 'aux', 'leave'), ('leave', 'xcomp', 'like')]
Questions:
- What's the correct syntax to do this with a list comprehension?
- Is there a more efficient way, maybe using numpy(?), to do this computation?