
I was going through the predict method for the entity linker pipe in spaCy, and for some reason the score is defined as follows:

scores = prior_probs + sims - (prior_probs*sims)

Link here

Does anybody have experience with this / know where this formula comes from?

Thanks!

user3741951

1 Answer


It is taken from "Entity Linking via Joint Encoding of Types, Descriptions, and Context", Section 4, Equation 2.

I don't feel confident enough to explain the formula in detail, but overall its purpose is to combine two scores for each entity candidate: probability scores derived from external knowledge base resources (the KB in the paper), which are the prior probabilities, and scores estimated with a sentence encoder that encodes the mention to be linked along with its context. The latter are the sims in the formula, because they are computed as the cosine similarity between the encoded mention vector and all entity candidates (which is why this formula is used only if "incl_context" is true).
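
To make that concrete, here is a minimal NumPy sketch of how the two scores get combined. The candidate values and variable names are made up for illustration; this is not spaCy's actual implementation:

    import numpy as np

    # Hypothetical values: prior probabilities from the KB for three entity
    # candidates of a single mention (illustrative numbers, not real data).
    prior_probs = np.array([0.7, 0.2, 0.1])

    # Hypothetical sentence encoding and one entity encoding per candidate.
    sentence_encoding = np.array([0.3, -0.1, 0.8])
    entity_encodings = np.array([
        [0.2, 0.0, 0.9],
        [-0.5, 0.4, 0.1],
        [0.1, 0.9, -0.2],
    ])

    # Cosine similarity: dot product divided by the product of the norms,
    # mirroring xp.dot(entity_encodings, sentence_embedding_t) / (sentence_norm * entity_norm).
    sentence_norm = np.linalg.norm(sentence_encoding)
    entity_norms = np.linalg.norm(entity_encodings, axis=1)
    sims = entity_encodings @ sentence_encoding / (sentence_norm * entity_norms)

    # The combination from the question, i.e. P(A) + P(B) - P(A)*P(B).
    scores = prior_probs + sims - (prior_probs * sims)
    print(scores)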

Edoardo Guerriero
  • Hey! Thanks for linking the paper! – user3741951 Mar 30 '20 at 01:05
  • Also, I know you mentioned above that you are not entirely confident about the reason, but just to confirm: do you know why / how P_text(e|m) from the paper is represented by the similarity in spaCy's scoring (i.e. xp.dot(entity_encodings, sentence_embedding_t) / (sentence_norm * entity_norm))? – user3741951 Mar 30 '20 at 01:25
  • The context probability in spaCy is measured as the cosine similarity between the sentence encoding and the entity encoding. The entity encoding is the encoding of the Wikidata description of a particular entity in the KB. So basically it looks at how similar a sentence surrounding an entity is to the description of that entity. – Sofie VL Mar 30 '20 at 07:13
  • With respect to the original question: it is indeed taken from that paper Edoardo linked. It's basically just an instance of the General Addition Rule: P(A∪B) = P(A) + P(B) − P(A∩B). – Sofie VL Mar 30 '20 at 07:24
  • Thanks @SofieVL for the extra info! Just another quick question, which I think might still be a source of doubt for user3741951 as well: how can they assume cosine similarity to be a probability? I am aware that in some cases, e.g. with vectors derived through tf-idf, the similarity score will be constrained between 0 and 1 (because the vectors can't be farther apart than 90°), but here they're calculating the similarity between hidden representations of an encoder, so how can they be sure the final score will be in the range 0-1? – Edoardo Guerriero Mar 30 '20 at 11:29
  • @edoardo-guerriero: the formula divides by the norms of the vectors, right? – Sofie VL Mar 30 '20 at 12:14
  • @SofieVL yes, but it doesn't get rid of potential negative values; am I missing something really trivial here? – Edoardo Guerriero Mar 30 '20 at 12:24
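
A quick check with toy vectors (made up purely for illustration) shows the point under discussion: dividing by the norms bounds the cosine similarity to the range [-1, 1], but it does not rule out negative values.

    import numpy as np

    # Two nearly opposite toy vectors (illustrative, not real encodings).
    a = np.array([1.0, 0.0])
    b = np.array([-1.0, 0.1])

    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(cos)  # ~ -0.995: bounded to [-1, 1] by the norms, but still negative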