
I've been playing around with analogy queries over some publicly available word embeddings, in particular GloVe and ConceptNet Numberbatch.

I'm doing some basic queries of the form below, where queryTarget is what I'm looking for (a sketch of the three matching strategies follows the list):

baseSource:baseTarget :: querySource:queryTarget, e.g. man:woman :: king:queen

  • maximize cosine_similarity(baseTarget - baseSource, queryTarget - querySource)
  • maximize cosine_similarity(baseTarget - baseSource, queryTarget - querySource) * cosine_similarity(baseTarget - queryTarget, baseSource - querySource)
  • minimize L2norm((baseTarget - baseSource + querySource) - queryTarget)
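
For reference, here is a minimal sketch of the three strategies using numpy and gensim. The file name is a placeholder (GloVe vectors would first need converting to word2vec text format), and the rank_candidates helper and the strategy names are my own, so treat this as an illustration rather than a tuned implementation:

    import numpy as np
    from gensim.models import KeyedVectors

    # Placeholder path -- point this at your Numberbatch or converted GloVe file.
    vectors = KeyedVectors.load_word2vec_format("embeddings.txt", binary=False)

    def cos(a, b):
        """Cosine similarity of two vectors."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def rank_candidates(base_source, base_target, query_source,
                        strategy="offset", topn=20):
        bs, bt, qs = (vectors[w] for w in (base_source, base_target, query_source))
        scores = {}
        for word in vectors.index_to_key:
            if word in (base_source, base_target, query_source):
                continue
            qt = vectors[word]
            if strategy == "offset":
                # maximize cos(baseTarget - baseSource, queryTarget - querySource)
                scores[word] = cos(bt - bs, qt - qs)
            elif strategy == "product":
                # multiply the two pairwise offset similarities
                scores[word] = cos(bt - bs, qt - qs) * cos(bt - qt, bs - qs)
            else:  # "l2"
                # minimize ||(baseTarget - baseSource + querySource) - queryTarget||;
                # negated so that higher scores are better for all strategies
                scores[word] = -np.linalg.norm((bt - bs + qs) - qt)
        return sorted(scores, key=scores.get, reverse=True)[:topn]

    # man:woman :: king:? -- with GloVe I'd expect queen near the top
    print(rank_candidates("man", "woman", "king", strategy="offset"))

Looping over the whole vocabulary in pure Python is slow; gensim's built-in most_similar(positive=[...], negative=[...]) is a vectorized alternative that, as far as I understand, roughly corresponds to the third (vector-offset) strategy on normalized vectors.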

For the query: man:woman :: king:?

The GloVe data gives me the expected queen, lady, princess results across the various matching strategies. However, ConceptNet gives female_person, adult_female, king_david's_harp as the top 3, which I would not expect (queen is not even in the top 20). More generally, poor results regularly displace the expected ones that I do see with GloVe.

Does the ConceptNet embedding require some sort of additional tweaking before I can use it, or is it just not suited to English analogies?

rhi
    Good question. I'm finding the same thing: ConceptNet Numberbatch has strange things going on. Could it be the weightings they've introduced (described on their GitHub)? – aldorath Jan 20 '21 at 05:23

0 Answers