
I've been playing around with analogy queries over some publicly available word embeddings, in particular GloVe and ConceptNet Numberbatch.

I'm doing some basic queries of the form below, where queryTarget is what I'm looking for (a sketch of the three matching strategies follows the list):

baseSource:baseTarget :: querySource:queryTarget, e.g. man:woman :: king:queen

  • maximize cosine_similarity(baseTarget - baseSource, queryTarget - querySource)
  • maximize cosine_similarity(baseTarget - baseSource, queryTarget - querySource) * cosine_similarity(baseTarget - queryTarget, baseSource - querySource)
  • minimize L2norm((baseTarget - baseSource + querySource) - queryTarget)
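
For reference, here is a minimal sketch of the three strategies using numpy and gensim. The file name is a placeholder (GloVe vectors would first need converting to word2vec text format), and the rank_candidates helper and the strategy names are my own, so treat this as an illustration rather than a tuned implementation:

    import numpy as np
    from gensim.models import KeyedVectors

    # Placeholder path -- point this at your Numberbatch or converted GloVe file.
    vectors = KeyedVectors.load_word2vec_format("embeddings.txt", binary=False)

    def cos(a, b):
        """Cosine similarity of two vectors."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def rank_candidates(base_source, base_target, query_source,
                        strategy="offset", topn=20):
        bs, bt, qs = (vectors[w] for w in (base_source, base_target, query_source))
        scores = {}
        for word in vectors.index_to_key:
            if word in (base_source, base_target, query_source):
                continue
            qt = vectors[word]
            if strategy == "offset":
                # maximize cos(baseTarget - baseSource, queryTarget - querySource)
                scores[word] = cos(bt - bs, qt - qs)
            elif strategy == "product":
                # multiply the two pairwise offset similarities
                scores[word] = cos(bt - bs, qt - qs) * cos(bt - qt, bs - qs)
            else:  # "l2"
                # minimize ||(baseTarget - baseSource + querySource) - queryTarget||;
                # negated so that higher scores are better for all strategies
                scores[word] = -np.linalg.norm((bt - bs + qs) - qt)
        return sorted(scores, key=scores.get, reverse=True)[:topn]

    # man:woman :: king:? -- with GloVe I'd expect queen near the top
    print(rank_candidates("man", "woman", "king", strategy="offset"))

Looping over the whole vocabulary in pure Python is slow; gensim's built-in most_similar(positive=[...], negative=[...]) is a vectorized alternative that, as far as I understand, roughly corresponds to the third (vector-offset) strategy on normalized vectors.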

For the query: man:woman :: king:?

The GloVe data gives me the expected queen, lady, princess results across the various matching strategies. However, ConceptNet gives female_person, adult_female, king_david's_harp as the top 3, which I would not expect (queen is not even in the top 20). More generally, poor results regularly displace the expected ones that I do see with GloVe.

Does the ConceptNet embedding require some sort of additional tweaking before I can use it, or is it just not suited to English analogies?

rhi
    Good question. I'm finding the same thing: ConceptNet Numberbatch has strange things going on. Could it be the weightings they've introduced (described on their GitHub)? – aldorath Jan 20 '21 at 05:23

0 Answers