How do I get similar words for a given word from a pre-trained ELMo embedding? For example, in GloVe we have glove_model.most_similar() to find the most similar words and their embeddings for any given word. Do we have anything similar in ELMo?
1 Answer
Unlike GloVe, which has a separate entry for each word of a limited vocabulary, ELMo computes word embeddings dynamically using a character-level CNN, so in theory ELMo should be able to handle an unlimited vocabulary. In practice, it only works well for words it encountered during training and words similar to them, but it is still able to produce a vector for an arbitrary string.
So, it does not make much sense for ELMo to have such a method as GloVe does. You can, however, precompute vectors for a vocabulary you are interested in and implement the nearest-neighbor search yourself, e.g., using scipy.spatial.cKDTree.
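A minimal sketch of that approach might look like this. It assumes allennlp < 1.0, which ships allennlp.commands.elmo.ElmoEmbedder and downloads the default pre-trained model on first use; the vocabulary here is made up for illustration.

```python
# Precompute ELMo vectors for a fixed vocabulary, then search with cKDTree.
import numpy as np
from scipy.spatial import cKDTree
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # loads the default pre-trained ELMo model

vocabulary = ["display", "screen", "monitor", "keyboard", "banana"]

def word_vector(word):
    # embed_sentence returns a (3 layers, n_tokens, 1024) array; for a single
    # word n_tokens == 1, so average the layers and take the only token.
    vec = elmo.embed_sentence([word]).mean(axis=0)[0]
    # Unit-normalize so Euclidean nearest neighbors match the cosine ranking.
    return vec / np.linalg.norm(vec)

tree = cKDTree(np.stack([word_vector(w) for w in vocabulary]))

# The three vocabulary words closest to "display".
distances, indices = tree.query(word_vector("display"), k=3)
for d, i in zip(distances, indices):
    print(vocabulary[i], d)
```

Note that cKDTree uses Euclidean distance, which is why the vectors are normalized above: on unit vectors, the Euclidean ranking coincides with the cosine-similarity ranking.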

Jindřich
- KDTree will help us find the nearest neighbor only when the new point and the existing points have the same dimension. The ELMo encodings for the words "Display" and "Screen" have shapes (7, 1024) and (6, 1024) respectively. The shape of an ELMo encoding depends on the number of characters in the word, since ELMo runs a character-level CNN, so comparing one ELMo-embedded word with another is not possible when their dimensions differ. KDTree will therefore not help us find the nearest neighbor. – Anvitha Apr 19 '19 at 09:26
- The output of the CNN is max-pooled into a single vector (https://github.com/allenai/allennlp/blob/master/allennlp/modules/elmo.py#L376), which is followed by some highway layers (https://github.com/allenai/allennlp/blob/master/allennlp/modules/elmo.py#L384). So, in the end, there is one vector per token that is fed into the LSTM. You might have a tokenization problem if you are getting shapes like that. – Jindřich Apr 19 '19 at 12:22
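One plausible way to end up with a (7, 1024) shape for "Display" is to feed the word's seven characters in as separate tokens. A quick check, again assuming allennlp < 1.0:

```python
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()

# Passing the word as ONE token yields one 1024-d vector per layer...
print(elmo.embed_sentence(["Display"]).shape)      # (3, 1, 1024)

# ...whereas passing its characters as seven tokens yields seven vectors,
# which after averaging the layers would explain a (7, 1024) shape.
print(elmo.embed_sentence(list("Display")).shape)  # (3, 7, 1024)
```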
- I guess you need to add a pooling layer to reduce it to a single vector of fixed dimension? – rjurney Aug 29 '20 at 07:32
- If you want a single vector for a multi-word input, then yes. This question, however, was about isolated words, for which ELMo returns a single vector (one that already includes character-level pooling). – Jindřich Aug 29 '20 at 09:16
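For completeness, a hedged sketch of such pooling for a multi-word input, here using mean-pooling over tokens (same allennlp < 1.0 assumption as above):

```python
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()
tokens = ["the", "display", "is", "broken"]

# (3, 4, 1024) -> average over layers, then over tokens -> (1024,)
sentence_vector = elmo.embed_sentence(tokens).mean(axis=0).mean(axis=0)
print(sentence_vector.shape)  # (1024,)
```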
- @Anvitha, this is brutally inefficient but might be functional depending on your needs: I wonder if you could use gensim and GloVe to find the topn similar words for the word you want to replace (which you are familiar with, but as you know, the output is not contextual) and then query the cosine distances for each of those words (as replacements in your sentence) in ELMo to find which of them work contextually. Does anyone see issues with this? – aldorath Apr 12 '21 at 20:36
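A rough sketch of that two-stage idea (GloVe proposes candidates, ELMo ranks them in context), assuming gensim's downloader API for a pre-trained GloVe model and allennlp < 1.0 for ELMo; the sentence, model name, and function names are illustrative:

```python
import numpy as np
import gensim.downloader as api
from allennlp.commands.elmo import ElmoEmbedder

glove = api.load("glove-wiki-gigaword-100")  # pre-trained GloVe vectors
elmo = ElmoEmbedder()

def contextual_vector(tokens, position):
    # Average ELMo's three layers, then take the vector at `position`.
    return elmo.embed_sentence(tokens).mean(axis=0)[position]

def rank_replacements(tokens, position, topn=10):
    original = contextual_vector(tokens, position)
    scored = []
    # GloVe proposes candidates (the target word must be in GloVe's
    # vocabulary, lowercase for this model); ELMo scores them in context.
    for candidate, _ in glove.most_similar(tokens[position], topn=topn):
        replaced = tokens[:position] + [candidate] + tokens[position + 1:]
        vec = contextual_vector(replaced, position)
        cosine = np.dot(original, vec) / (
            np.linalg.norm(original) * np.linalg.norm(vec))
        scored.append((candidate, cosine))
    return sorted(scored, key=lambda pair: -pair[1])

print(rank_replacements(["the", "display", "is", "broken"], 1))
```

As aldorath says, this is inefficient (one full ELMo forward pass per candidate), but it sidesteps the vocabulary problem by letting GloVe do the candidate generation.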