How to find similar words with FastText?

Question

I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a fairly new library without many built-in functions yet, I was wondering how to extract morphologically similar words.

For example: model.similar_word("dog") -> dogs. But there is no such built-in function.

If I type model["dog"], I only get the vector, which could be used to compute cosine similarity: model.cosine_similarity(model["dog"], model["dogs"]).

Do I have to write some sort of loop and compute cosine_similarity on all possible pairs in a text? That would take a long time!

When fasttext.skipgram('train.txt','model') is run, it creates a .bin & .vec file. Use these generated files and follow the process mentioned in the accepted answer. — Prometheus, Apr 11 '19 at 11:29
Nope. Have never touched Java. However FYI, the .bin and .vec files are cross compatible. — Prometheus, Nov 12 '19 at 06:22

score 16 · Accepted Answer · answered Feb 15 '17 at 18:36

Use Gensim: load the fastText-trained .vec file with load_word2vec_format and use the most_similar() method to find similar words!

answered Feb 15 '17 at 18:36

Snehal

Is there any API in fasttext that takes two words and returns their cosine similarity? Say, something like (car, vehicle) returning something like 0.8? – kzs Dec 20 '18 at 01:15
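For the two-word case raised in this comment, the cosine similarity can be computed directly from the two vectors. Below is a minimal sketch in plain Python, assuming each vector is a sequence of floats (such as what model["dog"] returns). The function name cosine_similarity and the toy vectors are illustrative only and are not part of fastText's API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
car = [0.9, 0.1, 0.3]
vehicle = [0.8, 0.2, 0.4]
print(round(cosine_similarity(car, vehicle), 3))
```

If the vectors were loaded through gensim's KeyedVectors, its similarity(w1, w2) method returns the same quantity for two in-vocabulary words.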

score 10 · Answer 2 · answered Sep 18 '19 at 14:54

You can install the pyfasttext library to extract the most similar (nearest) words to a particular word.

from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)

Or you can install the latest development version of fasttext from the GitHub repository:

import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)

score 7 · Answer 3 · answered Jul 08 '18 at 01:29

You can install gensim and use it to extract the most similar words from a model that you downloaded from FastText.

Use this:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)

The topn parameter sets how many of the most similar words are returned (here, the top 10).
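To illustrate what a top-n query does conceptually, here is a rough sketch in plain Python. It assumes the textual .vec layout (a header line with vocabulary size and dimension, then one word followed by its vector components per line) and ranks every other word by cosine similarity to the query. The names load_vec and most_similar_words and the toy data are hypothetical; gensim's actual implementation is vectorized and differs in detail.

```python
import io
import math

def load_vec(handle):
    """Parse word2vec-style text: 'count dim' header, then 'word v1 v2 ...'."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_words(vectors, word, topn=10):
    """Return up to topn (word, similarity) pairs, best first."""
    query = vectors[word]
    scored = [(w, _cosine(query, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy data in .vec layout; real files hold many more words and dimensions.
toy = io.StringIO("4 2\ndog 1.0 0.1\ndogs 0.9 0.2\ncat 0.5 0.8\ncar -0.7 0.6\n")
vectors = load_vec(toy)
print(most_similar_words(vectors, "dog", topn=2))
```

Note that each query is a single pass over the vocabulary, so there is no need to compare all possible pairs of words up front.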

score 6 · Answer 4 · edited Jan 24 '18 at 13:39

You should use gensim to load the model.vec and then get similar words:

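from gensim.models import KeyedVectors
# In gensim 1.0+, load_word2vec_format lives on KeyedVectors, not Word2Vec
m = KeyedVectors.load_word2vec_format('model.vec')
m.most_similar(...)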

edited Jan 24 '18 at 13:39

Andrew Svetlov

answered Feb 14 '17 at 09:50

far-zadeh

score 2 · Answer 5 · answered Jan 03 '21 at 02:39

Use gensim:

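from gensim.models import FastText

# FastText.load reads a model saved with gensim's own save(); PATH_TO_MODEL is a placeholder
model = FastText.load(PATH_TO_MODEL)
# Returns (word, similarity) pairs for the nearest words
model.wv.most_similar(positive=['dog'])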

More info here

answered Jan 03 '21 at 02:39

ChiaChong Lau

score 0 · Answer 6 · answered Apr 07 '22 at 10:16

fastText has a method called get_nearest_neighbors for nearest-neighbor queries. You need the model's .bin file to use it.

answered Apr 07 '22 at 10:16

mejobhoot
