14

I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a pretty new library without many built-in functions yet, I was wondering how to extract morphologically similar words.

For example: model.similar_word("dog") -> dogs. But there is no such built-in function.

If I type model["dog"]

I only get the vector, which could be used to compute cosine similarity: model.cosine_similarity(model["dog"], model["dogs"]).

Do I have to write some sort of loop and compute cosine_similarity on all possible pairs in a text? That would take a lot of time!

Prometheus
Isbister
  • When fasttext.skipgram('train.txt','model') is run, it creates a .bin & .vec file. Use these generated files and follow the process mentioned in the accepted answer. – Prometheus Apr 11 '19 at 11:29
  • @Prometheus Any ideas how to do something similar in Java? – Ali Nov 11 '19 at 21:06
  • Nope. Have never touched Java. However FYI, the .bin and .vec files are cross compatible. – Prometheus Nov 12 '19 at 06:22

6 Answers

16

Use Gensim: load the fastText-trained .vec file with load_word2vec_format and use the most_similar() method to find similar words.

Snehal
  • Is there any API in fasttext that takes two words as input and returns their cosine similarity? Say something like (car, vehicle) that returns something like 0.8? – kzs Dec 20 '18 at 01:15
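For the pairwise case raised in the comment, cosine similarity can be computed directly from the two vectors. The following is a minimal sketch using only the standard library; the toy vectors stand in for model["car"] and model["vehicle"] and are made up for illustration, and returning 0.0 for a zero vector is a convention chosen here, not something any library mandates. If the vectors are loaded through gensim, its KeyedVectors class also provides a similarity(w1, w2) method for the same purpose.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical values) standing in for two word vectors.
car = [1.0, 2.0, 0.0]
vehicle = [2.0, 4.0, 0.0]
print(cosine_similarity(car, vehicle))
```
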
10

You can install the pyfasttext library to extract the most similar or nearest words to a particular word.

from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)

Or you can install the latest development version of fasttext from the GitHub repository:

import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)
Kalana Geesara
7

You can install the gensim library and use it to extract the most similar words from the model that you downloaded from FastText.

Use this:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)

The topn parameter sets how many of the most similar words are returned (here, the top 10).
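To address the worry in the question about looping over all pairs: a nearest-neighbor query only compares one query vector against each vocabulary entry, not every pair. The sketch below illustrates that idea in plain Python on the word2vec text format that .vec files use (a "count dim" header line, then one "word v1 v2 ..." line per word). The helper names load_vec, unit and most_similar, and the four toy vectors, are hypothetical and written for this example; this is not gensim's implementation, which is vectorized and much faster.

```python
import heapq
import io
import math


def load_vec(handle):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 v2 ...' lines."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def unit(v):
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word`: one pass over the vocabulary."""
    query = unit(vectors[word])
    scored = (
        (sum(a * b for a, b in zip(query, unit(vec))), other)
        for other, vec in vectors.items()
        if other != word
    )
    return [(w, s) for s, w in heapq.nlargest(topn, scored)]


# Tiny made-up .vec content, used instead of a real model.vec file.
toy = io.StringIO("4 2\ndog 1.0 0.1\ndogs 0.9 0.2\ncat 0.5 0.8\ncar -1.0 0.3\n")
vectors = load_vec(toy)
print(most_similar(vectors, "dog", topn=2))
```

With a real file, `open('model.vec', encoding='utf-8')` would take the place of the StringIO object, though for a full vocabulary the gensim call above is the practical choice.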

6

You should use gensim to load the model.vec and then get similar words:

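import gensim
# Newer gensim versions provide load_word2vec_format on KeyedVectors; the Word2Vec classmethod is deprecated.
m = gensim.models.KeyedVectors.load_word2vec_format('model.vec')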
m.most_similar(...)
Andrew Svetlov
far-zadeh
2

Use gensim:

from gensim.models import FastText

model = FastText.load(PATH_TO_MODEL)
model.wv.most_similar(positive=['dog'])

More info here

0

fastText has a method called get_nearest_neighbors for nearest-neighbor queries. It requires the model's .bin file.
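Since the question asks specifically for morphologically similar words (dog -> dogs), one possible follow-up is to filter the returned neighbors by spelling. The sketch below assumes that neighbors arrive as (score, word) pairs, which is how the official Python bindings appear to return them; the helper morphological_neighbors, the 0.7 threshold, and the sample scores are all hypothetical and chosen only for illustration. It uses difflib from the standard library as a rough proxy for morphological closeness.

```python
from difflib import SequenceMatcher


def morphological_neighbors(neighbors, word, min_ratio=0.7):
    """Keep (score, word) neighbors whose spelling is close to `word`."""
    kept = []
    for score, candidate in neighbors:
        ratio = SequenceMatcher(None, word, candidate).ratio()
        if candidate != word and ratio >= min_ratio:
            kept.append((score, candidate))
    return kept


# Hand-written stand-in for the result of model.get_nearest_neighbors("dog", k=4).
sample = [(0.91, "dogs"), (0.85, "puppy"), (0.80, "doggy"), (0.78, "cat")]
print(morphological_neighbors(sample, "dog"))
```

Character overlap is only a heuristic: it would keep unrelated words that happen to be spelled alike, so the threshold needs tuning on real data.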


mejobhoot