Questions tagged [gensim]

Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary: you only need a corpus of plain-text documents.

Once these statistical patterns are found, any plain-text document can be succinctly expressed in the new semantic representation and queried for topical similarity against other documents.


2433 questions

145 votes, 14 answers

How to calculate the sentence similarity using word2vec model of gensim with python

According to the Gensim Word2Vec documentation, I can use the word2vec model in the gensim package to calculate the similarity between two words, e.g. trained_model.similarity('woman', 'man') returns 0.73723527. However, the word2vec model fails to predict the sentence…

python gensim word2vec

asked Mar 02 '14 at 16:04

zhfkt



10 answers

Convert word2vec bin file to text

From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is a binary format not useful to me. Tomas Mikolov assures us that "It should be fairly straightforward to convert the binary format to text…

python c gensim word2vec

asked Dec 05 '14 at 20:39

Glenn



4 answers

Doc2vec: How to get document vectors

How to get document vectors of two text documents using Doc2vec? I am new to this, so it would be helpful if someone could point me in the right direction or to a tutorial. I am using gensim. doc1=["This is a sentence","This is another…

python gensim word2vec

asked Jul 09 '15 at 14:57

bee2502



1 answer

gensim Doc2Vec vs tensorflow Doc2Vec

I'm trying to compare my implementation of Doc2Vec (via tf) with gensim's implementation. At least visually, the gensim ones seem to perform better. I ran the following code to train the gensim model and the one below that for tensorflow…

python tensorflow nlp gensim doc2vec

asked Oct 04 '16 at 03:13

sachinruk



5 answers

gensim word2vec: Find number of words in vocabulary

After training a word2vec model using python gensim, how do you find the number of words in the model's vocabulary?

python neural-network nlp gensim word2vec

asked Feb 24 '16 at 07:39

hlin117



6 answers

PyTorch / Gensim - How do I load pre-trained word embeddings?

I want to load a pre-trained word2vec embedding with gensim into a PyTorch embedding layer. How do I get the embedding weights loaded by gensim into the PyTorch embedding layer?

python pytorch neural-network gensim word-embedding

asked Apr 07 '18 at 18:21

MBT



18 answers

gensim error: ImportError: No module named 'gensim'

I am trying to import gensim with import gensim but get the following error:
ImportError Traceback (most recent call last)
in ()
----> 1 import gensim
2 model =…

python gensim word2vec

asked Sep 12 '17 at 05:33

woojung

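This error usually means gensim is not installed for the Python interpreter that is running the code (for example, a notebook kernel that uses a different Python than the pip on your PATH). A small standard-library diagnostic sketch; module_available is a hypothetical helper, and the printed command follows the usual python -m pip install pattern:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if module_available("gensim"):
    print("gensim is importable from", sys.executable)
else:
    # Install into *this* interpreter, not whichever pip happens to be on PATH
    print(f'gensim not found; try: "{sys.executable}" -m pip install gensim')
```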

5 answers

How to create a word cloud from a corpus in Python?

In Creating a subset of words from a corpus in R, the answerer can easily convert a term-document matrix into a word cloud. Is there a similar function in Python libraries that takes either a raw word text file or NLTK corpus or Gensim…

python nltk corpus gensim word-cloud

asked May 20 '13 at 08:51

alvas



4 answers

How to use Gensim doc2vec with pre-trained word vectors?

I recently came across the doc2vec addition to Gensim. How can I use pre-trained word vectors (e.g. those found on the original word2vec website) with doc2vec? Or is doc2vec getting the word vectors from the same sentences it uses for paragraph-vector…

python nlp gensim word2vec doc2vec

asked Dec 14 '14 at 15:13

Stergios



4 answers

How to get tfidf with pandas dataframe?

I want to calculate tf-idf from the documents below. I'm using python and pandas. import pandas as pd df = pd.DataFrame({'docId': [1,2,3], 'sent': ['This is the first sentence','This is the second sentence', 'This is the third…

python pandas scikit-learn tf-idf gensim

asked Jun 02 '16 at 13:28

user1610952



8 answers

How to check if a key exists in a word2vec trained model or not

I have trained a word2vec model using a corpus of documents with Gensim. Once the model is trained, I use the following piece of code to get the raw feature vector of a word, say "view": myModel["view"]. However, I get a KeyError for the word…

python gensim word2vec

asked May 18 '15 at 11:24

London guy



1 answer

Doc2Vec Get most similar documents

I am trying to build a document retrieval model that returns documents ordered by their relevance to a query or a search string. For this I trained a doc2vec model using the Doc2Vec model in gensim. My dataset is in the form of a…

python nlp gensim doc2vec

asked Mar 14 '17 at 08:43

Clock Slave



1 answer

How to extract phrases from corpus using gensim

To preprocess the corpus, I was planning to extract common phrases from it. For this I tried using the Phrases model in gensim with the code below, but it's not giving me the desired output. My code: from gensim.models import Phrases documents =…

python nlp gensim

asked Mar 01 '16 at 06:30

Prashant Puri



6 answers

Update gensim word2vec model

I have a word2vec model in gensim trained over 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set over which I trained the model), I need to update the model with that sentence so that querying it next…

gensim word2vec

asked Mar 01 '14 at 22:08

user2480542



3 answers

Python Gensim: how to calculate document similarity using the LDA model?

I've got a trained LDA model and I want to calculate the similarity score between two documents from the corpus I trained my model on. After studying all the Gensim tutorials and functions, I still can't get my head around it. Can somebody give me a…

python nlp lda gensim

asked Mar 16 '14 at 06:51

still_st
