Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
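A toy NumPy sketch of the difference between the two modes. It assumes a five-word vocabulary, random untrained weights, and a full softmax output layer. Real implementations such as Gensim's learn these weights and use negative sampling or hierarchical softmax instead, and every name below is illustrative rather than Gensim's API. In PV-DBOW the document vector alone predicts a target word. In PV-DM the document vector is combined with the vectors of the surrounding context words, here by averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 4

doc_vecs = rng.normal(size=(2, dim))              # one vector per document tag
word_vecs = rng.normal(size=(len(vocab), dim))    # input word vectors (used by PV-DM only)
out_weights = rng.normal(size=(dim, len(vocab)))  # output layer shared by both modes

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pv_dbow_probs(doc_id):
    """PV-DBOW: predict a word of the document from the document vector alone."""
    return softmax(doc_vecs[doc_id] @ out_weights)

def pv_dm_probs(doc_id, context_ids):
    """PV-DM (mean variant): average the document vector with the context word vectors."""
    hidden = np.mean(np.vstack([doc_vecs[doc_id], word_vecs[context_ids]]), axis=0)
    return softmax(hidden @ out_weights)

target = vocab.index("sat")
context = [vocab.index(w) for w in ("the", "cat", "on", "mat")]
print("PV-DBOW P(sat):", pv_dbow_probs(0)[target])
print("PV-DM   P(sat):", pv_dm_probs(0, context)[target])
```

In Gensim the mode is selected with the `dm` parameter: `dm=1` (the default) trains PV-DM and `dm=0` trains PV-DBOW.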

556 questions

125 votes

10 answers

ImportError: cannot import name 'joblib' from 'sklearn.externals'

I am trying to load my saved model from S3 using joblib: import pandas as pd import numpy as np import json import subprocess import sqlalchemy from sklearn.externals import joblib ENV = 'dev' model_d2v = load_d2v('model_d2v_version_002', ENV) def…

asked May 19 '20 at 14:36

Praneeth Sai

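`sklearn.externals.joblib` was deprecated in scikit-learn 0.21 and removed in 0.23, so the usual fix is to import the standalone `joblib` package directly. It is installed as a dependency of scikit-learn. The minimal sketch below assumes a plain dict stands in for the saved model, and the file name is hypothetical. For Gensim models specifically, Gensim's own `save()`/`load()` methods are the native alternative.

```python
import os
import tempfile

import joblib  # standalone package; replaces `from sklearn.externals import joblib`

# Hypothetical stand-in for a trained model; any picklable object is handled the same way.
model = {"name": "model_d2v_version_002", "vector_size": 100}

path = os.path.join(tempfile.mkdtemp(), "model_d2v_version_002.joblib")
joblib.dump(model, path)      # serialize to disk
restored = joblib.load(path)  # deserialize
print(restored == model)
```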

1 answer

gensim Doc2Vec vs tensorflow Doc2Vec

I'm trying to compare my implementation of Doc2Vec (via tf) with Gensim's implementation. At least visually, the Gensim ones seem to perform better. I ran the following code to train the Gensim model, and the code below it for TensorFlow…

python tensorflow nlp gensim doc2vec

asked Oct 04 '16 at 03:13

sachinruk

4 answers

How to use Gensim doc2vec with pre-trained word vectors?

I recently came across the doc2vec addition to Gensim. How can I use pre-trained word vectors (e.g. those found on the original word2vec website) with doc2vec? Or is doc2vec getting the word vectors from the same sentences it uses for paragraph-vector…

python nlp gensim word2vec doc2vec

asked Dec 14 '14 at 15:13

Stergios

1 answer

Doc2Vec Get most similar documents

I am trying to build a document retrieval model that returns documents ordered by their relevance to a query or search string. For this I trained a doc2vec model using the Doc2Vec model in gensim. My dataset is in the form of a…

python nlp gensim doc2vec

asked Mar 14 '17 at 08:43

Clock Slave

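Retrieval with Doc2Vec usually comes down to inferring a vector for the query and ranking the stored document vectors by cosine similarity. Gensim exposes this as `most_similar` on the document-vector store (`model.dv` in Gensim 4, `model.docvecs` in 3.x). The NumPy sketch below shows the ranking step itself. It assumes the document vectors and the query vector are already available as arrays, using small hand-made stand-ins here, and `rank_by_cosine` is a hypothetical helper.

```python
import numpy as np

def rank_by_cosine(doc_vectors, query_vector, topn=3):
    """Return (row index, cosine similarity) pairs for the topn rows most similar to the query."""
    docs = np.asarray(doc_vectors, dtype=float)
    q = np.asarray(query_vector, dtype=float)
    # Normalize rows and query to unit length so a dot product equals cosine similarity.
    docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    sims = docs_unit @ q_unit
    order = np.argsort(-sims)[:topn]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]

doc_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.7, 0.7, 0.0],
                        [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])  # e.g. the output of model.infer_vector(query_tokens)
print(rank_by_cosine(doc_vectors, query, topn=2))
```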

2 answers

Is there pre-trained doc2vec model?

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

gensim doc2vec

asked Jul 02 '18 at 09:25

Idriss Brahimi

3 answers

How to use TaggedDocument in gensim?

I have two directories whose text files I want to read and label, but I don't know how to do this via TaggedDocument. I thought it would work as TaggedDocument([Strings],[Labels]), but apparently this doesn't work. This is my code:…

python nltk gensim word2vec doc2vec

asked Jul 16 '17 at 06:35

Farhood
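In Gensim, `TaggedDocument(words, tags)` describes a single document. `words` is that document's list of tokens and `tags` is a list of its tags, so each file becomes its own `TaggedDocument` rather than passing all the strings at once. The sketch below assumes one document per `.txt` file and naive lowercase whitespace tokenization. It gives each file a unique tag plus its directory label. The `load_dir` and `tag_texts` helpers are hypothetical, and the fallback `namedtuple` only mirrors the two fields so the sketch runs without Gensim installed.

```python
from collections import namedtuple
from pathlib import Path

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Stand-in with the same two fields, so the sketch runs without Gensim.
    TaggedDocument = namedtuple("TaggedDocument", "words tags")

def tag_texts(texts, label, prefix):
    """Turn raw strings into TaggedDocuments: one per text, with a unique tag plus a shared label."""
    return [
        TaggedDocument(words=text.lower().split(), tags=[f"{prefix}_{i}", label])
        for i, text in enumerate(texts)
    ]

def load_dir(directory, label):
    """Read every .txt file in `directory` (one document per file) and tag it with `label`."""
    paths = sorted(Path(directory).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8") for p in paths]
    return tag_texts(texts, label, prefix=label)

# Hypothetical usage with two directories:
# corpus = load_dir("data/pos", "pos") + load_dir("data/neg", "neg")
docs = tag_texts(["Good movie", "Great acting here"], label="pos", prefix="pos")
print(docs[0])
```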

2 answers

How does gensim calculate doc2vec paragraph vectors?

I am going through this paper http://cs.stanford.edu/~quocle/paragraph_vector.pdf and it states that "The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use …

nlp vectorization gensim word2vec doc2vec

asked Nov 04 '16 at 01:18

jxn

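The quoted sentence describes two ways of building the PV-DM hidden layer. The NumPy sketch below shows both, assuming random vectors and a window of two words on each side of the target. Averaging keeps the hidden layer at the vector dimensionality regardless of window size. Concatenation preserves word order, but the hidden layer, and therefore the output weights, grows with the window. In Gensim these choices roughly correspond to the `dm_mean` parameter (mean rather than sum of context vectors) and the `dm_concat` parameter (concatenation).

```python
import numpy as np

rng = np.random.default_rng(1)
dim, window = 5, 2  # vector size; number of context words on each side of the target

doc_vec = rng.normal(size=dim)
context_vecs = rng.normal(size=(2 * window, dim))  # vectors of the surrounding words, in order

# Averaging: the document vector is treated as one more context vector.
hidden_mean = np.vstack([doc_vec, context_vecs]).mean(axis=0)

# Concatenation: document vector followed by each context vector, word order preserved.
hidden_concat = np.concatenate([doc_vec, context_vecs.ravel()])

print(hidden_mean.shape)    # (5,)
print(hidden_concat.shape)  # (25,)
```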

1 answer

How to break conversation data into pairs of (Context , Response)

I'm using the Gensim Doc2Vec model, trying to cluster portions of customer support conversations. My goal is to give the support team auto-response suggestions. Figure 1 shows a sample conversation where the user question is answered in the next…

python text-mining doc2vec gensym

asked Sep 14 '16 at 12:00

Shlomi Schwartz

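One simple way to cut a support transcript into training pairs is sketched below. It assumes a hypothetical format in which each conversation is an ordered list of `(speaker, text)` turns, with speakers labelled `"user"` and `"agent"`. Consecutive user turns are merged into one context, and the agent turns that follow are merged into its response. Agent turns before any user turn are ignored, and a trailing user message with no reply is dropped. `to_pairs` is a hypothetical helper.

```python
def to_pairs(turns, user="user", agent="agent"):
    """Group an ordered list of (speaker, text) turns into (context, response) pairs."""
    pairs = []
    context, response = [], []
    for speaker, text in turns:
        if speaker == user:
            if response:  # a new user turn closes the previous exchange
                pairs.append((" ".join(context), " ".join(response)))
                context, response = [], []
            context.append(text)
        elif speaker == agent and context:
            response.append(text)  # agent turns before any user turn are skipped
    if context and response:  # keep the final exchange only if it was answered
        pairs.append((" ".join(context), " ".join(response)))
    return pairs

conversation = [
    ("agent", "Hi, how can I help?"),
    ("user", "My order is late."),
    ("user", "It was due Monday."),
    ("agent", "Sorry about that."),
    ("agent", "Let me check the tracking number."),
    ("user", "Thanks!"),
]
print(to_pairs(conversation))
```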

2 answers

Doc2Vec.infer_vector keeps giving a different result every time on a particular trained model

I am trying to follow the official Doc2Vec Gensim tutorial at https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb and have modified the code in line 10 to determine the best-matching document for the given…

nlp word2vec gensim doc2vec

asked Jan 21 '18 at 00:31

Rohan

1 answer

How to use the infer_vector in gensim.doc2vec?

def cosine(vector1,vector2): cosV12 = np.dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2)) return cosV12 model=gensim.models.doc2vec.Doc2Vec.load('Model_D2V_Game') string='民生为了父亲我要坚强地 ...' list=string.split('…

python gensim doc2vec

asked Jul 09 '17 at 05:19

Jeffery

2 answers

Why does Doc2vec give 2 different vectors for the same texts?

I am using Doc2vec to get vectors from words. Please see my code below: from gensim.models.doc2vec import TaggedDocument f = open('test.txt','r') trainings = [TaggedDocument(words = data.strip().split(","),tags = [i]) for i,data in…

python nlp word2vec gensim doc2vec

asked May 16 '18 at 04:32

Thanh Bui
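During training, each distinct tag gets its own vector. That vector starts from a random initialization and is updated stochastically, so two identical lines carrying different tags (as with `tags = [i]` in the excerpt) generally end up with different vectors. If identical texts should share one vector, one option is to give them the same tag. The sketch below assumes that exact token-list equality defines a duplicate. `dedup_tags` is a hypothetical helper that returns plain `(words, tags)` tuples, which can then be wrapped in Gensim's `TaggedDocument`.

```python
def dedup_tags(lines, sep=","):
    """Assign the same integer tag to lines whose token lists are identical."""
    tag_for = {}
    tagged = []
    for line in lines:
        words = line.strip().split(sep)
        key = tuple(words)
        if key not in tag_for:
            tag_for[key] = len(tag_for)  # next unused tag id, starting at 0
        tagged.append((words, [tag_for[key]]))
    return tagged

lines = ["a,b,c\n", "d,e\n", "a,b,c\n"]
for words, tags in dedup_tags(lines):
    print(words, tags)
```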

1 answer

Improving Gensim Doc2vec results

I tried to apply doc2vec to 600,000 rows of sentences. Code below: from gensim import models model = models.Doc2Vec(alpha=.025, min_alpha=.025, min_count=1, workers = 5) model.build_vocab(res) token_count = sum([len(sentence) for sentence in…

python nlp gensim doc2vec

asked Dec 19 '17 at 15:20

Hackerds

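The excerpt sets `alpha` and `min_alpha` to the same value, so the learning rate never decays. Gensim's built-in schedule decays it linearly from `alpha` toward `min_alpha` over all epochs when `train()` is called once with the desired number of epochs. A small sketch of that linear schedule follows. It assumes progress is measured per epoch, whereas Gensim updates the rate more finely as words are processed, and `linear_alpha` is a hypothetical helper.

```python
def linear_alpha(alpha, min_alpha, epochs):
    """Learning rate at the start of each epoch under linear decay from alpha toward min_alpha."""
    return [alpha - (alpha - min_alpha) * (epoch / epochs) for epoch in range(epochs)]

print(linear_alpha(0.025, 0.0001, epochs=5))  # decreasing rates
print(linear_alpha(0.025, 0.025, epochs=3))   # flat: what alpha == min_alpha produces
```

In Gensim 4, the corresponding pattern is to pass `epochs=...` to the constructor, call `build_vocab(corpus)`, and then call `train(corpus, total_examples=model.corpus_count, epochs=model.epochs)` once.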

1 answer

What is the minimum dataset size needed for good performance with doc2vec?

How does doc2vec perform when trained on datasets of different sizes? There is no mention of dataset size in the original corpus, so I am wondering what minimum size is required to get good performance out of doc2vec.

nlp doc2vec

asked Aug 30 '17 at 11:48

pete the dude

1 answer

Doc2Vec Worse Than Mean or Sum of Word2Vec Vectors

I'm training a Word2Vec model like: model = Word2Vec(documents, size=200, window=5, min_count=0, workers=4, iter=5, sg=1) and Doc2Vec model like: doc2vec_model = Doc2Vec(size=200, window=5, min_count=0, iter=5, workers=4,…

python machine-learning gensim word2vec doc2vec

asked Jul 21 '17 at 09:40

ScientiaEtVeritas

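The baseline in this comparison represents a document as the mean of its word vectors. The NumPy sketch below assumes a plain dict from word to vector as a stand-in for a trained model's `wv`; the same `in` and `[]` lookups work on a Gensim `KeyedVectors` object. Out-of-vocabulary tokens are skipped, and `mean_vector` is a hypothetical helper. Separately, `size` and `iter` in the excerpt are Gensim 3.x parameter names. Gensim 4.0 renamed them to `vector_size` and `epochs`.

```python
import numpy as np

def mean_vector(tokens, word_vectors, dim):
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

wv = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
}
print(mean_vector(["cat", "dog", "unknown"], wv, dim=2))  # [0.5 0.5]
```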

3 answers

Document similarity: Vector embedding versus Tf-Idf performance?

I have a collection of documents, where each document is rapidly growing with time. The task is to find similar documents at any fixed time. I have two potential approaches: A vector embedding (word2vec, GloVe or fasttext), averaging over word…

machine-learning nlp tf-idf word2vec doc2vec

asked Mar 07 '17 at 07:59

Alec Matusis
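For the TF-IDF side of this comparison, here is a pure-Python sketch. It assumes pre-tokenized documents, raw term counts for TF, and the smoothed IDF `log((1 + N) / (1 + df)) + 1`. That formula is the default in scikit-learn's `TfidfVectorizer`, though other weightings are common. Because IDF weights depend on the whole collection, they shift as documents grow. `tfidf_vectors` and `cosine` are hypothetical helpers.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} dict using smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    ["doc2vec", "learns", "document", "vectors"],
    ["tfidf", "weights", "document", "terms"],
    ["doc2vec", "document", "vectors", "similarity"],
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[2]), 3), round(cosine(vecs[0], vecs[1]), 3))
```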
