doc=["This is a sentence","This is another sentence"]
documents=[doc.strip().split(" ") for doc in doc1 ]
model = doc2vec.Doc2Vec(documents, size = 100, window = 300, min_count = 10, workers=4)
I got AttributeError: 'list' object has no attribute 'words' because the input documents to the Doc2vec() was not in correct LabeledSentence format.
I hope this below example will help you understand the format.
documents = LabeledSentence(words=[u'some', u'words', u'here'], labels=[u'SENT_1'])
More details are here : http://rare-technologies.com/doc2vec-tutorial/
However, I solved the problem by taking input data from file using TaggedLineDocument().
File format: one document = one line = one TaggedDocument object.
Words are expected to be already preprocessed and separated by whitespace, tags are constructed automatically from the document line number.
sentences=doc2vec.TaggedLineDocument(file_path)
model = doc2vec.Doc2Vec(sentences,size = 100, window = 300, min_count = 10, workers=4)
To get document vector :
You can use docvecs. More details here : https://radimrehurek.com/gensim/models/doc2vec.html#gensim.models.doc2vec.TaggedDocument
docvec = model.docvecs[99]
where 99 is the document id whose vector we want. If labels are in integer format (by default, if you load using TaggedLineDocument() ), directly use integer id like I did. If labels are in string format,use "SENT_99" .This is similar to Word2vec