
I'm new to these frameworks and to NLP in general. I am following an example that gives the code snippet below for calculating the tf-idf score of every token in a set of tweets. However, every variation I try fails with either an ImportError, an AttributeError, or a NameError saying that Vectorizer is not defined.

Code:

import spacy
from textacy.vsm import Vectorizer
import textacy.vsm
vectorizer = Vectorizer(weighting='tfidf')
term_matrix = vectorizer.fit_transform(
    [tok.lemma_ for tok in doc] for doc in spacy_tweets)

Errors received:

from textacy.vsm import Vectorizer
ImportError: cannot import name 'Vectorizer'
//
import textacy
vectorizer = textacy.Vectorizer(weighting='tfidf')
AttributeError: module 'textacy' has no attribute 'Vectorizer'


//
import textacy
vectorizer = Vectorizer(weighting='tfidf')
NameError: name 'Vectorizer' is not defined

My environment:

operating system: windows 10 64bit
python version: Python 3.6.4 :: Anaconda, Inc.
spacy version: 1.9.0-np111py36_vc14_1 installed
spacy models: en_core_web_sm 
textacy version: 0.3.4-py36_0

What is the correct import statement to access the textacy vectorizer class?

Tiago Duque
aldmarj

1 Answer


When you install textacy through conda, you get version 0.3.4. That version does not include the vectorizer. Instead, install textacy from the PyPI project (for example with pip), which provides a newer release:

https://pypi.org/project/textacy/

To check whether your installed version has the vectorizer, list the package's attributes and look for 'Vectorizer':

In [1]: import textacy

In [2]: dir(textacy)
Out[2]:
['Corpus',
'Doc',
'TextStats',
'TopicModel',
'Vectorizer',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'about',
'absolute_import',
'cache',
'compat',
'constants',
'corpus',
'data_dir',
'doc',
'extract',
'io',
'load_spacy',
'logger',
'logging',
'network',
'os',
'preprocess',
'preprocess_text',
'spacy_utils',
'text_stats',
'text_utils',
'tm',
'utils',
'viz',
'vsm']
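
If you would rather not scan the dir() output by eye, the same check can be scripted. The following is only a minimal sketch: the helper names locate_vectorizer and check_textacy are made up for illustration, and it assumes that your textacy build exposes __version__ and, where present, the vsm submodule as package attributes, as in the listing above.

```python
import importlib


def locate_vectorizer(module):
    """Return the dotted path where `module` exposes Vectorizer, or None."""
    name = module.__name__
    # Releases like the one listed above expose Vectorizer at the top level.
    if hasattr(module, "Vectorizer"):
        return name + ".Vectorizer"
    # The import in the question targets the vsm submodule, so look there too.
    vsm = getattr(module, "vsm", None)
    if vsm is not None and hasattr(vsm, "Vectorizer"):
        return name + ".vsm.Vectorizer"
    return None


def check_textacy():
    """Import textacy if it is installed and report where Vectorizer lives."""
    try:
        textacy = importlib.import_module("textacy")
    except ImportError:
        return "textacy is not installed"
    # Assumption: __version__ may be missing on some builds, so fall back.
    version = getattr(textacy, "__version__", "unknown")
    path = locate_vectorizer(textacy)
    if path is None:
        return "textacy {} has no Vectorizer; try a newer release from PyPI".format(version)
    return "textacy {}: use {}".format(version, path)
```

Calling print(check_textacy()) in the affected environment should tell you whether the installed release has the class and, if so, which dotted path to import it from.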
dimid
aldmarj