4

I am new to Python and have been trying to compute a bag of words. I used the vectorizer.fit_transform function as follows:

vectorizer = CountVectorizer(vocabulary=set_of_words, tokenizer=nltk.word_tokenize)
bag_of_words = vectorizer.fit_transform(doc).toarray().astype(np.float64)

where doc contains the text whose bag of words is to be extracted.
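For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a bag of words over a fixed vocabulary amounts to. It is not the CountVectorizer implementation: the helper name bag_of_words, the sample sentence, and the four-word vocabulary are hypothetical, and a lowercase whitespace split stands in for nltk.word_tokenize.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    # Assumption: lowercasing and splitting on whitespace replaces
    # nltk.word_tokenize, which handles punctuation far more carefully.
    counts = Counter(text.lower().split())
    # Words outside the vocabulary are ignored; missing words count as 0.
    return [float(counts[word]) for word in vocabulary]


# Hypothetical vocabulary and document, chosen only for illustration.
vocabulary = ["appetite", "consider", "four", "nursery"]
doc = "Four children in the nursery consider four options"

vector = bag_of_words(doc, vocabulary)
print(vector)  # [0.0, 1.0, 2.0, 1.0]
```

The result has one entry per vocabulary word, which is roughly the shape of a single row of the array produced by the snippet above.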

and I get a warning as follows:

/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)

On displaying vectorizer I get something like this:

CountVectorizer(analyzer=u'word', binary=False, charset=None,
    charset_error=None, decode_error=u'strict',
    dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
    lowercase=True, max_df=1.0, max_features=None, min_df=1,
    ngram_range=(1, 1), preprocessor=None, stop_words=None,
    strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b',
    tokenizer=<function word_tokenize at 0xafbc6f4>,
    vocabulary=[u'dissolution', u'comparatively', u'desirable', u'four', u'obstruction', u'nursery', u'perverted', u'appetite', u'repress', u'consider'])
halfer
athira

3 Answers

3

From TessellatingHeckler's link I found out: "It's one of those scipy calling old numpy function things"

Running pip install --upgrade scipy resolved this issue for me.
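Before and after upgrading, it can help to confirm which versions are actually installed. The following is a small sketch, not part of the original answer: the helper name installed_versions is hypothetical, and it relies on importlib.metadata, which needs Python 3.8 or newer (the question itself was on Python 2.7, where pip freeze gives the same information).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "scipy", "scikit-learn")):
    """Map each distribution name to its installed version, or None."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

If the scipy version is unchanged after the upgrade, pip may be targeting a different interpreter than the one running the script.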

kilojoules
2

Are you using SciPy / scikit-learn and hitting this bug https://github.com/scikit-learn/scikit-learn/issues/3866 ?

TessellatingHeckler
  • Yes, I tried to incorporate the changes into the scipy files as mentioned in the link provided above. But now what I get is ImportError: cannot import name __check_build – athira Dec 01 '14 at 05:02
1

The issue has been resolved. Applying the modifications from https://github.com/scipy/scipy/commit/fa1782e04fdab91f672ccf7a4ebfb887de50f01c to the scipy files solved the problem.

athira