I've been learning NLP text classification from the book "Text Analytics with Python". It requires several modules to be installed in a virtual environment, and I use an Anaconda env. I created a blank env with Python 3.7 and installed the required pandas, numpy, nltk, gensim, sklearn... Then I had to install Pattern. The first problem is that I can't install Pattern via conda because of a conflict between Pattern and mkl_random.

(nlp) D:\Python\Text_classification>conda install -c mickc pattern
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - mkl_random
  - pattern
Use "conda info <package>" to see the dependencies for each package.

It's impossible to remove mkl_random because other packages depend on it: gensim, numpy, scikit-learn, etc. I couldn't find any conda build of Pattern that works in my case, so I installed Pattern using pip instead. The installation was successful. Is it okay to have packages from conda and from pip in the same environment?

The second problem, I think, is connected with the first one. I downloaded the book's example code from https://github.com/dipanjanS/text-analytics-with-python/tree/master/Old-First-Edition/source_code/Ch04_Text_Classification, added parentheses to the Python 2.x 'print' statements, and ran classification.py. The program raised an exception:

Traceback (most recent call last):
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "classification.py", line 50, in <module>
    norm_train_corpus = normalize_corpus(train_corpus)
  File "D:\Python\Text_classification\normalization.py", line 96, in normalize_corpus
    text = lemmatize_text(text)
  File "D:\Python\Text_classification\normalization.py", line 67, in lemmatize_text
    pos_tagged_text = pos_tag_text(text)
  File "D:\Python\Text_classification\normalization.py", line 58, in pos_tag_text
    tagged_text = tag(text)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 188, in tag
    for sentence in parse(s, tokenize, True, False, False, False, encoding, **kwargs).split():
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 169, in parse
    return parser.parse(s, *args, **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1172, in parse
    s[i] = self.find_tags(s[i], **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 114, in find_tags
    return _Parser.find_tags(self, tokens, **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1113, in find_tags
    lexicon = kwargs.get("lexicon", self.lexicon or {}),
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 376, in __len__
    return self._lazy("__len__")
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 368, in _lazy
    self.load()
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in load
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in <genexpr>
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
RuntimeError: generator raised StopIteration

I don't understand what is happening. Is the exception raised because I installed Pattern with pip, or is the problem wrong or deprecated code in the book? And is it possible to install Pattern in conda alongside all the other necessary packages?

Thank you in advance!

Sergey V.
  • What did `conda info <package>` produce for each? What was the underlying dependency conflict? – scipilot Feb 24 '19 at 01:16
  • I had problems installing Pattern for Gensim last year because it wasn't available in pip3 yet. I ended up downloading a zip of the code and installing it manually. I think the Python 3 port of Pattern wasn't finished or done properly. Not sure of the status now. – scipilot Feb 24 '19 at 01:22
  • Some of the information here might help you https://stackoverflow.com/questions/34998210/how-do-i-pip-install-pattern-packages-in-python-3-5 – scipilot Feb 24 '19 at 01:23
  • Thank you scipilot. I think you're right about the Python version and the Pattern port. This book was written for Python 2.x, and it's just too outdated. – Sergey V. Feb 27 '19 at 15:12

1 Answer

Switching to Python 3.6 solved this issue for me.
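
A likely reason the Python version matters: the traceback in the question shows Pattern's _read generator executing `raise StopIteration`, and since Python 3.7 (PEP 479) a StopIteration raised inside a generator body is converted into `RuntimeError: generator raised StopIteration`. The sketch below reproduces that behaviour with hypothetical stand-in functions (`read_lines_old`, `read_lines_fixed`, `consume`); it is not Pattern's actual code, only an illustration of the mechanism.

```python
import sys


def read_lines_old(lines):
    # Hypothetical stand-in for a pre-3.7 style generator:
    # it signals the end of the data by raising StopIteration by hand.
    for line in lines:
        yield line
    raise StopIteration


def read_lines_fixed(lines):
    # Same generator, but it ends with a plain return,
    # which is the PEP 479 compatible way to finish.
    for line in lines:
        yield line
    return


def consume(gen_func, lines):
    # Exhaust the generator and report a RuntimeError as text instead of crashing.
    try:
        return list(gen_func(lines))
    except RuntimeError as exc:
        return "RuntimeError: {}".format(exc)


sample = ["the DT", "cat NN"]  # made-up lexicon-like lines
old_result = consume(read_lines_old, sample)
fixed_result = consume(read_lines_fixed, sample)

print(sys.version_info[:2])
print(old_result)
print(fixed_result)
```

On Python 3.7 or later the first generator fails with the same message as in the question, while the second one returns both lines. On Python 3.6 the first form still works (with a deprecation warning), which would explain why downgrading makes the error go away.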

If you are using conda, first create an environment that pins Python to 3.6, and install the packages you need there.

conda create --name myenv python=3.6 pandas numpy gensim jupyter 
conda activate myenv

For some reason, I didn't need to install Pattern directly.

Related Gensim explanation: https://github.com/RaRe-Technologies/gensim/issues/2438