
There is ONE word not being recognized as a stopword, despite being on the list. I'm working with spaCy 2.0.11, Python 3.7, a conda env, Debian 9.5.

import spacy
from spacy.lang.es.stop_words import STOP_WORDS
nlp = spacy.load('es', disable=['tagger', 'parser', 'ner'])
STOP_WORDS.add('y')

Doing some tests:

>>> word = 'y'
>>> word in STOP_WORDS
True
>>> nlp(word)[0].is_stop
False
>>> len(STOP_WORDS)
305
>>> [word for word in STOP_WORDS if not nlp(word)[0].is_stop]
['y']

So, of the 305 words listed in STOP_WORDS, one is not flagged as a stopword. I don't know what I'm doing wrong... Maybe it's a bug?

Susensio

1 Answer


It turns out I wasn't adding the word correctly. According to this answer, the word also has to be flagged on the vocab entry:

word = 'y'
spacy.lang.es.stop_words.STOP_WORDS.add(word)
nlp.vocab[word].is_stop = True

That solved the problem.
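Why adding to the set alone isn't enough: the `is_stop` flag is stored on the vocab's lexeme entry, so mutating the `STOP_WORDS` set afterwards doesn't update entries that already carry the old flag. This is a minimal, self-contained analogy of that caching behavior (not spaCy's actual internals; `Lexeme` and `Vocab` here are toy classes):

```python
# Toy analogy (NOT spaCy internals): a vocab that computes and caches
# a per-word "is_stop" flag from a stopword set.
STOP_WORDS = {'de', 'la', 'que'}

class Lexeme:
    def __init__(self, text):
        self.text = text
        # The flag is computed once, when the lexeme is created...
        self.is_stop = text in STOP_WORDS

class Vocab:
    def __init__(self):
        self._cache = {}

    def __getitem__(self, text):
        # ...and cached here, so later edits to STOP_WORDS are invisible.
        if text not in self._cache:
            self._cache[text] = Lexeme(text)
        return self._cache[text]

vocab = Vocab()
print(vocab['y'].is_stop)   # False: 'y' was not in the set at creation time

STOP_WORDS.add('y')         # mutating the set alone changes nothing
print(vocab['y'].is_stop)   # still False: the cached flag is stale

vocab['y'].is_stop = True   # fixing the cached flag directly works
print(vocab['y'].is_stop)   # True
```

This is why the accepted fix sets both: the set (so future lexemes get the flag) and `nlp.vocab[word].is_stop` (so the existing entry is corrected).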


Old answer (didn't actually solve the problem)

I found the cause.

I was getting a warning on importing spacy:

RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility.

Apparently this indicates a version mismatch. More info:

These warnings are visible whenever you import scipy (or another package) that was compiled against an older numpy than is installed.

On spaCy's GitHub, a specific numpy version was suggested, which made the warning go away:

conda install numpy=1.14.5
