I ran this code to preprocess text before feeding it to my model

and got `RecursionError: maximum recursion depth exceeded in comparison`.

`train_text` is a pandas Series of text, and `ps` is a PorterStemmer object from the nltk library.

train_corpus = []
for i in range(0, len(train_text)):
    data = re.sub("[^a-zA-Z]", ' ', train_text[i]).lower().split()
    data = [ps.stem(word) for word in data if not word in set(stopwords.words("english"))]
    data = ' '.join(data)
    train_corpus.append(data)

RecursionError                            Traceback (most recent call last)
<ipython-input-25-4a8646f33f6f> in <module>()

     57 for i in range(0, len(train_text)):
     58     data = re.sub("[^a-zA-Z]", ' ', train_text[i]).lower().split()
---> 59     data = [ps.stem(word) for word in data if not word in set(stopwords.words("english"))]
     60     data = ' '.join(data)
     61     train_corpus.append(data)

<ipython-input-25-4a8646f33f6f> in <listcomp>(.0)
     57 for i in range(0, len(train_text)):
     58     data = re.sub("[^a-zA-Z]", ' ', train_text[i]).lower().split()
---> 59     data = [ps.stem(word) for word in data if not word in set(stopwords.words("english"))]
     60     data = ' '.join(data)
     61     train_corpus.append(data)

~\Anaconda3\lib\site-packages\nltk\stem\porter.py in stem(self, word)
    665         stem = self._step1a(stem)
    666         stem = self._step1b(stem)
--> 667         stem = self._step1c(stem)
    668         stem = self._step2(stem)
    669         stem = self._step3(stem)
....

What can I do to solve this?

Thanks.

user9176398
  • It's unclear what your code is supposed to do, and all it does is throw a NameError because there are undefined variables. Please post a minimal, complete, and verifiable example. – Aran-Fey May 03 '18 at 11:24
  • What exactly do you have in `train_text`? This https://github.com/nltk/nltk/issues/1971 suggests that very long words will cause the recursion error. Have you tried with a short, simple training example? – Stuart May 03 '18 at 11:37
  • The text consists of comments like these (it is a Series taken from a DataFrame): 0 Explanation\nWhy the edits made under my usern... 1 D'aww! He matches this background colour I'm s... 2 Hey man, I'm really not trying to edit war. It... – user9176398 May 03 '18 at 11:39
  • What is the maximum length of the strings in `train_text`? (You can check this using `print(max(len(s) for s in train_text))`) – Stuart May 03 '18 at 11:44
  • The maximum length of the strings in my Series is 5000. – user9176398 May 03 '18 at 11:46
  • It seems likely to be due to something odd in the training text. Try inserting `print(data)` before the line where the error occurs, and see if there are very long strings in it? Try setting `train_text=["This is a simple test"]` and see if you still get the error? – Stuart May 03 '18 at 12:03
  • No error with "this is a simple test" – user9176398 May 03 '18 at 12:09
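
Following the suggestion in the comments that very long words may trigger the error, here is a minimal sketch for finding the longest tokens and for skipping the stemmer on them. The helper names (`tokenize`, `longest_tokens`, `stem_if_short`) and the `max_len=100` cutoff are assumptions made for illustration, not part of nltk. The `stem` argument is any callable; in the question's code it would be `ps.stem`.

```python
import re

def tokenize(text):
    """Same cleanup as in the question: letters only, lowercase, split on whitespace."""
    return re.sub("[^a-zA-Z]", " ", text).lower().split()

def longest_tokens(texts, top=5):
    """Return the `top` longest tokens as (length, token, text_index) tuples."""
    found = []
    for index, text in enumerate(texts):
        for token in tokenize(text):
            found.append((len(token), token, index))
    found.sort(reverse=True)
    return found[:top]

def stem_if_short(word, stem, max_len=100):
    """Apply `stem` only to words of at most `max_len` characters (hypothetical cutoff)."""
    return stem(word) if len(word) <= max_len else word

# Toy data: the second text contains one 3000-letter "word".
sample = ["This is a simple test", "ok " + "a" * 3000 + " fine"]
for length, token, index in longest_tokens(sample, top=3):
    print(index, length, token[:20])

# Toy stemmer standing in for ps.stem, so the sketch runs without nltk.
toy_stem = lambda w: w.rstrip("s")
cleaned = [stem_if_short(w, toy_stem) for w in tokenize(sample[1])]
```

If the longest tokens turn out to be thousands of characters long, they are probably not real words, and leaving them unstemmed (or dropping them) should not hurt the model.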

1 Answer

It looks like this can be worked around with `sys.setrecursionlimit()` (see the Python docs).
But remember that recursion is not free: it consumes roughly (memory used per function call) × (number of recursion levels), so you can run out of memory or overflow the stack when the recursion goes very deep. That is why Python ships with a default limit, and I think it is a bad idea to raise it.
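
If you do decide to raise the limit, one cautious option is to raise it only around the call that needs it and restore it afterwards. The sketch below uses only the standard library; the `recursion_limit` helper and the toy `depth` function are made up for illustration, and the value 5000 is an arbitrary assumption, not a recommended setting.

```python
import sys
from contextlib import contextmanager

@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, then restore the previous value."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)

def depth(n):
    """Toy recursive function standing in for a deeply recursive call."""
    return 0 if n == 0 else 1 + depth(n - 1)

# With the default limit this call would raise RecursionError.
with recursion_limit(5000):
    print(depth(3000))
```

In the question's loop, the `with` block would wrap the stemming step. This only hides the symptom, though: if the input contains extremely long tokens, filtering them out is likely the safer fix.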

Chiefir