I have a weird one that I can't get to the bottom of. I am testing a new lemmatizer for an NLP project. It works fine in the Jupyter notebook I was testing in, but as soon as I copy it into a .py file for production, it raises a StopIteration (which surfaces as "RuntimeError: generator raised StopIteration"). Any tips or suggestions on where to look? I have spent far too long trying workarounds, all to no avail. Both runs use the exact same test dataset, so it is not a difference in the data frames; both use the same environment, and all of the code is identical.
Thanks in advance!
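One way to double-check the "same environment" claim is to run a small check like the one below in both the notebook and the script and compare the output. This is only a sketch: `describe_runtime` is a hypothetical helper name, not part of any library, and it reports the interpreter path, the Python version, and the file each module would be loaded from.

```python
import importlib.util
import sys


def describe_runtime(module_names):
    """Return the interpreter path, Python version, and load location of each module."""
    info = {"executable": sys.executable, "version": sys.version.split()[0]}
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)  # does not import top-level modules
        except (ImportError, ValueError):
            spec = None
        # None means the module could not be located from this interpreter
        info[name] = spec.origin if spec is not None else None
    return info


if __name__ == "__main__":
    for key, value in describe_runtime(["pandas", "pattern"]).items():
        print(f"{key}: {value}")
```

If the two outputs differ (for example a different `executable` or a different path for `pattern`), the notebook kernel and the script are not actually running the same installation.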
Here is the function:
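import re
from pattern.en import lemma  # assumed import; the traceback shows lemma() coming from the pattern package

def prepareStringTEST(x):
    error = 'Error'
    # Replace everything except digits and lowercase letters with a space
    x = re.sub(r"[^0-9a-z]", " ", x)
    if len(x) == 0:
        return ''
    # Lemmatize each whitespace-separated token
    return " ".join([lemma(wd) for wd in x.split()])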
and here is how it is being called:
df['text_cleaned_test'] = df['text'].apply(lambda x: prepareStringTEST(x))
Here is the error message:
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "z:\CEC Python\NLP\clean_raw_text_new.py", line 138, in <module>
    df['text_cleaned_test'] = df['text'].apply(lambda x: prepareStringTEST(x))
  File "C:\Program Files\Python39\lib\site-packages\pandas\core\series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "z:\CEC Python\NLP\clean_raw_text_new.py", line 138, in <lambda>
    df['text_cleaned_test'] = df['text'].apply(lambda x: prepareStringTEST(x))
  File "z:\CEC Python\NLP\clean_raw_text_new.py", line 75, in prepareStringTEST
    return " ".join([lemma(wd) for wd in x.split()])
  File "z:\CEC Python\NLP\clean_raw_text_new.py", line 75, in <listcomp>
    return " ".join([lemma(wd) for wd in x.split()])
  File "C:\Users\xxx\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 2172, in lemma
    self.load()
  File "C:\Users\xxx\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 2127, in load
    for v in _read(self._path):
RuntimeError: generator raised StopIteration
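The shape of this traceback (a `raise StopIteration` inside `_read`, reported as `RuntimeError: generator raised StopIteration` by the `for` loop consuming it) matches what Python 3.7+ does under PEP 479 whenever a generator body raises `StopIteration` explicitly. The sketch below reproduces that behaviour without pattern at all; `read_lines_old_style`, `read_lines_fixed`, and `consume` are hypothetical names made up for the illustration and are not pattern's code.

```python
def read_lines_old_style(lines):
    """Generator that ends by raising StopIteration explicitly (pre-PEP 479 idiom)."""
    for line in lines:
        yield line
    raise StopIteration  # converted to RuntimeError on Python 3.7+


def read_lines_fixed(lines):
    """Same generator, but ending with a plain return."""
    for line in lines:
        yield line
    return


def consume(gen):
    """Collect items from a generator and report how the iteration ended."""
    items = []
    try:
        for item in gen:
            items.append(item)
    except RuntimeError as exc:
        return items, f"RuntimeError: {exc}"
    return items, "ok"


if __name__ == "__main__":
    print(consume(read_lines_old_style(["a", "b"])))
    print(consume(read_lines_fixed(["a", "b"])))
```

Note that in the old-style case every item is still yielded before the error appears, so the failure only happens once the data has been fully read.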
Here is some code to test:
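import re
from pattern.en import lemma  # assumed import; the traceback shows lemma() coming from the pattern package

def prepareStringTEST(x):
    error = 'Error'
    # Replace everything except digits and lowercase letters with a space
    x = re.sub(r"[^0-9a-z]", " ", x)
    if len(x) == 0:
        return ''
    # Lemmatize each whitespace-separated token
    return " ".join([lemma(wd) for wd in x.split()])

string = '''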
Peter Navarro, who as a White House adviser to President Donald J. Trump worked to keep Mr. Trump in office after his defeat in the 2020 election, disclosed on Monday that he has been summoned to testify on Thursday to a federal grand jury and to provide prosecutors with any records he has related to the attack on the Capitol last year, including “any communications” with Mr. Trump.
The subpoena to Mr. Navarro — which he said the F.B.I. served at his house last week — seeks his testimony about materials related to the buildup to the Jan. 6 attack on the Capitol, and signals that the Justice Department investigation may be progressing to include activities of people in the White House.
Mr. Navarro revealed the existence of the subpoena in a draft of a lawsuit he said he is preparing to file against the House committee investigating the Jan. 6 attack, Speaker Nancy Pelosi and Matthew M. Graves, the U.S. attorney for the District of Columbia.
'''
print(prepareStringTEST(string.lower()))
Here are my results in a Jupyter notebook (in VS Code):
peter navarro who a a white house adviser to president donald j trump work to keep mr trump in office after hi defeat in the 2020 election disclose on monday that he have be summons to testify on thursday to a federal grand jury and to provide prosecutor with any record he have relate to the attack on the capitol last year include any communication with mr trump the subpoena to mr navarro which he say the f b i serve at hi house last week seek hi testimony about material relate to the buildup to the jan 6 attack on the capitol and signal that the justice department investigation may be progress to include activity of people in the white house mr navarro reveal the existence of the subpoena in a draft of a lawsuit he say he be prepare to file against the house committee investigate the jan 6 attack speaker nancy pelosi and matthew m grave the u attorney for the district of columbia
Here are my results running the exact same code in a .py file (in VS Code):
PS Z:\CEC Python> & "C:/Program Files/Python39/python.exe" "z:/CEC Python/NLP/clean_raw_test_new.py"
Traceback (most recent call last):
  File "C:\Users\mkzou183\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "z:\CEC Python\NLP\clean_raw_test_new.py", line 31, in <module>
    print(prepareStringTEST(string.lower()))
  File "z:\CEC Python\NLP\clean_raw_test_new.py", line 22, in prepareStringTEST
    return " ".join([lemma(wd) for wd in x.split()])
  File "z:\CEC Python\NLP\clean_raw_test_new.py", line 22, in <listcomp>
    return " ".join([lemma(wd) for wd in x.split()])
  File "C:\Users\mkzou183\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 2172, in lemma
    self.load()
  File "C:\Users\mkzou183\AppData\Roaming\Python\Python39\site-packages\pattern\text\__init__.py", line 2127, in load
    for v in _read(self._path):
RuntimeError: generator raised StopIteration
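One workaround that may be worth trying is to call the lemmatizer once inside a `try`/`except RuntimeError` before the real work, on the assumption that the failing first call has already filled the lazily loaded data by the time the error is raised. I have not verified that assumption against the pattern source. The sketch below demonstrates the idea on a stand-in: `LazyTable` and `warm_up` are hypothetical names invented for the illustration, and `LazyTable` only mimics the `lemma` -> `load` -> `_read` call chain seen in the traceback.

```python
class LazyTable:
    """Hypothetical stand-in for a lookup table that loads itself on first use."""

    def __init__(self, rows):
        self._rows = rows
        self._data = {}

    def _read(self):
        for key, value in self._rows:
            yield key, value
        raise StopIteration  # problematic ending: RuntimeError on Python 3.7+

    def load(self):
        for key, value in self._read():
            self._data[key] = value

    def lookup(self, word):
        if not self._data:  # only load while the table is still empty
            self.load()
        return self._data.get(word, word)


def warm_up(func, probe="x"):
    """Call func once and swallow the RuntimeError raised by the first, failing load."""
    try:
        func(probe)
    except RuntimeError:
        pass


if __name__ == "__main__":
    table = LazyTable([("was", "be"), ("houses", "house")])
    warm_up(table.lookup)         # first call fails, but the table is now filled
    print(table.lookup("was"))    # later calls skip the loader
```

With the real library, the equivalent would presumably be `warm_up(lemma)` placed before the `df['text'].apply(...)` call. If pattern behaves like the stand-in, that could also explain the notebook result: a cell that failed once and was then re-run would succeed, because the loaded data stays alive in the kernel between runs.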