I am using the Stanza package in Python to perform language processing tasks on text data. My code is as follows:
import stanza
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma', download_method=None)
However, I encountered the following error message when running my code:
_pickle.UnpicklingError: pickle data was truncated
The traceback leading up to it is:
File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/core.py", line 296, in __init__
self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/processor.py", line 193, in __init__
self._set_up_model(config, pipeline, device)
File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/pos_processor.py", line 30, in _set_up_model
self._trainer = Trainer(pretrain=self.pretrain, model_file=config['model_path'], device=device, args=args, foundation_cache=pipeline.foundation_cache)
File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/pos/trainer.py", line 32, in __init__
self.load(model_file, pretrain, args=args, foundation_cache=foundation_cache)
File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/pos/trainer.py", line 117, in load
emb_matrix = pretrain.emb
File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/common/pretrain.py", line 50, in emb
self.load()
File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/common/pretrain.py", line 56, in load
data = torch.load(self.filename, lambda storage, loc: storage)
File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 787, in _legacy_load
result = unpickler.load()
I suspect this error is related to how Stanza stores some of its data through PyTorch's serialization (pickling), but I don't know whether I should dig into the package code, or where a fix would go. Can anyone help me understand what might be causing this error and how I can fix it?
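For context, the same message can be reproduced with the standard library alone by cutting a pickle short, which suggests (but does not prove) that the file being loaded ends earlier than it should. This is only a minimal sketch: `load_truncated` is a hypothetical helper, the payload is made up, and nothing here touches Stanza or PyTorch.

```python
import pickle


def load_truncated(obj, keep_fraction=0.5):
    """Pickle obj, keep only the first part of the bytes, and
    return the exception raised when loading them (or None)."""
    # Protocol 2 is chosen only to keep the byte layout simple for the demo.
    data = pickle.dumps(obj, protocol=2)
    cut = data[: int(len(data) * keep_fraction)]
    try:
        pickle.loads(cut)
    except Exception as exc:  # capture whatever the unpickler raises
        return exc
    return None


# A long string, so the cut lands in the middle of the string payload.
err = load_truncated("x" * 1000)
print(type(err).__name__, err)
```

If a partially downloaded model file is cut off in a similar way, the unpickler inside `torch.load` could fail like this too.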
I searched online and found no matching solutions.
By the way, the Stanza files were downloaded by hand, file by file, from the Hugging Face website. I fixed several earlier missing-file errors and then ran into this one.
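Since the files were fetched by hand, one thing that can be checked is whether each file on disk is complete. Below is a rough sketch using only the standard library. `file_md5` and `find_suspect_files` are hypothetical helper names, and the expected checksums have to come from a trusted source. I believe Stanza's `resources.json` lists an md5 per model file, but treat that as an assumption.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_files(model_dir, expected_md5):
    """Compare files under model_dir against expected checksums.

    expected_md5 maps a path relative to model_dir to an md5 hex string
    (assumed to be taken from a trusted listing, not computed locally).
    Returns a list of (relative_path, reason) for files that look wrong.
    """
    suspects = []
    for rel_path, expected in expected_md5.items():
        path = Path(model_dir) / rel_path
        if not path.is_file():
            suspects.append((rel_path, "missing"))
        elif file_md5(path) != expected:
            suspects.append((rel_path, "checksum mismatch"))
    return suspects
```

Any file reported as a mismatch would be a candidate for re-downloading. A truncated pretrain file would show up this way, although a mismatch by itself does not say how the file was damaged.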