
I am using the Stanza package in Python to perform language processing tasks on text data. My code is as follows:

import stanza
nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma', download_method=None)

However, I encountered the following error message when running my code:

_pickle.UnpicklingError: pickle data was truncated

It was raised from the following traceback:

  File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/core.py", line 296, in __init__
    self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/processor.py", line 193, in __init__
    self._set_up_model(config, pipeline, device)
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/pipeline/pos_processor.py", line 30, in _set_up_model
    self._trainer = Trainer(pretrain=self.pretrain, model_file=config['model_path'], device=device, args=args, foundation_cache=pipeline.foundation_cache)
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/pos/trainer.py", line 32, in __init__
    self.load(model_file, pretrain, args=args, foundation_cache=foundation_cache)
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/pos/trainer.py", line 117, in load
    emb_matrix = pretrain.emb
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/common/pretrain.py", line 50, in emb
    self.load()
  File "/root/miniconda3/lib/python3.8/site-packages/stanza/models/common/pretrain.py", line 56, in load
    data = torch.load(self.filename, lambda storage, loc: storage)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/serialization.py", line 787, in _legacy_load
    result = unpickler.load()

I suspect that this error is related to how the Stanza package stores some of its data through PyTorch's serialization (which uses pickle), but I don't know whether I should dig into the package code or where a fix would go. Can anyone help me understand what might be causing this error and how I can fix it?

I searched online and found no matching solutions.

By the way, the Stanza package was downloaded by hand, file by file, from the Hugging Face website. I fixed several earlier missing-file errors before running into this one.
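
Since the files were fetched by hand, and the error says the pickle data was truncated, one thing that can be checked is whether any download was cut short. Below is a minimal sketch for that, not a confirmed fix. It assumes the model files end in .pt and sit under Stanza's default ~/stanza_resources directory (adjust the path if they were placed elsewhere), and that the printed sizes and SHA-256 values can be compared by eye against those shown on the Hugging Face file pages. The helper names sha256_of and list_model_files are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_model_files(model_dir):
    """Return (relative path, size in bytes, sha256) for each .pt file under model_dir."""
    root = Path(model_dir).expanduser()
    rows = []
    for path in sorted(root.rglob("*.pt")):
        rows.append((str(path.relative_to(root)), path.stat().st_size, sha256_of(path)))
    return rows


if __name__ == "__main__":
    # Assumption: the models live in Stanza's default resources directory.
    for name, size, digest in list_model_files("~/stanza_resources"):
        print(f"{size:>12}  {digest}  {name}")
```

A file whose size or hash differs from the value listed at the source would be the first candidate to download again; the traceback above suggests starting with the pretrain file that the pos processor loads.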

Ruoxi NING