
I'm using the dateparser library to parse some strings and return potential dates. I need to use cloudpickle for distributed use but am receiving an error:

import cloudpickle
import dateparser

class DateParser:

    def __init__(self,
                 threshold: float = 0.5,
                 pos_label: str = 'date'):

        self.threshold = threshold
        self.pos_label = pos_label


    def __call__(self):
        # Return the parsed datetime (or None) instead of discarding it
        return dateparser.parse('20/12/2022')

date_parser = DateParser()
with open('/path/parser.cloudpickle', 'wb+') as fout:
    cloudpickle.dump(date_parser, fout, protocol=4)


TypeError: can't pickle _thread.lock objects

However, when I use plain pickle, it works just fine:

import pickle
with open('/path/parser.pickle', 'wb+') as fout:
    pickle.dump(date_parser, fout, protocol=4)


# also loads just fine:
with open('/path/parser.pickle', 'rb+') as fin:
    pickle.load(fin)

I can get around this issue by importing dateparser inside the __init__ of DateParser, but I'm not sure why this should be the fix.

class DateParser:

    def __init__(self,
                 threshold: float = 0.5,
                 pos_label: str = 'date'):
        
        import dateparser

        self.threshold = threshold
        self.pos_label = pos_label

I looked online, and this thread-lock complaint seems most common with multiprocessing calls, but as far as I can tell the underlying dateparser library doesn't use multiprocessing. And shouldn't a lock somewhere in the object have broken plain pickling as well?
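
For reference, here is a minimal standard-library sketch of what plain pickle actually looks at for an instance like this. It uses a stand-in class with the same two attributes (no dateparser involved) and inspects `__reduce_ex__`, which is what pickle calls on the object. Plain pickle records only the class, which it writes as a reference by module and name, and the instance's own attributes, so module-level globals used by the methods are never serialized. cloudpickle, by contrast, is documented to serialize classes defined in `__main__` by value, so it may reach objects that plain pickle never touches. Whether that is how the lock gets pulled in here is an assumption, not something confirmed.

```python
import pickle
import threading


class DateParser:
    """Stand-in for the class in the question (same two attributes)."""

    def __init__(self, threshold: float = 0.5, pos_label: str = 'date'):
        self.threshold = threshold
        self.pos_label = pos_label


# pickle asks the object how to rebuild itself. For a plain instance the
# answer is: a constructor helper, the class, and the instance state.
reduced = DateParser().__reduce_ex__(4)
cls_args = reduced[1]   # (DateParser,), written to the stream by name
state = reduced[2]      # only the instance attributes
print(cls_args)
print(state)

# A lock object on its own cannot be pickled; this is the same kind of
# TypeError as the one reported above.
try:
    pickle.dumps(threading.Lock())
    lock_error = None
except TypeError as exc:
    lock_error = exc
print(type(lock_error).__name__, lock_error)
```

So the plain-pickle result would not be expected to fail unless a lock were stored on the instance itself, which is not the case with `threshold` and `pos_label`.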

Drivebyluna