I'm using the dateparser library to parse strings and return potential dates. I need to serialize the parser with cloudpickle for distributed use, but dumping it raises an error:
import cloudpickle
import dateparser

class DateParser:
    def __init__(self,
                 threshold: float = 0.5,
                 pos_label: str = 'date'):
        self.threshold = threshold
        self.pos_label = pos_label

    def __call__(self):
        return dateparser.parse('20/12/2022')

date_parser = DateParser()
with open('/path/parser.cloudpickle', 'wb+') as fout:
    cloudpickle.dump(date_parser, fout, protocol=4)
TypeError: can't pickle _thread.lock objects
However, when I use plain pickle, it works just fine:
import pickle
with open('/path/parser.pickle', 'wb+') as fout:
    pickle.dump(date_parser, fout, protocol=4)
# also loads just fine:
with open('/path/parser.pickle', 'rb+') as fin:
    pickle.load(fin)
I can get around this issue by importing dateparser inside the __init__ of DateParser, but I'm not sure why this should be the fix.
class DateParser:
    def __init__(self,
                 threshold: float = 0.5,
                 pos_label: str = 'date'):
        import dateparser
        self.threshold = threshold
        self.pos_label = pos_label
I looked online, and it seems this thread-lock complaint most commonly comes up with multiprocessing calls, but as far as I can tell the underlying dateparser library doesn't make any. And shouldn't this have broken plain pickling as well?
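On the last point, plain pickle would not be expected to break here. It serializes an instance as a reference to its class (module and name) plus the instance attributes, and never looks at the globals the methods use. cloudpickle, by contrast, pickles classes and functions defined in __main__ by value, so it also has to serialize the globals those functions reference. Below is a minimal stdlib-only sketch of the pickle side. `UsesGlobal` and `_LOCK` are hypothetical stand-ins, and whether a lock reachable through dateparser is what cloudpickle trips over here is an assumption, not something confirmed above.

```python
import pickle
import threading

# Module-level global holding an unpicklable object. This is a hypothetical
# stand-in for lock-bearing state a library might keep; it is NOT taken
# from dateparser itself.
_LOCK = threading.Lock()


class UsesGlobal:
    """Hypothetical class whose method touches a module-level lock."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def __call__(self) -> bool:
        with _LOCK:
            return True


obj = UsesGlobal()

# Plain pickle records the class by qualified name plus the instance
# attributes; the method body and the globals it uses are not serialized.
payload = pickle.dumps(obj, protocol=4)
restored = pickle.loads(payload)

# Pickling the lock directly does fail. A serializer that pickles the class
# by value and follows its globals could run into an object like this.
try:
    pickle.dumps(_LOCK)
    lock_error = None
except TypeError as exc:
    lock_error = exc

print(b"_LOCK" in payload)         # is the global's name in the payload?
print(type(lock_error).__name__)   # exception type raised for the lock
```

If this reading is right, it would also fit the workaround: once the import moves into a method, dateparser is no longer a module-level global of the class's functions.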