I have code using a ProcessPoolExecutor, which cannot pickle lambdas or locally defined functions. Some of the code that I want to execute in parallel uses a defaultdict whose default value is None.

How would you proceed? If at all possible, I would prefer not to touch the parallelizing code.

What I have:

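import itertools
import os
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

class SomeClass: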
    def __init__(self):
        self.some_dict = defaultdict(lambda: None)

    def generate(self):
        <some code>

def some_method_to_parallelize(x: SomeClass):
    <some code>

def some_method():
    max_workers = round(os.cpu_count() // 1.5)
    invocations_per_process = 100
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per invocation; each task gets its own SomeClass instance.
        data = [executor.submit(some_method_to_parallelize, SomeClass()) for _ in range(invocations_per_process)]
        data = list(itertools.chain.from_iterable([r.result() for r in data]))

Niki
  • May be related [Can't pickle defaultdict](https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict) – darw Nov 01 '20 at 03:05

1 Answer

Try:

collections.defaultdict(type(None))

That gets you a reference to NoneType for use as your defaultdict's default factory. Calling NoneType() with no arguments returns None, and unlike a lambda, the type appears to be picklable, so the defaultdict can be sent to the worker processes.
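
A quick way to check this yourself is a pickle round trip, sketched below. The key names ("present", "missing") and the variable names are only for illustration and are not from the question. The except clause catches both PicklingError and AttributeError because the exact exception raised for an unpicklable function can differ between Python versions.

```python
import pickle
from collections import defaultdict

NoneType = type(None)  # NoneType() takes no arguments and returns None

# defaultdict with NoneType as its factory survives a pickle round trip.
d = defaultdict(NoneType)
d["present"] = 1
restored = pickle.loads(pickle.dumps(d))

# Missing keys still default to None after unpickling.
missing_value = restored["missing"]

# For comparison: the lambda-based version cannot be pickled.
lambda_failed = False
try:
    pickle.dumps(defaultdict(lambda: None))
except (pickle.PicklingError, AttributeError):
    lambda_failed = True
```

Since ProcessPoolExecutor pickles the arguments passed to submit, a round trip like this is a reasonable stand-in for testing whether a SomeClass instance can cross the process boundary, without changing the parallelizing code.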

ShadowRanger