I am trying to build a very simple item recommender system based on the number of times items are bought together,
so I first created an item2item dictionary of Counter objects, like this:
# people purchased A with B 4 times, and A with C 3 times
item2item = {'A': Counter({'B': 4, 'C': 3}), 'B': Counter({'A': 4, 'C': 2}), 'C': Counter({'A': 3, 'B': 2})}
# recommend to a user who purchased A and C
samples_list = [['A', 'C'], ...]
So, for samples = ['A', 'C'], I recommend the items with the largest counts in item2item['A'] + item2item['C'].
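Without multiprocessing, that step looks roughly like this (the recommend helper and its top_n parameter are just for illustration, not my real code):
from operator import add
from functools import reduce
from collections import Counter

def recommend(samples, item2item, top_n=3):
    # merge the co-purchase counters of every item in the basket
    combined = reduce(add, [item2item[s] for s in samples], Counter())
    # rank candidate items by their merged co-purchase counts
    return combined.most_common(top_n)

# recommend(['A', 'C'], item2item) -> [('B', 6), ('C', 3), ('A', 3)]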
However, the merge is heavy for a large matrix, so I tried to use multiprocessing as below:
from operator import add
from functools import reduce
from concurrent.futures import ProcessPoolExecutor
from collections import Counter

with ProcessPoolExecutor(max_workers=10) as pool:
    for samples in samples_list:
        # w/o ProcessPoolExecutor:
        # combined = reduce(add, [item2item[s] for s in samples], Counter())
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        combined = future.result()
However, this didn't speed up the process at all.
I suspect that the Counter used in the reduce call is not shared between processes, as discussed in "Python multiprocessing and a shared counter" and in https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes.
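For context, the kind of shared state those links describe looks roughly like this (this is just the pattern from the docs, not something my code above uses):
import multiprocessing as mp
from collections import Counter

def bump(shared_counts, lock, key):
    # several processes update the same managed dict under a lock
    with lock:
        shared_counts[key] = shared_counts.get(key, 0) + 1

if __name__ == '__main__':
    with mp.Manager() as manager:
        shared_counts = manager.dict()
        lock = manager.Lock()
        procs = [mp.Process(target=bump, args=(shared_counts, lock, k)) for k in ['A', 'B', 'A']]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(Counter(dict(shared_counts)))  # Counter({'A': 2, 'B': 1})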
Any help is appreciated.