
After executing this code I don't get any error, but how do I print the dictionary after the multiprocessing is done? `df_Store` is a DataFrame that contains 3 columns: StoreID, Latitude, Longitude. After doing this, this is the error that I get: `BrokenPipeError: [Errno 32] Broken pipe`

import time
import multiprocessing

import pandas as pd
from haversine import haversine

df_Store = pd.read_parquet(r'C:\Users\Store_Table .parquet', engine='auto', columns=None)
Lat_ary = df_Store['Latitude'].tolist()
Long_ary = df_Store['Longitude'].tolist()
col = list(zip(Lat_ary, Long_ary))
df_Store['Lat_Long'] = col

start = time.perf_counter()
def proximity_store(d, df_Store):
    d={};
    for i in range(len(df_Store)):
        for j in range(len(df_Store)):
            if df_Store.StoreID[i]==df_Store.StoreID[j]:
                pass
            else:
                # store the computed distance, not the haversine function itself
                d[df_Store.StoreID[i], df_Store.StoreID[j]] = haversine(df_Store.Lat_Long[i], df_Store.Lat_Long[j])
    return d
if __name__ == '__main__':
    d = multiprocessing.Manager().dict()
    p1 = multiprocessing.Process(target=proximity_store, args=[d, df_Store])
    p1.start()
    p1.join()
    
    finish=time.perf_counter()
    print(f'Finished in {round(finish-start,2)}second(s)')
    print(d)
  • `print(Dictionary)` works, though of course it can be pretty-printed in various ways. Does that answer your question? – John Coleman Nov 11 '20 at 17:41
  • Your example contains the variables `df_Store` and `haversine`, but these have not been defined, so I can't run the above code snippet on my PC (also, you didn't include the import statements for `time` and `multiprocessing`, although these are easier to fix). It would be easier to answer the question if you included a minimal, reproducible example that others can run on their PC without having to make edits. – Jake Levi Nov 11 '20 at 17:44
  • `print(Dictionary)` in place of `return Dictionary` does not print anything. – Mrinal roy Nov 11 '20 at 17:45
  • `print(Dictionary)` after `p1.join()` prints `{}`, with no values. – Mrinal roy Nov 11 '20 at 17:46
  • If `print(Dictionary)` prints `{}` then you have an empty dictionary. Note that `return Dictionary` does *not* modify the global variable `Dictionary`. It seems like you have a basic variable scoping confusion. Your `global Dictionary` has no effect. Instead, you create and then discard a local dictionary. – John Coleman Nov 11 '20 at 17:52
  • How do I print the Dictionary values then? – Mrinal roy Nov 11 '20 at 17:54
  • @Mrinalroy see my answer below. Does that answer your question? – Jake Levi Nov 11 '20 at 18:01
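John Coleman's scoping point can be illustrated with a small, self-contained sketch (the function names `rebinds` and `mutates` are hypothetical, chosen only for this demonstration). Assigning a new dict to a parameter, as `d={};` does at the top of `proximity_store`, only rebinds the local name, and a discarded return value changes nothing for the caller; mutating the object that was passed in is what the caller actually sees.

```python
def rebinds(d):
    # Rebinds the local name `d` to a brand-new dict; the caller's dict is untouched.
    d = {}
    d["x"] = 1
    return d


def mutates(d):
    # Mutates the very object the caller passed in, so the caller sees the change.
    d["x"] = 1


outer = {}
result = rebinds(outer)
print(outer, result)  # {} {'x': 1}

outer2 = {}
mutates(outer2)
print(outer2)  # {'x': 1}
```

With a `Manager().dict()` passed to a child process, only the `mutates` style reaches the shared dictionary; the `rebinds` style leaves it empty, which matches the `{}` output reported above.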

1 Answer


Here is a solution that uses a shared dictionary (`multiprocessing.Manager().dict()`) instead of the standard built-in Python dictionary (see this other Stack Overflow question and answer for the reason why you shouldn't pass the built-in dictionary to the process: the child process works on its own copy, so its changes never reach the parent). Also, instead of defining `Dictionary` as a global variable, I have created it as a local variable and passed it as an argument to the process's target function (in general it is good practice to avoid global variables when you can).
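A minimal sketch of that difference (the worker name `fill` is hypothetical): the same function is run in a child process twice, once with a plain `dict` and once with a `Manager().dict()` proxy. Only the proxy forwards the child's writes to the manager process, which the parent can then read.

```python
import multiprocessing


def fill(mapping):
    # Runs in the child process; writes go to whatever `mapping` refers to there.
    mapping["a"] = 1


if __name__ == "__main__":
    plain = {}
    p = multiprocessing.Process(target=fill, args=(plain,))
    p.start()
    p.join()
    print("built-in dict:", plain)  # built-in dict: {}  (the child changed its own copy)

    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        p = multiprocessing.Process(target=fill, args=(shared,))
        p.start()
        p.join()
        # Read the proxy while the manager process is still running.
        print("Manager dict:", dict(shared))  # Manager dict: {'a': 1}
```

Using the manager as a context manager keeps it alive for exactly as long as the proxy is used and shuts its server process down afterwards.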

Also, I have replaced the undefined variables with randomly generated integers, for demonstration purposes, so anyone can run this example by itself.

import time
import multiprocessing
import numpy as np

start = time.perf_counter()

def proximity_store(d):
    np.random.seed(0)
    len_df_Store = np.random.randint(5, 10)
    for i in range(len_df_Store):
        for j in range(len_df_Store):
            df_Store_StoreID_i = np.random.randint(5, 10)
            df_Store_StoreID_j = np.random.randint(5, 10)
            if df_Store_StoreID_i==df_Store_StoreID_j:
                pass
            else:
                haversine = np.random.randint(5, 10)
                d[df_Store_StoreID_i, df_Store_StoreID_j] = haversine
    return d

if __name__ == '__main__':
    np.random.seed(0)
    d = multiprocessing.Manager().dict()
    p1 = multiprocessing.Process(target=proximity_store, args=[d])
    p1.start()
    p1.join()
    
    finish=time.perf_counter()
    print(f'Finished in {round(finish-start,2)}second(s)')
    print(d)

Console output:

{(5, 8): 8, (8, 6): 6, (7, 9): 8, (5, 9): 5, (6, 5): 5, (9, 8): 8, (8, 5): 8, (5, 7): 8, (9, 5): 5, (6, 9): 7, (5, 6): 5, (7, 8): 8, (8, 9): 5, (7, 5): 6, (6, 8): 7, (9, 7): 5, (8, 7): 5, (7, 6): 7, (9, 6): 8, (6, 7): 8}
Jake Levi
  • Here `df_Store` is a DataFrame that contains three columns. Following what you suggested, the output is `{}`. – Mrinal roy Nov 11 '20 at 18:15
  • In that case, have you tried replacing `def proximity_store(d):` with `def proximity_store(d, df_Store):`, and `p1 = multiprocessing.Process(target=proximity_store, args=[d])` with `p1 = multiprocessing.Process(target=proximity_store, args=[d, df_Store])`? – Jake Levi Nov 11 '20 at 18:18
  • As I said before, it's not easy to answer the question without having the full information to start with – Jake Levi Nov 11 '20 at 18:19
  • I think your problem might be the line `d={};` as the first command in the `proximity_store` function, this basically ignores the shared dictionary that's passed into the process function as an argument, and does all of the function operations on a new, non-shared dictionary. Try removing this line and try again? – Jake Levi Nov 12 '20 at 11:37
  • Also, as a side note, you don't need semicolons at the end of lines in Python code – Jake Levi Nov 12 '20 at 11:37
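Putting these comments together, here is a hedged sketch of a corrected worker under stated assumptions: a hypothetical list of `(StoreID, Latitude, Longitude)` tuples named `STORES` stands in for `df_Store`, and a hand-written `haversine_km` (mean Earth radius 6371 km) stands in for the `haversine` package so the example runs without extra installs; its results may differ slightly from that package's. The worker writes into the `d` it receives (no `d={};`), stores the computed distance rather than the `haversine` function, and returns nothing because `multiprocessing.Process` ignores the target's return value.

```python
import math
import multiprocessing
from pprint import pprint

# Hypothetical stand-in for df_Store: (StoreID, Latitude, Longitude) rows.
STORES = [
    (1, 40.7128, -74.0060),   # New York
    (2, 34.0522, -118.2437),  # Los Angeles
    (3, 41.8781, -87.6298),   # Chicago
]


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points given in degrees.

    Assumption: a simple replacement for the `haversine` package, radius 6371 km.
    """
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def proximity_store(d, stores):
    # No `d = {}` here: write into the (shared) mapping that was passed in.
    for id_i, lat_i, lon_i in stores:
        for id_j, lat_j, lon_j in stores:
            if id_i != id_j:
                # Store the computed distance, not the function object.
                d[id_i, id_j] = haversine_km((lat_i, lon_i), (lat_j, lon_j))


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        d = manager.dict()
        p = multiprocessing.Process(target=proximity_store, args=(d, STORES))
        p.start()
        p.join()
        # Copy into a plain dict while the manager process is still alive.
        result = dict(d)
    pprint(result)
```

Because `proximity_store` only needs something dict-like, it can also be called directly with a plain `dict` for quick testing, and `pprint` gives the more readable printout that John Coleman mentioned.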