I have two dictionaries, dict1 and dict2, whose keys are strings and whose values are DataFrames.
dict1 = {"A" : df1,"B" : df2,"C" : df3}
dict2 = {"A" : df4,"B" : df5,"C" : df6}
I want to compare every row of df1['Last_Name'] with df4['Last_Name'] and create a new field df1['Match'] holding the value with the highest Levenshtein distance. Similarly, df2 should be compared with df5 and df3 with df6.
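To make the comparison concrete, here is a minimal sketch of the per-name step, assuming a plain dynamic-programming Levenshtein implementation. The names levenshtein and pick_match are hypothetical helpers, not from any library. Using max follows the "highest distance" wording above; passing min instead would give the closest match.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def pick_match(name, candidates, choose=max):
    """Return the candidate that `choose` selects by Levenshtein distance to name.

    choose=max picks the most distant candidate (as worded in the question);
    choose=min would pick the closest one.
    """
    return choose(candidates, key=lambda c: levenshtein(name, c))
```

With pandas this could probably be applied as df1['Match'] = df1['Last_Name'].apply(lambda n: pick_match(n, list(df4['Last_Name']))).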
Now I want to run these three comparisons in parallel. I tried multiprocessing and concurrent.futures, but somehow it is not working:
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = []
    for key1, df1 in dict1.items():
        for key2, df2 in dict2.items():
            futures.append(executor.submit(add_flag, df1, df2))
    for future in concurrent.futures.as_completed(futures):
        result = future.result()
        key1 = next(iter(filter(lambda x: x in result.columns, dict1.keys())))
        dict1[key1] = result
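For comparison, a minimal sketch of pairing the frames by key instead of with a nested loop, so that only A with A, B with B, and so on are submitted, and of remembering which key each future belongs to. add_flag is not shown above, so the version here is a hypothetical stand-in that only flags exact matches, and the sample frames are made up. Threads are kept to mirror the attempt above. For CPU-bound pure-Python string comparison, ProcessPoolExecutor may be the better fit, since threads share the GIL.

```python
import concurrent.futures

import pandas as pd


def add_flag(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: flag rows whose Last_Name also appears in right."""
    out = left.copy()
    out["Match"] = out["Last_Name"].isin(right["Last_Name"])
    return out


# Made-up sample data, for illustration only.
dict1 = {
    "A": pd.DataFrame({"Last_Name": ["Smith", "Jones"]}),
    "B": pd.DataFrame({"Last_Name": ["Brown"]}),
}
dict2 = {
    "A": pd.DataFrame({"Last_Name": ["Smith"]}),
    "B": pd.DataFrame({"Last_Name": ["Green"]}),
}

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each future to its key so the result can be stored back correctly.
    future_to_key = {
        executor.submit(add_flag, dict1[key], dict2[key]): key
        for key in dict1.keys() & dict2.keys()
    }
    for future in concurrent.futures.as_completed(future_to_key):
        dict1[future_to_key[future]] = future.result()
```

This submits one task per shared key (three in the case above) rather than one per combination of keys, and it does not need to guess the key from the result's columns.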