I think this other post is exactly what I want to do: Python multiprocessing pool.map for multiple arguments
Here is what I am trying to implement, in pseudocode. find_similar is called by another function in my code:
    def find_similar(db, num, listofsets):
        # db is the sqlite3 database connection
        # num is a variable I need for the SQL query
        # listofsets is a list of sets; each set is a set of strings
        threshold = 0.49
        similar_db_rows = []
        for row in db.execute("SELECT thing1, thing2, thing3 FROM table WHERE num != {n};".format(n=num)):
            # thing3 is a long string, each value separated by a comma;
            # it is the third selected column, so index 2
            items = set(row[2].strip().split(','))
            for set_item in listofsets:
                sim_score = sim_function(set_item, items)
                if sim_score < threshold:
                    similar_db_rows.append(row)
        return similar_db_rows
    def sim_function(x, y):
        # x is a set, and y is a second set. The function does some
        # calculation and comparing, then returns a float value.
        return float_value
This works. What I was trying to do was use multiprocessing on the second for loop. Iterating over each set one at a time and calling the function is a major bottleneck, since my list of sets can be very large. Instead, I wanted multiprocessing to call the function for many sets at a time, passing each set along with the constant second argument built from the SQL query row, and collect the float returned for each set into a list. After all of the sets have been processed, I can then check whether any value in that list meets the threshold.
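To make that intent concrete, here is a minimal self-contained sketch of the plan (Python 3 syntax; the set contents are made up, the Jaccard-style score is only a toy stand-in for my real sim_function, and functools.partial is one way to fix the constant second argument):

```python
from functools import partial
from multiprocessing import Pool

def sim_function(x, y):
    # toy stand-in: Jaccard distance between two sets
    return 1.0 - len(x & y) / float(len(x | y) or 1)

def scores_for_row(listofsets, items):
    # partial() fixes the constant second argument, so the pool
    # only has to iterate over the list of sets
    with Pool(2) as pool:
        return pool.map(partial(sim_function, y=items), listofsets)

if __name__ == "__main__":
    sets = [{"a", "b"}, {"b", "c"}, {"x"}]
    print(scores_for_row(sets, {"a", "b"}))
```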
I tried both the func_star approach with

    pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg)))

by Sebastian, and the parmap approach by zeehio. With both, if I had for example 30 sets in the list, it returned a list of 30 results over and over again; on each return it would check the similarity threshold and append rows, but it never broke out of this, and I ended up Ctrl+Z'ing the whole thing.
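(Side note, in case it matters: on Python 3, Pool.starmap does the tuple-unpacking that func_star simulates, so the same call can be written without the wrapper. A minimal sketch with made-up sets and a placeholder similarity function:)

```python
import itertools
from multiprocessing import Pool

def sim_function(x, y):
    # placeholder: fraction of y's items missing from x
    return len(y - x) / float(len(y))

if __name__ == "__main__":
    listofsets = [{"a"}, {"a", "b"}, {"c"}]
    items = {"a", "b"}
    with Pool(2) as pool:
        # each (set_item, items) pair is unpacked into sim_function's
        # two positional arguments
        scores = pool.starmap(sim_function,
                              zip(listofsets, itertools.repeat(items)))
    print(scores)  # [0.5, 0.0, 1.0]
```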
Below is an example of what I attempted, first using parmap:
    import parmap

    def find_similar(db, num, listofsets):
        # db is the sqlite3 database connection
        # num is a variable I need for the SQL query
        # listofsets is a list of sets; each set is a set of strings
        threshold = 0.49
        list_process_results = []
        similar_db_rows = []
        for row in db.execute("SELECT thing1, thing2, thing3 FROM table WHERE num != {n};".format(n=num)):
            items = set(row[2].strip().split(','))
            # parmap.starmap calls sim_function(set_item, items) for each
            # 1-tuple produced by zip(listofsets)
            list_process_results = parmap.starmap(sim_function, zip(listofsets), items)
            print list_process_results
            if any(t < threshold for t in list_process_results):
                # print "appending a row"
                similar_db_rows.append(row)
        return similar_db_rows
and here using func_star:
    import itertools
    from multiprocessing import Pool

    def func_star(a_b):
        """Convert `f([1,2])` to `f(1,2)` call."""
        return sim_function(*a_b)

    def find_similar(db, num, listofsets):
        pool = Pool()
        # db is the sqlite3 database connection
        # num is a variable I need for the SQL query
        # listofsets is a list of sets; each set is a set of strings
        threshold = 0.49
        list_process_results = []
        similar_db_rows = []
        for row in db.execute("SELECT thing1, thing2, thing3 FROM table WHERE num != {n};".format(n=num)):
            items = set(row[2].strip().split(','))
            list_process_results = pool.map(func_star, itertools.izip(listofsets, itertools.repeat(items)))
            print list_process_results
            if any(t < threshold for t in list_process_results):
                # print "appending a row"
                similar_db_rows.append(row)
        return similar_db_rows
The same thing happens with both: it goes on forever, printing a list of the length I am expecting (with a different set of values each time) and "appending a row", and never breaks out.
Thanks for the help! As an extra, it would be nice if multiprocessing could also be used for the results of the row query (the outer loop), but I will conquer the inner loop first.
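For reference, here is a runnable sketch of the whole pattern I am after (Python 3 syntax; the rows are made up stand-ins for the sqlite results and the similarity score is a placeholder). I create the Pool once, pass it in, and guard the entry point with if __name__ == '__main__' as the multiprocessing docs require. I suspect a missing guard, or creating a new Pool on every call, could produce runaway behaviour like what I describe above, but that is only my assumption:

```python
import itertools
from multiprocessing import Pool

def sim_function(x, y):
    # placeholder similarity measure
    return len(x & y) / float(len(x | y) or 1)

def find_similar(rows, listofsets, pool, threshold=0.49):
    # rows stands in for db.execute(...): tuples of (thing1, thing2, thing3)
    similar_db_rows = []
    for row in rows:
        items = set(row[2].strip().split(','))
        scores = pool.starmap(sim_function,
                              zip(listofsets, itertools.repeat(items)))
        if any(s < threshold for s in scores):
            similar_db_rows.append(row)
    return similar_db_rows

if __name__ == "__main__":
    # the guard stops child processes from re-importing and re-running
    # this module's top-level code, which can look like an endless loop
    with Pool(2) as pool:
        rows = [(1, 2, "a,b,c"), (3, 4, "x,y")]
        print(find_similar(rows, [{"a", "b"}], pool))
```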
To answer dano's question about find_similar(): I have another function that contains a for loop. Each iteration of that loop calls find_similar. When the resulting list is returned from find_similar, it prints the length of the returned list, then finishes the remainder of the loop body and moves on to the next element. After that for loop is finished, the function is over, and find_similar is not called again.