I have a function that reads in a file, compares each record in that file to the records in another file, and, depending on a rule, appends the record to one of two lists.
I have an empty list for adding matched results to:

    match = []
I have a list of restrictions (other_links in the code below) that I want to compare the records in a series of files against.
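For context, a minimal sketch of what that list might look like; the file name restrictions.json and the exact record shape are assumptions, only the 'data' key is taken from the matching rule below:

    import json

    # Hypothetical: load the restriction records once at module level.
    with open("restrictions.json") as f:
        other_links = json.load(f)
    # e.g. other_links == [{"data": "a1b2"}, {"data": "c3d4"}, ...]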
I have a function that reads in a file and checks whether it contains any matches. If there is a match, I append the record to the match list:
    import json

    def link_match(file):
        # Read the records from one file and compare each one to the restriction list.
        with open(file) as f:
            links = json.load(f)
        for link in links:
            found = False
            for other_link in other_links:
                if link['data'] == other_link['data']:
                    match.append(link)
                    found = True
            if not found:
                print("not found")
I have numerous files to compare, so I want to use the multiprocessing library.
I create a list of file names to act as function arguments:

    import glob

    list_files = []
    for file in glob.glob("/path/*.json"):
        list_files.append(file)
I then use the map method to call the function with the different input files:
    import multiprocessing

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=6)
        pool.map(link_match, list_files)
        pool.close()
        pool.join()
CPU use goes through the roof, and by adding a print statement inside the function's loop I can see that matches are being found and that the function is behaving correctly. However, the match list remains empty. What am I doing wrong?
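To show what I mean, here is a minimal sketch of the check; the len(match) print is only there to demonstrate the problem, everything else is taken from the snippets above:

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=6)
        pool.map(link_match, list_files)
        pool.close()
        pool.join()
        # Even though the workers printed matches, this still prints 0.
        print(len(match))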