
I'm processing several thousand XML files, and I'd like to do it with ProcessPoolExecutor from concurrent.futures.

The only documentation example assumes that you wait for all the results before continuing (as do all the concurrent.futures tutorials I've found), but that's not feasible in this case: I'll run out of RAM long before the processing is complete.

One option is to batch the work up using something like this - http://grokbase.com/t/python/python-list/1458ss5etz/real-world-use-of-concurrent-futures#20140508uqijcqm5ddbvxhq4xpkqaw3ebi, and I've used that method on a previous project.
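For reference, here's roughly what that batching approach looks like (the chunk size, the `batches` helper, and `process_file` are my own placeholders, not code from that thread):

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def process_file(path):
    # stand-in for the real XML parsing; returns a dict of results
    return {"path": path}

def batches(iterable, size):
    # yield successive lists of `size` items
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

if __name__ == "__main__":
    paths = ("file{}.xml".format(i) for i in range(10000))
    with ProcessPoolExecutor() as executor:
        for batch in batches(paths, 100):
            # wait for each batch before submitting the next, so only
            # ~100 results are held in memory at a time
            for result in executor.map(process_file, batch):
                pass  # post-process result here
```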

But I think an even better option would be to send each worker's result (a dict of various data) to a single dedicated post-processing process. In this case the post-processing step needs to write the results to a SQLite database file, and if one dedicated process had exclusive control of that file, it would eliminate the risk of deadlocks, file corruption, etc.
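Concretely, the dedicated post-processing process I'm picturing would be something like this (the table schema and the `None` sentinel are placeholders I've made up):

```python
import sqlite3

def writer(queue):
    # the one and only process that touches the SQLite file
    conn = sqlite3.connect("results.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results (path TEXT, data TEXT)")
    while True:
        result = queue.get()       # a dict produced by a worker
        if result is None:         # sentinel: all workers are done
            break
        conn.execute("INSERT INTO results VALUES (?, ?)",
                     (result["path"], result["data"]))
        conn.commit()
    conn.close()
```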

I know this is possible with multiprocessing using Queues, per this excellent answer: Python multiprocessing safely writing to a file. Can this be done with concurrent.futures, and if so, how?
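Here's my best guess at the wiring with concurrent.futures, using a `Manager().Queue()` because, as far as I can tell, a plain `multiprocessing.Queue` can't be pickled into the pool workers. I don't know whether this is actually safe or idiomatic, which is the heart of my question:

```python
import multiprocessing
import sqlite3
from concurrent.futures import ProcessPoolExecutor

def process_file(path, queue):
    # stand-in for the real XML parsing
    queue.put({"path": path, "data": "parsed"})

def writer(queue):
    # same single-writer idea as sketched above, compacted here
    conn = sqlite3.connect("results.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results (path TEXT, data TEXT)")
    while True:
        result = queue.get()
        if result is None:
            break
        conn.execute("INSERT INTO results VALUES (?, ?)",
                     (result["path"], result["data"]))
        conn.commit()
    conn.close()

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    queue = manager.Queue()  # proxy object, so it pickles into workers

    writer_proc = multiprocessing.Process(target=writer, args=(queue,))
    writer_proc.start()

    paths = ["file{}.xml".format(i) for i in range(1000)]
    with ProcessPoolExecutor() as executor:
        for path in paths:
            executor.submit(process_file, path, queue)
    # leaving the with-block waits for all the workers to finish

    queue.put(None)     # sentinel: tell the writer to stop
    writer_proc.join()
```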

(I'm using Python 3.6)

GIS-Jonathan
