I have a dictionary of folder names that I would like to process in parallel. Under each folder, there is an array of file names that I would like to process in series:
    folder_file_dict = {
        folder_name: {
            file_names_key: [file_names_array]
        }
    }
Ultimately, I will be creating a folder named folder_name which contains len(folder_file_dict[folder_name][file_names_key]) files. I have a method like so:
    def process_files_in_series(file_names_array, udp_port):
        for file_name in file_names_array:
            time_consuming_method(file_name, udp_port)
            # create "file_name"

    udp_ports = [123, 456, 789]
Note that time_consuming_method() above takes a long time due to calls over a UDP port. I am also limited to the UDP ports in the array above, so I have to wait for time_consuming_method to complete on a UDP port before I can use that port again. This means that I can have at most len(udp_ports) threads running at a time.
Thus, I will ultimately create len(folder_file_dict.keys()) threads, with len(folder_file_dict.keys()) calls to process_files_in_series, one per folder. I also have a MAX_THREAD count. I am trying to use the Queue and threading modules, but I am not sure what kind of design I need. How can I do this using Queues and Threads, and possibly Condition objects as well? A solution that uses a thread pool may also be helpful.
NOTE: I am not trying to increase the read/write speed. I am trying to parallelize the calls to time_consuming_method under process_files_in_series. Creating the files is just part of the process, not the rate-limiting step.
Also, I am looking for a solution that uses the Queue and threading modules (and possibly Condition objects), or anything relevant to them. A thread pool solution may also be helpful. I cannot use processes, only threads, and I need a solution that works in Python 2.7.
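For concreteness, here is one possible shape for such a design, sketched under stated assumptions: process_all is a hypothetical driver function, and process_files_in_series is stubbed to record (file_name, udp_port) pairs instead of calling the real time_consuming_method. The idea is a fixed pool of worker threads pulling folder jobs from one Queue, with a second Queue acting as a pool of free UDP ports. The try/except around the import keeps the sketch runnable on both Python 2.7 (Queue) and Python 3 (queue).

```python
import threading
try:
    import Queue as queue  # Python 2.7 module name
except ImportError:
    import queue           # Python 3 fallback, for portability

def process_files_in_series(file_names_array, udp_port, results):
    # Stub for the real method: records (file_name, port) pairs instead
    # of calling time_consuming_method(file_name, udp_port).
    for file_name in file_names_array:
        results.append((file_name, udp_port))

def process_all(folder_file_dict, file_names_key, udp_ports, max_threads):
    # One job per folder; each job's files are processed in series
    # by whichever worker picks the job up.
    job_q = queue.Queue()
    for folder_name, entry in folder_file_dict.items():
        job_q.put((folder_name, entry[file_names_key]))

    # A Queue of free UDP ports doubles as a semaphore: a worker blocks
    # on get() until some port is free, and put()s it back when done.
    port_q = queue.Queue()
    for port in udp_ports:
        port_q.put(port)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                folder_name, file_names_array = job_q.get_nowait()
            except queue.Empty:
                return  # no jobs left; the thread exits
            udp_port = port_q.get()  # wait for a free UDP port
            try:
                local = []
                process_files_in_series(file_names_array, udp_port, local)
                with results_lock:
                    results.extend(local)
            finally:
                port_q.put(udp_port)  # release the port for other workers

    # Never spawn more workers than ports, jobs, or MAX_THREAD allows.
    n_workers = min(max_threads, len(udp_ports), len(folder_file_dict))
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

An equivalent variant is to start exactly one worker per UDP port and pass the port in as an argument, which removes the port Queue entirely; the Queue-of-ports version is shown here only because it also handles MAX_THREAD < len(udp_ports) cleanly.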