I am trying to process some files using Python, but because the number of files is huge, it takes too much time. I want to create multiple threads and process the files in parallel to cut down the time, but I am not sure exactly how to do it.
I have written the following code, which is supposed to process 10 files in parallel, but instead of creating 10 threads it seems to create 100 threads, one for each file.
import logging
import os
import threading


def setup_logging():
    # Include the thread name in every record so each line shows which thread wrote it
    log_formatter = logging.Formatter('%(asctime)s [%(threadName)s] [%(levelname)s] %(message)s')
    root_logger = logging.getLogger()

    # Write the same formatted records to a log file and to the console
    file_handler = logging.FileHandler("./logs.log")
    file_handler.setFormatter(log_formatter)
    root_logger.addHandler(file_handler)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(log_formatter)
    root_logger.addHandler(console_handler)

    root_logger.setLevel(logging.DEBUG)


def print_file_name(name):
    logging.info(name)


if __name__ == '__main__':
    setup_logging()
    logging.info("hi")

    dir_name = "/home/egnyte/demo/100"
    file_list = os.listdir(dir_name)
    threads = []

    # Walk the file list in steps of 10 and start one thread per file
    for i in range(0, len(file_list), 10):
        for index in range(0, 10, 1):
            t = threading.Thread(target=print_file_name, args=(file_list[i+index],))
            threads.append(t)
            t.start()

        # Wait for every thread started so far before moving to the next batch
        for t in threads:
            t.join()
The problem is that the logs contain the following lines, which make me think it is creating more than 10 threads, in fact one for every file. That is not what I want.
2017-03-30 13:16:46,120 [Thread-9] [INFO] demo_69.txt
2017-03-30 13:16:46,120 [Thread-10] [INFO] demo_45.txt
2017-03-30 13:16:46,121 [Thread-11] [INFO] demo_72.txt
2017-03-30 13:16:46,121 [Thread-12] [INFO] demo_10.txt
...
...
2017-03-30 13:16:46,149 [Thread-98] [INFO] demo_29.txt
2017-03-30 13:16:46,150 [Thread-99] [INFO] demo_27.txt
2017-03-30 13:16:46,150 [Thread-100] [INFO] demo_39.txt
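For comparison, here is a minimal sketch of what a fixed set of 10 reusable threads could look like, assuming `concurrent.futures.ThreadPoolExecutor` from the standard library is acceptable. The names `process_one` and `process_files` and the generated file names are hypothetical stand-ins, not part of the code above. A `threading.Thread` object runs its target once and then ends, so creating one per file always yields one thread per file; an executor instead hands each file to one of at most `max_workers` long-lived threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_one(name):
    # Hypothetical stand-in for the real per-file work.
    # Returns the name of the thread that handled the file, plus the file name.
    return threading.current_thread().name, name


def process_files(file_list, max_workers=10):
    # The executor starts at most max_workers threads and reuses them
    # for every item, instead of starting a new thread per file.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as file_list
        return list(executor.map(process_one, file_list))


if __name__ == "__main__":
    # Assumed file names standing in for os.listdir(dir_name)
    names = ["demo_%d.txt" % i for i in range(100)]
    results = process_files(names)
    distinct_threads = {thread_name for thread_name, _ in results}
    print(len(distinct_threads), "distinct threads handled", len(results), "files")
```

With this shape the thread names in the log should repeat across files rather than count up to 100, although the exact names depend on the Python version.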
I tried using multiprocessing as well, but it does not seem to create any threads; all the file names are printed from the main thread only.
pool = multiprocessing.Pool(processes=10)
result_list = pool.map(print_file_name, (file for file in os.listdir(dir_name)))
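A note on the output of the pool version: `multiprocessing.Pool` starts worker processes, not threads, and each worker runs its tasks on that process's own main thread, so `%(threadName)s` reports the main thread on every line. The `%(processName)s` field of the log format is the one that would tell the workers apart. If threads are what is wanted, the same `map` call is available on `multiprocessing.pool.ThreadPool`. The sketch below assumes that class is acceptable; `describe_worker` and `map_with_threads` are hypothetical names used only for illustration.

```python
import threading
from multiprocessing.pool import ThreadPool


def describe_worker(name):
    # Hypothetical stand-in for the per-file work.
    # Returns the file name and the identifier of the thread that handled it.
    return name, threading.get_ident()


def map_with_threads(file_list, workers=10):
    # ThreadPool has the same interface as multiprocessing.Pool,
    # but its workers are threads inside the current process.
    with ThreadPool(processes=workers) as pool:
        # map blocks until every item is processed and keeps input order
        return pool.map(describe_worker, file_list)


if __name__ == "__main__":
    # Assumed file names standing in for os.listdir(dir_name)
    names = ["demo_%d.txt" % i for i in range(100)]
    results = map_with_threads(names)
    print(len({ident for _, ident in results}), "distinct worker threads")
```

Because the workers share one process, the per-file function does not need to be picklable here, which is one practical difference from the process-based pool.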