Use the "secret sauce" of CPython threading -- Queues!
Writing to a file is inherently sequential, so you might as well put a single thread in charge of all the writing.
Have all the worker threads push their results into a common output queue.
Have the single writer thread read from this output queue and
write to the CSV every 1000 entries, or when all the worker threads are done.
This way you avoid the headache of locks or of merging partial files afterwards.
Here is the basic structure I am suggesting. It creates 2500 entries, processes them with 5 threads, and prints a chunk after every 10 results:
import queue
import threading

SENTINEL = None  # pushed onto a queue to signal "no more items"

def worker(in_queue, out_queue):
    # iter(in_queue.get, SENTINEL) keeps calling in_queue.get()
    # until it returns SENTINEL, then the loop ends
    for n in iter(in_queue.get, SENTINEL):
        # print('task called: {n}'.format(n=n))
        out_queue.put(n*2)

def write(out_queue, chunksize=10):
    results = []
    for n in iter(out_queue.get, SENTINEL):
        results.append(n)
        if len(results) >= chunksize:
            print(results)
            results = []
    if len(results):
        # SENTINEL signals the worker threads are done;
        # print the remainder of the results
        print(results)

in_queue = queue.Queue()
out_queue = queue.Queue()
num_threads = 5
N = 2500

for i in range(N):
    in_queue.put(i)
for i in range(num_threads):
    # add a SENTINEL to tell each worker to end
    in_queue.put(SENTINEL)

writer = threading.Thread(target=write, args=(out_queue,))
writer.start()

threads = [threading.Thread(target=worker, args=(in_queue, out_queue))
           for n in range(num_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# all the workers are done, so tell the writer to end
out_queue.put(SENTINEL)
writer.join()
which prints
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
[40, 42, 44, 46, 48, 50, 52, 54, 56, 58]
...
[4940, 4942, 4944, 4946, 4948, 4950, 4952, 4954, 4956, 4958]
[4960, 4962, 4964, 4966, 4968, 4970, 4972, 4974, 4976, 4978]
[4980, 4982, 4984, 4986, 4988, 4990, 4992, 4994, 4996, 4998]
Note that the values printed may not appear in sorted order; it depends on the order in which the concurrent worker threads push their results into out_queue.
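
If the output order matters, one way to restore it is to tag each task with its index and let the writer buffer results until the next expected index arrives. Here is a rough sketch of that idea (ordered_write is a hypothetical replacement for write, and the workers would put (index, value) pairs on the queue):

def worker(in_queue, out_queue):
    for n in iter(in_queue.get, SENTINEL):
        # tag the result with its input index
        out_queue.put((n, n*2))

def ordered_write(out_queue, chunksize=10):
    pending = {}      # out-of-order results, keyed by index
    next_index = 0    # the next index we are allowed to emit
    batch = []
    for index, value in iter(out_queue.get, SENTINEL):
        pending[index] = value
        # emit any run of consecutive indices we now have
        while next_index in pending:
            batch.append(pending.pop(next_index))
            next_index += 1
            if len(batch) >= chunksize:
                print(batch)
                batch = []
    if batch:
        print(batch)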
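
The example above prints each chunk for demonstration. Here is a minimal sketch of how write() could instead append rows to a CSV file with the standard csv module; the filename results.csv is just a placeholder, and it assumes each item the workers put on out_queue is a row (a list or tuple of fields):

import csv

def write(out_queue, chunksize=1000, filename='results.csv'):
    # open the file once, then flush rows to it in batches
    with open(filename, 'w', newline='') as f:
        csv_writer = csv.writer(f)
        rows = []
        for row in iter(out_queue.get, SENTINEL):
            rows.append(row)
            if len(rows) >= chunksize:
                csv_writer.writerows(rows)
                rows = []
        if rows:
            # write whatever is left once the workers are done
            csv_writer.writerows(rows)

Since only this one thread ever touches the file, no locking is needed around the writes.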