3

Just started Python the other day, and I'm new to the entire concept of multi-threading. I am having trouble writing to a file while multi-threading: if I do it the regular way, the threads constantly overwrite each other's output.

What is the proper way to write to a file while using say 5 threads?

nick
  • 51
  • 1
  • 4

2 Answers

4

The best way to avoid hurting performance is to use a queue shared by all the threads: each worker thread enqueues its items, while a single main thread dequeues them and writes them to the file. The queue is thread-safe and blocks when it's empty. Or better yet, if possible, simply return all the values from the 5 threads and write them to the file once at the end; I/O tends to be one of the more expensive operations we can do, so it's best to limit it as much as we can.
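The second suggestion, collecting all the results and writing them in one pass, might look like this with `concurrent.futures` (the worker function, worker count, and filename here are illustrative, not from the original question):

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Placeholder task: return the lines this worker would have written.
    return ['worker %d: %d' % (n, i) for i in range(3)]

# Run 5 workers; each returns its results instead of touching the file.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(work, range(5)))

# Single writer, single I/O pass, no contention on the file.
with open('results.txt', 'w') as f:
    for lines in results:
        f.write('\n'.join(lines) + '\n')
```

Because only the main thread ever opens the file, there is nothing to synchronize, and `pool.map` preserves the order of the workers' results.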

Also note that threads in Python don't take advantage of multiple cores because of the GIL; use multiprocessing instead if you want to leverage multiple processing engines.

Here is a quick example:

from multiprocessing import Process, Queue
import queue  # provides the Empty exception raised by Queue.get on timeout

def test_1(q):
    for i in range(10):
        q.put('test_1: ' + str(i))

def test_2(q):
    for i in range(10):
        q.put('test_2: ' + str(i))

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=test_1, args=(q,))
    p2 = Process(target=test_2, args=(q,))
    p1.start()
    p2.start()

    # Single writer: drain the queue while either producer is still alive
    # or items remain, so nothing is lost when the producers finish.
    with open('test.txt', 'w') as f:
        while p1.is_alive() or p2.is_alive() or not q.empty():
            try:
                value = q.get(timeout=1)
                f.write(value + '\n')
            except queue.Empty:
                print('empty queue or dead process')
    p1.join()
    p2.join()

and the contents of test.txt:

test_1: 0
test_1: 1
test_1: 2
test_1: 3
test_1: 4
test_2: 0
test_1: 5
test_2: 1
test_1: 6
test_2: 2
test_1: 7
test_2: 3
test_1: 8
test_2: 4
test_1: 9
test_2: 5
test_2: 6
test_2: 7
test_2: 8
test_2: 9
Samy Vilar
  • 10,800
  • 2
  • 39
  • 34
  • Thank you! Very detailed answer, I am going to give this a try now – nick Sep 14 '12 at 06:16
  • @nick glad I could be of help, check out http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers it tends to be the easiest way to do multiprocessing in Python following a map-reduce pattern, gl. – Samy Vilar Sep 14 '12 at 06:18
0

One way is to lock the file so that only one thread can access it at a time; see threading.Lock for that.
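A minimal sketch of that approach, assuming 5 worker threads that each write a few lines (the worker function and filename are made up for illustration):

```python
import threading

lock = threading.Lock()

def writer(f, thread_id):
    for i in range(3):
        # Only one thread may hold the lock, so writes never interleave.
        with lock:
            f.write('thread %d: %d\n' % (thread_id, i))

# Open the file once in the main thread and share the handle.
with open('out.txt', 'w') as f:
    threads = [threading.Thread(target=writer, args=(f, n)) for n in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Every line arrives intact, though the order in which the threads' lines appear is not deterministic.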

houbysoft
  • 32,532
  • 24
  • 103
  • 156