
I have created a Python script with multiple threads. Each thread writes a value to a global dict, which is thread safe in this case because every thread updates the dictionary with a new unique key that did not exist before. I want each thread to save the contents of the dict to an output file, but I receive "dictionary changed size during iteration". Is there a way to do that, such as locking the dictionary against writes while dumping it to a file? I tried to acquire and release a lock, but it didn't work.

def do_function():
    while True:
        r = q.get()
        global_dict[r] = {}  # this is thread safe: r is unique, so it will not repeat
        telephone, address = get_info(r)
        global_dict[r]['t'] = telephone
        global_dict[r]['a'] = address

        with open("output.pickle", "wb") as j:  # save to file
            pickle.dump(global_dict, j)  # raises "dictionary changed size during iteration"

        q.task_done()

global_dict = {}
threads = 10
q = Queue(threads * 2)
for i in range(threads):
    t = Thread(target=do_function)
    t.daemon = True
    t.start()
for p in lst:
    q.put(p)
q.join()
Amr
  • The two lines of code you provided are invalid Python, and they also don't form a [mcve]. Show us what didn't work. – Arya McCarthy Aug 25 '17 at 00:24
  • Duplicate? https://stackoverflow.com/questions/1312331/using-a-global-dictionary-with-threads-in-python – Alexander Aug 25 '17 at 00:25
  • not a duplicate, I have seen that one; it talks about which dictionary operations are thread safe and which ones you should lock around – Amr Aug 25 '17 at 00:41

1 Answer


You do not need to write the dict to a file inside the threads, and doing so is the cause of the error, because the dict is global and still being modified. You can write it once after all the threads have finished: just move the

with open("output.pickle","wb") as j:  
    pickle.dump(global_dict,j)

to the end of the file, after q.join().

Your error occurs when one thread is dumping the dict to the file while another thread changes the dict; the dumping thread then complains that the dictionary changed size during iteration.
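If you do want to keep the global dict and dump it periodically, a lock only helps when every writer takes the same lock as the dumper. A minimal sketch of that idea (the `add_entry` and `dump_snapshot` helper names are assumptions for illustration, not from the question's code):

```python
import pickle
import threading

lock = threading.Lock()
global_dict = {}

def add_entry(r, telephone, address):
    # every mutation of global_dict must hold the lock
    with lock:
        global_dict[r] = {'t': telephone, 'a': address}

def dump_snapshot(path="output.pickle"):
    # copy under the lock, then dump the copy outside the lock,
    # so threads are only blocked for the cheap copy
    with lock:
        snapshot = dict(global_dict)
    with open(path, "wb") as j:
        pickle.dump(snapshot, j)
```

If any writer skips the lock, the copy itself can still observe a dict that changes size mid-iteration, which is why locking "didn't work" when only the dumper acquired it.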

EDITED 1

I think the easiest solution is to not use a global variable at all; then the error cannot happen. Like this:

import threading
lock = threading.Lock()

def do_function():
    while True:
        r = q.get()
        d = {}
        telephone, address = get_info(r)
        d['t'] = telephone
        d['a'] = address
        with lock:
            with open("output.pickle", "ab") as j:
                pickle.dump({r: d}, j)  # keep the key so records can be matched later
        q.task_done()

And notice that the file is opened in "ab" mode so each record is appended rather than replacing the file; do not use "wb" here.
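Because "ab" appends one pickle per record, the output file becomes a stream of pickles, and reading it back means calling pickle.load repeatedly until EOF. A sketch of that (`load_all` is a hypothetical helper name):

```python
import pickle

def load_all(path="output.pickle"):
    # the file holds many back-to-back pickles, one per record;
    # loop until pickle.load hits end of file
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                break
    return records
```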

EDITED 2

Acquiring the lock on every write may be costly. One workaround is for each thread to write to a different file, named with a UUID generated when the thread starts.
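The per-thread-file idea can be sketched as follows; here `q` and `get_info` are self-contained stand-ins for the question's queue and lookup function, and the `output-<uuid>.pickle` naming is an assumption:

```python
import pickle
import queue
import threading
import uuid

q = queue.Queue()

def get_info(r):
    # stand-in for the real lookup in the question
    return ("555-%04d" % r, "addr-%d" % r)

def do_function():
    # each thread writes its own uniquely named file, so no lock is
    # needed; merge the per-thread files after q.join()
    out_name = "output-%s.pickle" % uuid.uuid4().hex
    with open(out_name, "ab") as j:
        while True:
            r = q.get()
            telephone, address = get_info(r)
            pickle.dump({r: {'t': telephone, 'a': address}}, j)
            j.flush()  # make each record durable as soon as it is written
            q.task_done()
```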

A faster method is to batch the records and only take the lock when flushing a batch; this is much faster than the previous method.

sample code:

import threading
lock = threading.Lock()

def do_function():
    buffer = []
    while True:
        r = q.get()
        d = {}
        telephone, address = get_info(r)
        d['t'] = telephone
        d['a'] = address
        buffer.append({r: d})
        q.task_done()

        if len(buffer) >= BATCH_COUNT:
            with lock:
                with open("output.pickle", "ab") as j:
                    pickle.dump(buffer, j)
            # note: records still in buffer when the program exits are lost
            buffer = []
BATCH_COUNT could be 1000, 10000, or whatever suits your workload.

GuangshengZuo
  • yes, I understand, but I want to write to the file continually, not just after all threads are done, so that if the program crashes it will not need to repeat from the beginning and can continue from where it left off. Is there a way to do this? – Amr Aug 25 '17 at 09:55
  • if all threads write to the file at the same time, I think this will make the file unreadable and the content will overlap. Or is writing to a file thread safe? – Amr Aug 25 '17 at 15:54
  • Sorry, I just copy your code and did not think about it, answer updated. – GuangshengZuo Aug 25 '17 at 16:09
  • I have tried to acquire and release the lock, but it still gives the same error, dictionary changed size during iteration, as if lock.acquire() does nothing – Amr Aug 25 '17 at 22:21
  • yes. do not use the global dict; use a local variable. – GuangshengZuo Aug 25 '17 at 22:23