
I have a Python application where thread1 calls an API to see what reports are ready to download and sends each report_id to thread2, which downloads and processes those reports. I am trying to determine what happens if thread1 adds items to the dict while thread2 is iterating over it. Right now I make a copy before working in thread2.

Two questions

  1. Can I iterate over a changing dictionary? I currently a) make a copy of the dict, b) iterate over that copy, and c) for each item processed by the loop over the copy, delete its key from the original dict so that the next loop doesn't reprocess the same item.

  2. If I can't iterate over a changing dict, do I need to use a lock to make the copy, as I am doing below? Is that the right way to do it?

lock = threading.Lock()
while True:
    with lock:  # copy the dict to prevent contention
        reports_to_call_copy = self.reports_to_call.copy()

    for user, report_name in reports_to_call_copy:
        # do stuff, then delete the item from the original dict
        # so the next loop doesn't process it twice
        del self.reports_to_call[user, report_name]

    is_killed = self._kill.wait(180)
    if is_killed:
        print("Killing - Request Report")
        break
David Buck
personalt

1 Answer

  1. No. You cannot add or remove keys while iterating over a dictionary, even from the same thread; Python raises a RuntimeError. A minimal reproducible example:
>>> d = dict()
>>> d['a'] = 10
>>> for k, v in d.items():
...     del d['a']
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
  2. Any code that modifies the dictionary (such as the del statements in your loop, and the additions made from the other thread) must acquire the same lock as well. Otherwise it may add or delete items at any time while your thread is making the copy, potentially causing the same trouble, particularly if the dictionary is large enough that the copy could be interrupted partway through. Note that dict.copy() makes a shallow copy, not a deep one.
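
To make the locking advice concrete, here is a minimal sketch under stated assumptions. `ReportTracker`, `add_report`, `process_pending`, and `handler` are hypothetical names that do not appear in the question; only `reports_to_call` and the `(user, report_name)` key shape come from the original code. The idea is that every access to the shared dict (the add in thread1, the copy and the delete in thread2) goes through the same lock, while the slow download work runs outside it.

```python
import threading


class ReportTracker:
    """Hypothetical stand-in for the asker's class; names are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self.reports_to_call = {}  # keyed by (user, report_name)

    def add_report(self, user, report_name, report_id):
        # Called from the producer thread (thread1).
        with self._lock:
            self.reports_to_call[user, report_name] = report_id

    def process_pending(self, handler):
        # Called from the consumer thread (thread2).
        with self._lock:
            # Shallow copy taken while no other thread can modify the dict.
            snapshot = self.reports_to_call.copy()

        processed = []
        for key, report_id in snapshot.items():
            handler(key, report_id)  # slow work happens outside the lock
            with self._lock:
                # pop() with a default tolerates a key that is already gone.
                self.reports_to_call.pop(key, None)
            processed.append(key)
        return processed
```

Holding the lock only for the copy and for each delete keeps thread1 from being blocked for the duration of a download. One caveat of this sketch: if thread1 re-adds the same key while it is being processed, the `pop` removes the new entry too, so a real implementation may want to compare values before deleting.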
Marc Sances
  • Thanks, this was helpful. I do wonder, though, whether those del calls need the lock. Assuming self.reports_to_call.copy() also counts as an iteration, that is the only time I iterate over reports_to_call. The main loop iterates over a copy of reports_to_call named reports_to_call_copy, and .copy() is never run while a delete is happening. That said, I add reports to reports_to_call from another thread, and that add could take place at the same time as reports_to_call_copy = self.reports_to_call.copy() is running. – personalt Feb 26 '21 at 05:26
  • Are you sure of that? What if the same method (your whole code snippet) is called twice in a row? The first call acquires the lock while the second waits for it; then, when the first call releases the lock and continues execution (deleting entries), the second call acquires the lock and makes its copy. You have a potential conflict there. – Marc Sances Feb 26 '21 at 07:21
  • santos - My code snippet runs as its own thread, which is started only once at startup. So technically I would be okay, or at least I think so. But as I did more research, a lot of best practices seem to recommend using locks even in places where you think you might not need them. Based on that, I will take the advice and have all the code that modifies these dictionaries use a lock. – personalt Feb 26 '21 at 16:12