
I have a system designed to take data via a socket and store it in a dictionary that serves as a database. All my other modules (GUI, analysis, write_to_log_file, etc.) then access the database and do what they need to do with the dictionary, e.g. make widgets or copy the dictionary to a log file. Since these things happen at different rates, I chose to run each module in its own thread so I can control the frequency.

In the main run function there's something like this:

    from threading import Thread
    import data_collector
    import write_to_log_file

    def main():
        db = {}
        receive_data_thread = Thread(target=data_collector.main, args=(db,))
        receive_data_thread.start()  # writes to the dictionary @ 50 Hz
        log_data_thread = Thread(target=write_to_log_file.main, args=(db,))
        log_data_thread.start()  # reads the dictionary @ 1 Hz

But it seems that both modules aren't working on the same dictionary instance, because log_data_thread just prints out an empty dictionary even when data_collector shows the data it has inserted into the dictionary.

There's only one writer to the dictionary, so I don't have to worry about threads stepping on each other's toes; I just need to figure out a way for all the modules to read the current database as it's being written.

Ned U
  • If you have only one writer but multiple readers, you can in fact have race conditions. If one thread reads the dictionary while the writer is only halfway through modifying it (extremely possible), then your program will fail in some weird ways. – TheSoundDefense Aug 20 '14 at 19:55

4 Answers


Rather than using a builtin dict, you could look at using a Manager object from the multiprocessing library:

    from multiprocessing import Manager
    from threading import Thread

    manager = Manager()
    d = manager.dict()

    def do_this(d):
        d["this"] = "done"

    def do_that(d):
        d["that"] = "done"

    thread0 = Thread(target=do_this, args=(d,))
    thread1 = Thread(target=do_that, args=(d,))
    thread0.start()
    thread1.start()
    thread0.join()
    thread1.join()

    print(d)

This gives you a thread-safe, synchronised dictionary from the standard library which should be easy to swap into your current implementation without changing the design.

ebarr

Use a Queue.Queue to pass values from the data-collecting (producer) threads to a single writer thread. Pass the Queue instance to each data_collector.main function; they can all call the Queue's put method.

Meanwhile, write_to_log_file.main should be passed the same Queue instance, and it can call the Queue's get method. As items are pulled out of the Queue, they can be added to the dict.

See also: Alex Martelli, on why Queue.Queue is the secret sauce of CPython multithreading.
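For concreteness, the hand-off described above might look roughly like this (a minimal sketch, not the asker's actual modules: `collector` and `logger` are made-up stand-ins, and `queue` is the Python 3 name for the Python 2 `Queue` module):

```python
import queue      # named Queue in Python 2
import threading

def collector(q, n):
    # Producer: stands in for data_collector.main pushing readings.
    for i in range(n):
        q.put(("reading-%d" % i, i))
    q.put(None)  # sentinel: tells the consumer there is no more data

def logger(q, db):
    # Single consumer: the only code that ever touches the dict.
    while True:
        item = q.get()
        if item is None:
            break
        key, value = item
        db[key] = value

db = {}
q = queue.Queue()
producer = threading.Thread(target=collector, args=(q, 3))
consumer = threading.Thread(target=logger, args=(q, db))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(sorted(db))  # ['reading-0', 'reading-1', 'reading-2']
```

Because only the consumer thread mutates the dict, readers never observe a half-applied update; the Queue does all the locking internally.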

unutbu

This should not be a problem; I also assume you are using the threading module. I would have to know more about what data_collector and write_to_log_file are doing to figure out why they are not working.

You could technically even have more than one thread writing and it would not be a problem, because the GIL would take care of all the locking needed. Granted, you will never get more than one CPU's worth of work out of it.

Here is a simple example:

    import threading, time

    def addItem(d):
        c = 0
        while True:
            d[c] = "test-%d" % c
            c += 1
            time.sleep(1)

    def checkItems(d):
        clen = len(d)
        while True:
            if clen < len(d):
                print("dict changed", d)
                clen = len(d)
            time.sleep(.5)

    DICT = {}
    t1 = threading.Thread(target=addItem, args=(DICT,))
    t1.daemon = True
    t2 = threading.Thread(target=checkItems, args=(DICT,))
    t2.daemon = True
    t1.start()
    t2.start()

    while True:
        time.sleep(1000)
  • While individual bytecodes might be considered atomic in the sense that they cannot be interrupted, updating a dict is not: adding a key to a dict or updating an entry is 4 to 6 bytecodes. Otherwise the threading module would not need all the locking primitives it has. – Roland Smith Aug 20 '14 at 20:53
  • Updating a dict is not atomic for sure, but a dict can be updated by multiple threads without a problem because of the GIL. The only time I know you need to be careful is when you are iterating over the dict in one thread and modifying it in another. – Luke Wahlmeier Aug 20 '14 at 21:13
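The iteration hazard mentioned in the comments is easy to demonstrate even in a single thread (CPython raises RuntimeError when a dict changes size mid-iteration; with a second thread mutating the dict, the same error just fires non-deterministically). Iterating over a snapshot of the keys sidesteps it:

```python
# Mutating a dict while iterating over it fails:
d = {"a": 1}
raised = False
try:
    for k in d:
        d["b"] = 2  # resizes the dict mid-iteration
except RuntimeError:
    raised = True
print(raised)  # True: "dictionary changed size during iteration"

# Safe variant: iterate over a snapshot of the keys instead.
d = {"a": 1}
for k in list(d):
    d["b"] = 2
print(sorted(d))  # ['a', 'b']
```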

Sorry, I figured out my problem, and I'm dumb. The modules were working on the same dictionary, but my logger wasn't wrapped in a while True, so it executed once, the thread terminated, and my dictionary was only logged to disk once. So I made write_to_log_file.main(db) loop forever, writing at 1 Hz, and set log_data_thread.daemon = True so that once the writer thread (which won't be a daemon thread) exits, the logger quits too. Thanks for all the input about best practices for this type of system.
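The fix described above could be sketched like this (a sketch under assumptions: the write_to_log_main name, the stop Event, and the logs list are made up for illustration; the real code would write to a file at 1 Hz instead):

```python
import threading
import time

logs = []  # stand-in for the log file

def write_to_log_main(db, stop, period=1.0):
    # The missing piece was this loop: without it the function ran
    # once, the thread terminated, and the dict was logged only once.
    while not stop.is_set():
        logs.append(dict(db))  # snapshot-copy the dict before logging
        stop.wait(period)      # ~1 Hz in the real system

db = {"sensor": 0}
stop = threading.Event()
log_data_thread = threading.Thread(target=write_to_log_main,
                                   args=(db, stop, 0.05))
log_data_thread.daemon = True  # dies when the non-daemon threads exit
log_data_thread.start()

db["sensor"] = 1   # simulate the 50 Hz writer updating the shared dict
time.sleep(0.15)
stop.set()
log_data_thread.join()
print(logs[-1])    # later snapshots see the updated value
```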

Ned U