I will try to keep what I am trying to do as simple as possible.

I have two classes, ClassA and ClassB.

ClassA has an instance method containing a while loop that runs "infinitely" and collects data. ClassA is also passed an instance of ClassB. While ClassA collects this data, it checks the incoming data to see whether a certain signal has been received. If the signal has been received, an instance method of ClassB is called.

Consider the following main program driver:

from class_a import ClassA
from class_b import ClassB

database_connection = MongoDB  # purely an example
class_b = ClassB(database_connection)
class_a = ClassA(class_b)

And then the classes:

class ClassA:

    def __init__(self, class_b):
        self.class_b = class_b

    def collect_data(self):
        while True:
            data = receiver()  # placeholder for whatever produces the data
            if signal in data:  # placeholder for the signal check
                self.class_b.send_data_to_database(data)

class ClassB:

    def __init__(self, database):
        self.database = database

    def convert_data(self, data):
        return data + 1

    def send_data_to_database(self, data):
        converted_data = self.convert_data(data)
        self.database.send(converted_data)

Now here is my question: should I run the `send_data_to_database()` instance method of ClassB in a thread? My thought process is that spawning a thread just to deal with sending data to the database might be faster than calling the method directly. Is my thinking wrong here? My knowledge of threading is limited. Ultimately, I am just trying to find the fastest way to send data to the database once ClassA recognizes that there is a signal in the data. Thanks in advance to all of those who reply.

Kyle DeGennaro
  • Threads imply concurrency - i.e. multiple actions at once. Your code is purely sequential, with one action after another: ``... -> receive -> check -> send -> receive -> ...``. Offloading a *single* action to a thread, e.g. send, is generally not worth it - starting the thread takes longer than just doing the action directly. – MisterMiyagi Jun 19 '19 at 14:39
  • What becomes of collected data where the signal is not in the data? Does class A sleep between data collection runs, or does he just crank as fast as he can? Is there a realistic risk that he falls behind, or can he just take his own sweet time collecting data? What is the rest of the app doing besides this data collection piece? Or is this it? – bigh_29 Jun 19 '19 at 14:41
  • @bigh_29 Data that does not have the signal in it is omitted. `ClassA` does not sleep between data collection runs. To keep things simple, this is pretty much the app (besides the data being processed). There is no significant risk of data collection falling behind; my main concern is being able to send the data as fast as possible upon receiving that signal. – Kyle DeGennaro Jun 19 '19 at 14:46
  • @MisterMiyagi I have just found this: https://stackoverflow.com/questions/10154487/how-to-get-a-faster-speed-when-using-multi-threading-in-python and have given it a quick glance. Could multiprocessing perhaps be a better solution than threading (if one is needed at all)? – Kyle DeGennaro Jun 19 '19 at 14:47
  • If there is no risk of data collection falling behind, there is no need for threading here. Certainly not opening a thread and closing it every time you want to write to the database, which would be slower. If the worry were that data collection could fall behind and you want the while loop to continue even when writes are occurring, then I would permanently open a thread with a second while loop monitoring a queue (from standard Python library). Send DB write requests to the queue as they come in and have the second thread handle them while the first thread continues. – Atlas Jun 19 '19 at 14:56
  • @KyleDeGennaro Processes are even costlier than threads. If you do not have anything to do concurrently, doing things concurrently makes no sense. If you don't know whether you have anything to do concurrently, we cannot tell you either. Ultimately, concurrency is about weighing costs against benefits, and you have defined neither. How long does conversion take? How long does sending take? How long does receiving take? How long can receiving be delayed by sending before it is a problem? Are you CPU or I/O bound? And so on... – MisterMiyagi Jun 19 '19 at 15:03
  • @MisterMiyagi Thank you for your straightforward response. It seems that I do not have concurrent tasks, as the program runs sequentially. My concern was trying to speed up sending data when a signal is received. – Kyle DeGennaro Jun 19 '19 at 15:05
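
To get a feel for the thread start-up cost mentioned in the comments above, here is a rough timing sketch. It is not a benchmark of any real database: `send` is a hypothetical stand-in that just appends to a list, and the item count is arbitrary. Each thread is joined immediately, so nothing actually runs concurrently; the only thing being compared is a direct call versus a call wrapped in a freshly started thread.

```python
import threading
import time

def send(data, sink):
    """Hypothetical stand-in for a fast database write: converts and stores."""
    sink.append(data + 1)

def send_direct(items):
    """Call send() directly for every item."""
    sink = []
    for item in items:
        send(item, sink)
    return sink

def send_thread_per_call(items):
    """Start a new thread for every item and wait for it to finish."""
    sink = []
    for item in items:
        t = threading.Thread(target=send, args=(item, sink))
        t.start()
        t.join()  # wait for the thread, so results stay in order
    return sink

items = list(range(2000))

start = time.perf_counter()
direct_result = send_direct(items)
direct_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded_result = send_thread_per_call(items)
threaded_seconds = time.perf_counter() - start

print(f"direct:          {direct_seconds:.4f} s")
print(f"thread per call: {threaded_seconds:.4f} s")
```

The exact numbers depend on the machine, and a real network call would take far longer than `send` does here, so treat this only as an illustration of the fixed overhead a thread adds to each call.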

1 Answer

I would use threads if either of these is true:

  • The blocking I/O database calls in B can negatively impact A's ability to collect data in a timely manner.
  • These two data collection pieces together can negatively impact the responsiveness of other parts of the app (think unresponsive GUI).

If neither condition is true, then a single threaded app is a lot less hassle.

Consider using a Queue for concurrency if you do use threads. Class A can post data to a Queue that class B is waiting on. Here is a bare bones code example of what I mean:

from queue import Queue
from threading import Thread

class class_a:
    def __init__(self, queue):
        self.queue = queue
        self.thread = Thread(target=self.collect_data)
        self.thread.start()

    def collect_data(self):
        for data in range(1000):
            if data % 3 == 0:
                print(f'Thread A sending {data} to queue')
                self.queue.put(data)
            else:
                print(f'Thread A discarding {data}')

class class_b:
    def __init__(self):
        self.queue = Queue()
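        self.thread = Thread(target=self.process_data)
        # daemon: this thread is killed when the program exits,
        # even if items are still waiting in the queue
        self.thread.daemon = True
        self.thread.start()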

    def process_data(self):
        while True:
            data = self.queue.get()
            print(f'Thread B received {data} from queue')

b = class_b()
a = class_a(b.queue)

Lastly, anytime you think about using parallelism in Python, you have to ask whether multiprocessing makes more sense than multithreading. Multiprocessing is a better choice when CPU computation, rather than file or network I/O, becomes the limiting factor in the performance of the app. I don't think multiprocessing is a good fit for your project based on the information you provided.
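
As a variation on the queue example above, the queue can be hidden behind a method on B, so that A keeps calling `send_data_to_database()` exactly as in the question. This is only a sketch under assumptions: the `send` callable stands in for `database.send`, the `% 3` check stands in for the signal test, and the `_STOP` sentinel and `close()` method are names made up here to let the worker drain the queue and exit cleanly.

```python
from queue import Queue
from threading import Thread

_STOP = object()  # hypothetical sentinel that tells the worker to exit

class ClassB:
    def __init__(self, send):
        self._send = send  # stand-in for database.send
        self._queue = Queue()
        self._thread = Thread(target=self._worker)
        self._thread.start()

    def send_data_to_database(self, data):
        """Called by A; returns immediately, the worker does the sending."""
        self._queue.put(data)

    def close(self):
        """Let the worker finish everything queued so far, then stop it."""
        self._queue.put(_STOP)
        self._thread.join()

    def _worker(self):
        while True:
            data = self._queue.get()
            if data is _STOP:
                break
            self._send(data + 1)  # same conversion as convert_data in the question

class ClassA:
    def __init__(self, class_b):
        self.class_b = class_b

    def collect_data(self, source):
        for data in source:
            if data % 3 == 0:  # stand-in for "signal in data"
                self.class_b.send_data_to_database(data)

sent = []
b = ClassB(sent.append)
a = ClassA(b)
a.collect_data(range(10))
b.close()
print(sent)  # [1, 4, 7, 10]
```

Because a single worker takes items off the queue in order, the sends happen in the order A posted them, and `close()` guarantees nothing queued is lost at shutdown.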

bigh_29
  • I see. Wouldn't a queue cost some time, though? Now `ClassA` sends data to the queue while `ClassB` listens. Compared to my original example, doesn't that add an extra step in getting the data from `ClassA` to `ClassB`? – Kyle DeGennaro Jun 19 '19 at 14:55
  • I absolutely agree with the recommendation to use a queue. That way, the overhead of starting a thread for every occurrence of the signal in the data is removed. – shmee Jun 19 '19 at 14:58
  • Maybe there is a fault in my explanation: I don't plan to start a thread upon every occurrence of the signal in the data, just a single thread that listens for that signal and then sends a request to the database via HTTP. Would threading the method that sends the HTTP request be faster than not? Or is there no significant difference (and perhaps a waste of memory), since everything MUST happen sequentially? – Kyle DeGennaro Jun 19 '19 at 15:00
  • @KyleDeGennaro, pasted a code example to better explain. Class B spawns a single thread for the life of the app. A `Queue` is a great way for one thread to send data to another. You might want to encapsulate the queue behind a helper method on B. It depends on whether you want class A to know about an instance of class B, or whether it should know about a shared queue instance. Either way, the queue is handling the communication between the two. – bigh_29 Jun 19 '19 at 15:17
  • Thank you for the detailed response. Very helpful! – Kyle DeGennaro Jun 19 '19 at 15:27