
I have code that reads data from 7 devices every second, indefinitely. A thread is created which, in a loop, starts 7 processes (one per device); after each batch of processes finishes, the program waits 1 second and starts the next round. Here is a snippet of the code:

import threading
import multiprocessing
import time

def all_thread(): #function that handles the threading
    thread = threading.Thread(target=all_process) #prepares a thread for the devices
    thread.start() #starts a thread for the devices

def all_process(): #function that prepares and runs processes
    processes = [] #empty list for the processes to be stored
    while len(gas_list) > 0: #this gaslist holds the connection information for my devices
        for sen in gas_list: #for each sen(sensor) in the gas list
            proc = multiprocessing.Process(target=main_reader, args=(sen, q)) #declaring a process variable that sends the gas object, value and queue information to reading function
            processes.append(proc) #adding the process to the processes list
            proc.start() #start the process
        for sen in processes: #for each sensor in the processes list
            sen.join() #wait for all the processes to complete before starting again
        time.sleep(1) #wait one second

However, this uses 100% of my CPU. Is this by design of threading and multiprocessing or just bad coding? Is there a way I can limit the CPU usage? Thanks!

Update:

The comments mention the `main_reader()` function, so I will put it into the question. All it does is read each device, collect all the data, and append it to a list. The list is then put into a queue to be displayed in the tkinter GUI.

def main_reader(data, q): #this function reads the device which takes less than a second
    output_list = get_registry(data) #this function takes the device information, reads the registry and returns a list of data
    q.put(output_list) #put the output list into the queue
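
For reference, here is a minimal sketch (simplified, hypothetical widget names; assuming Python 3's tkinter) of how the GUI side drains that queue without blocking the mainloop:

import queue
import tkinter as tk

def poll_queue(root, q, label):
    try:
        while True: #drain everything currently in the queue
            output_list = q.get_nowait()
            label.config(text=str(output_list))
    except queue.Empty:
        pass
    root.after(250, poll_queue, root, q, label) #check again in 250 ms

root = tk.Tk()
label = tk.Label(root, text="waiting for data...")
label.pack()
poll_queue(root, q, label) #q is the same queue the reader processes write to
root.mainloop()
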
GreenSaber
  • Starting a process in Python means launching a new instance of the Python interpreter, which means in your case you have 7 separate interpreters running on your 4-8 cores (I assume). When a new interpreter gets spawned it inherits a bunch of resources from the parent process in order to be able to do its job, which means spawning a new process can be pretty slow and expensive. Your processes are running a function called `main_reader()`; based on the name I assume the work that you're doing is mainly IO stuff. Have you tried starting a bunch of daemon threads instead of all those processes (see the sketch after these comments)? – orangeInk Aug 09 '17 at 12:51
  • @orangeInk ya it's all IO. `main_reader()` reads the device and prepares the data to be displayed. I haven't played with daemons at all. Right now, I'm using multiprocessing to keep the tkinter GUI from freezing while the devices are doing their work. Will daemons achieve this as well? – GreenSaber Aug 09 '17 at 12:54
  • From this it is impossible to see what happens in `main_reader`; that is where your processing happens. `all_process` just waits in join(). Just one small, probably unrelated thing: you probably want to move `processes = []` to the first line of your while loop. Right now that list keeps growing, as the already-joined processes are not removed from it; you just append to it iteration after iteration. – Hannu Aug 09 '17 at 12:54
  • You could also investigate multiprocessing.Pool. It launches the subprocesses once and you can keep feeding tasks to them. You do not need to join them and shut them down if your intention is just to give the worker pool more work to do in the very near future. This would reduce your process creation overhead, especially useful if main_reader is expected to complete in a very short time. – Hannu Aug 09 '17 at 12:57
  • @Hannu That sounds like what I need! `main_reader()` takes a fraction of a second to complete and happens every second or so. Do you mind writing up an answer including the pool and I can see if it works? – GreenSaber Aug 09 '17 at 13:01
  • I don't have much experience with GUIs but I don't see why threads would block the main loop. A quick search revealed this: https://stackoverflow.com/questions/37221105/python-tkinter-how-can-i-prevent-tkinter-gui-mainloop-crash-using-threading The answer suggests that threading works fine. – orangeInk Aug 09 '17 at 13:02
  • @orangeInk I don't know if it's because of my physical devices or not but when I try to accomplish the same task with only using threading, it blocks my mainloop. I will look more into it however. Thanks! – GreenSaber Aug 09 '17 at 13:05
  • If it only takes a fraction of a second to run `main_reader()`, moving `processes = []` might actually help. If you do not do this and let your processes list grow, it will grow fast and your loop to join processes will first attempt to join thousands or tens of thousands of processes not running at all, until finally hitting the newest few in the list actually running. – Hannu Aug 09 '17 at 14:21
  • @Hannu I just moved it to the first line of the while loop and my CPU still goes to 100% immediately. – GreenSaber Aug 09 '17 at 14:24
  • If `get_registry` uses the python interpreter, you are possibly stumbling against the [Global Interpreter Lock](https://stackoverflow.com/questions/1294382/what-is-a-global-interpreter-lock-gil). In the standard python interpreter, the GIL is so onerous that it's often recommended you run new processes instead of running threads. The link I provided gives some description of GIL, and provides a lot of other resources for further reading. – Scott Mermelstein Aug 09 '17 at 14:48
  • @Evan If you found Hannu's answer useful, you can upvote it as well as accept it. They're two separate ways to help the person who helped you get more reputation. See the [help center](https://stackoverflow.com/help/someone-answers) for information on everything you can do when someone answers. – Scott Mermelstein Aug 09 '17 at 17:52
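
A minimal sketch of the daemon-thread approach suggested in the first comment (assuming the work really is IO-bound; get_registry, gas_list and q are the names from the question):

import threading
import time

def device_worker(sen, q):
    #one long-lived thread per device; IO-bound reads release the GIL,
    #so plain threads are usually enough here
    while True:
        q.put(get_registry(sen)) #read the device and queue the result
        time.sleep(1)

for sen in gas_list:
    t = threading.Thread(target=device_worker, args=(sen, q))
    t.daemon = True #daemon threads exit together with the main program
    t.start()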

1 Answer


As you state in the comments, your main_reader takes only a fraction of a second to run, which means process creation overhead might cause your problem.

Here is an example with multiprocessing.Pool. This creates a pool of workers and submits your tasks to them. Processes are started only once and never shut down or joined, since this is meant to be an infinite loop. If you want to shut your pool down, you can do so by closing it and then joining it (see the documentation for that).

from multiprocessing import Pool, Manager
from time import sleep
import threading
from random import random

gas_list = [1,2,3,4,5,6,7,8,9,10]

def main_reader(sen, rqu):
    output = "%d/%f" % (sen, random())
    rqu.put(output)


def all_processes(rq):
    p = Pool(len(gas_list) + 1)
    while True:
        for sen in gas_list:
            p.apply_async(main_reader, args=(sen, rq))

        sleep(1)

m = Manager()
q = m.Queue()
t = threading.Thread(target=all_processes, args=(q,))
t.daemon = True
t.start()

while True:
    r = q.get()
    print(r)
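
If the loop were ever meant to terminate, the shutdown mentioned above would look like this inside all_processes (a minimal sketch; note that close() must come before join()):

p.close()  # stop accepting new tasks
p.join()   # wait for already-submitted tasks to finish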

If this does not help, you need to start digging deeper. I would first increase the sleep in your infinite loop to 10 seconds or even longer. This would allow you to monitor the behaviour of your program. If CPU peaks for a moment and then settles down for 10 seconds or so, you know the problem is in your main_reader. If it is still 100%, your problem must be elsewhere.

Is it possible your problem is not in this part of your program at all? You seem to launch all of this in a thread, which indicates your main program is doing something else. Could it be that something else that is maxing out the CPU?

Hannu
  • I'm going to test this right now and I will get back to you. I launch a thread to separate the mainloop of Tkinter and the work I need to do with the devices. – GreenSaber Aug 09 '17 at 14:36
  • You could then try it without the tkinter loop. Just don't start it, and wait in sleep(bigtime) just after launching this part of your program. Your application then doesn't do much, but if it behaves nicely despite running all these background tasks, you at least know your problem is not in this part of your code. – Hannu Aug 09 '17 at 14:39
  • I can see the pool attempting to do work, however nothing is getting to `main_reader()`. – GreenSaber Aug 09 '17 at 14:55
  • What do you mean by that? main_reader does not run? It runs but does not finish? There is nothing in the queue? (you are using Queue from multiprocessing, right?) – Hannu Aug 09 '17 at 14:57
  • I am using Queue from multiprocessing. More specifically, `main_reader()` is not running at all. The pool is looping but does not run any of the `main_reader` code. – GreenSaber Aug 09 '17 at 15:00
  • Sorry, it has been a while since I last worked with this. See the updated answer (using Manager.Queue instead of multiprocessing.Queue). It seems to be working now. – Hannu Aug 09 '17 at 15:05
  • Alright, I've mostly got it working; the only problem I am having now is that not every item in the `gas_list` seems to be making it to `main_reader` after every `sleep`. – GreenSaber Aug 09 '17 at 15:30
  • You could increase the size of your pool to have a couple of spares there. This also works a bit differently, as you do not wait for your subprocesses to complete before entering sleep(). If some of them are still running, you will just start another round nevertheless. In your previous attempt you waited for all of them to complete, whether it took 0.1 or 5 seconds, then slept a second and tried again. This queues tasks, doesn't care how long it takes to process them, sleeps a second and queues another bunch of tasks. – Hannu Aug 09 '17 at 15:32
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/151512/discussion-between-evan-and-hannu). – GreenSaber Aug 09 '17 at 15:35
  • Just one more thing. If you implement the lock method from chat, remember to encapsulate your `rlock.release()` with try/except, just in case two almost parallel processes are simultaneously comparing `lco.value` to 0 and trying to release the lock. In theory this could happen, and trying to release a lock that has not been acquired raises an exception. Catch it and ignore it; it is not an error in this case. – Hannu Aug 09 '17 at 16:26
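
A minimal sketch of the release pattern described in that last comment (`rlock` and `lco` are the hypothetical lock and shared counter from the linked chat, which is not reproduced here):

try:
    rlock.release()
except (RuntimeError, ValueError, AssertionError):
    # the lock was not held: another process won the race and released it
    # first; the exact exception type depends on the lock class, and in
    # this case the error can be safely ignored
    pass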