
I am using Python with the modbus_tk package to poll n PLCs. Each poll takes ~5 seconds. Is it possible to run these in parallel so that it doesn't take n*5 seconds to get all the data back?

My current code:

all_vals = []
for ip in ip_addresses:
    master = modbus_tcp.TcpMaster(host=ip)
    all_vals.append(master.execute(1, cst.READ_HOLDING_REGISTERS, starting_address=15))
luc
Brian Leach
  • If either of the answers helped, you should consider accepting one... – wallacer Jun 17 '14 at 01:01
  • @wallacer I will. I have tried the code you provided, although with the only 4 active IPs I have, I am getting the same speed on my stopwatch (about 13 seconds). I would like to come back and answer with what solved my issue. – Brian Leach Jun 17 '14 at 05:15
  • That's surprising... Yes, don't accept if it didn't solve your issue ;) Have you tried putting some simple timing code and prints in there? Even just a print at the end of the run method for each thread would let you see if they appear to be hitting a bottleneck somewhere and waiting on each other - which would explain the time being roughly the same. – wallacer Jun 17 '14 at 18:33

2 Answers


I don't have knowledge of modbus_tk, but can you just use the threading library? Create one thread for each IP address you need to poll.

Here's some sample code that should get you rolling:

import threading

from modbus_tk import modbus_tcp
import modbus_tk.defines as cst

class Poller(threading.Thread):
    def __init__(self, ipaddress):
        threading.Thread.__init__(self)
        self.ipaddress = ipaddress
        self.my_vals = None

    def run(self):
        # each thread opens its own connection and polls one PLC
        master = modbus_tcp.TcpMaster(host=self.ipaddress)
        self.my_vals = master.execute(1, cst.READ_HOLDING_REGISTERS, starting_address=15)


pollers = []
for ip in ip_addresses:
    thread = Poller(ip)
    pollers.append(thread)
    thread.start()

# wait for all threads to finish, and collect your values
retrieved_vals = []
for thread in pollers:
    thread.join()
    retrieved_vals.append(thread.my_vals)

# retrieved_vals now contains all of your poll results
for val in retrieved_vals:
    print(val)

Multiprocessing will work here as well, though it's overkill for this problem. Since polling is an I/O-bound operation, it's an ideal candidate for threading: the GIL (global interpreter lock) is released during blocking I/O, so it won't serialize your threads, and threads are lighter weight than processes.
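The claim about the GIL is easy to check with a toy timing test. The sketch below uses time.sleep as a stand-in for the blocking modbus call (an assumption; real poll times will vary): five threads that each "wait" 0.2 seconds finish in roughly 0.2 seconds of wall time, not 1 second.

```python
import threading
import time

def fake_poll(results, i):
    # time.sleep releases the GIL, just like blocking socket I/O does
    time.sleep(0.2)
    results[i] = i

results = {}
threads = [threading.Thread(target=fake_poll, args=(results, i)) for i in range(5)]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(sorted(results))  # [0, 1, 2, 3, 4]
print(elapsed < 0.6)    # True: the threads waited concurrently, not one after another
```

If the threads were serialized by the GIL, elapsed would be close to 5 * 0.2 s; because sleep (like socket reads) releases the lock, the waits overlap.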

wallacer
  • In general it's a bad idea to generate one thread per IP address. For one - particularly with a 32-bit process - you can easily run out of resources for all those threads, and second, having thousands of threads will lead to bad performance. But since I doubt any casual user will have thousands of PLCs, it's fine in this case. – Voo Jun 07 '14 at 00:52
  • @Voo Yes, if your use case involves having thousands of PLCs, you'd probably want to batch multiple IPs per Poller thread. However I expect, as you say, that this is not the OP's use case, and so I presented a simple solution that should give him the performance boost he's looking for. – wallacer Jun 07 '14 at 01:33
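The batching Voo and wallacer describe is a small change to the Poller class: give each thread a slice of the address list instead of a single address, keeping the thread count fixed regardless of fleet size. A sketch, with fake_poll standing in for the real modbus call and made-up addresses (both are assumptions here):

```python
import threading

def fake_poll(ip):
    # placeholder for modbus_tcp.TcpMaster(host=ip).execute(...)
    return "data-from-%s" % ip

class BatchPoller(threading.Thread):
    def __init__(self, ips):
        threading.Thread.__init__(self)
        self.ips = ips
        self.my_vals = []

    def run(self):
        # poll every address in this thread's batch, sequentially
        for ip in self.ips:
            self.my_vals.append(fake_poll(ip))

ip_addresses = ["10.0.0.%d" % i for i in range(1, 9)]
num_threads = 4  # fixed thread count, no matter how many PLCs

# deal the addresses round-robin across num_threads batches
batches = [ip_addresses[i::num_threads] for i in range(num_threads)]

pollers = [BatchPoller(batch) for batch in batches]
for p in pollers:
    p.start()

retrieved_vals = []
for p in pollers:
    p.join()
    retrieved_vals.extend(p.my_vals)

print(len(retrieved_vals))  # 8: one result per address
```

With 8 addresses and 4 threads, each thread polls 2 PLCs back-to-back, so the total time is roughly 2 polls rather than 8.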

Use multiprocessing.Pool.imap_unordered. It lets you start up a pool of worker processes, send jobs to the pool, and receive results as they come in.

Here's sample code that downloads a bunch of URLs:

import multiprocessing, re, subprocess, sys

CMD_LIST = [
    ["wget", "-qO-", "http://ipecho.net/plain"],
    ["curl", '-s', "http://www.networksecuritytoolkit.org/nst/cgi-bin/ip.cgi"],
    ["curl", '-s', "v4.ident.me"],
    ["curl", '-s', "ipv4.icanhazip.com"],
    ["curl", '-s', "ipv4.ipogre.com"],
]

ip_pat = re.compile('[0-9.]{7,}')

if __name__ == '__main__':
    pool = multiprocessing.Pool(5)
    for output in pool.imap_unordered(subprocess.check_output, CMD_LIST):
        output = output.decode()  # check_output returns bytes on Python 3
        print('output:', output)
        m = ip_pat.search(output)
        if m:
            print('GOT IP:', m.group(0))
            pool.terminate()
            sys.exit(0)

    print('no IP found')
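The same imap_unordered pattern maps directly onto the polling problem, and multiprocessing.pool.ThreadPool exposes an identical API backed by threads, which suits I/O-bound work. A minimal sketch of the semantics, with slow_square standing in for a real poll function (an assumption; the staggered sleeps just make early submissions finish late):

```python
import time
from multiprocessing.pool import ThreadPool

def slow_square(n):
    # stand-in for a slow poll; earlier inputs take longer on purpose
    time.sleep(0.05 * (5 - n))
    return n * n

pool = ThreadPool(5)
# imap_unordered yields each result as soon as its worker finishes,
# not in submission order
results = list(pool.imap_unordered(slow_square, range(5)))
pool.close()
pool.join()

print(sorted(results))  # [0, 1, 4, 9, 16]
```

Every result arrives, just not necessarily in input order; if you need to know which input produced which result, return a (key, value) pair from the worker.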
johntellsall
  • why not use threads? On *nix systems, processes are cheap, so it's not too much overhead. If the OP happens to be using Windows, the overhead will be much higher than threading... – wallacer Jun 06 '14 at 23:12
  • @wallacer: good point. For the OP's question of multiple I/O-bound jobs, threads would indeed be lower overhead, especially on Windows. I recommend `multiprocessing` because it's easier to program. The `Pool` alone makes many things much easier; `threading` is too low-level. – johntellsall Jun 07 '14 at 00:06
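One way to get the ease of `Pool` with thread-level overhead is concurrent.futures (standard library since Python 3.2, backported to 2.x as the `futures` package): `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same API, so switching backends is a one-word change. A sketch with a placeholder poll function and made-up addresses (both are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_poll(ip):
    # placeholder for the real modbus read against one PLC
    return (ip, "vals-for-%s" % ip)

ip_addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_poll, ip) for ip in ip_addresses]
    # as_completed yields each future as its worker finishes
    results = dict(f.result() for f in as_completed(futures))

print(len(results))  # 4: one entry per PLC, keyed by address
```

Returning (ip, vals) pairs and collecting into a dict keeps each result tied to the PLC it came from, even though completion order is unpredictable.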