2

I'm trying to use map, to map the ping_all function to a list of hosts. The problem I'm having is that inside the ping_all function, I'm trying to append all failed hosts to a list. Normally I would call the ping_all function, passing in the empty list as an argument and returning the modified list, but since I'm using map here, I'm not sure how to achieve that?

import os
import argparse
from subprocess import check_output
from multiprocessing import Pool


parser = argparse.ArgumentParser(description='test')

args = parser.parse_args()

dead_hosts = []


def gather_hosts():
    """ Returns all environments from opsnode and puts them in a dict """
    host_list = []
    url = 'http://test.com/hosts.json'
    opsnode = requests.get(url)
    content = json.loads(opsnode.text)
    for server in content["host"]:
        if server.startswith("ip-10-12") and server.endswith(".va.test.com"):
            host_list.append(str(server))
    return host_list


def try_ping(hostnames):
    try:
        hoststatus = check_output(["ping", "-c 1", hostnames])
        print "Success:", hostnames
    except:
        print "\033[1;31mPing Failed:\033[1;m", hostnames
        global dead_hosts
        dead_hosts.append(hostnames)

def show_dead_hosts(dead_hosts):
    print '\033[1;31m******************* Following Hosts are Unreachable ******************* \n\n\033[1;m'
    for i in dead_hosts:
        print '\033[1;31m{0} \033[1;m'.format(i)

if __name__ == '__main__':
    hostnames = gather_hosts()
    pool = Pool(processes=30)              # process per core
    pool.map(try_ping, hostnames, dead_hosts)
    show_dead_hosts(dead_hosts)

I tried passing dead_hosts as a second argument in map, but after running this script, dead_hosts remains an empty list, it does not appear that the hosts are appending to the list.

What am I doing wrong?

david
  • 6,303
  • 16
  • 54
  • 91

2 Answers2

1

There are several issues with your code:

  1. The third argument to Pool.map is the chunksize, so passing dead_hosts (a list) is definitely incorrect.
  2. You can't access globals when using a multiprocessing Pool because the tasks in the pool run in separate processes. See Python multiprocessing global variable updates not returned to parent for more details.
  3. Related to the previous point, Pool.map should return a result list (since global side-effects will be mostly invisible). Right now you're just calling it and throwing away the result.
  4. Your format codes weren't properly clearing in my terminal, so everything was turning bold+red...

Here's a version that I've updated and tested—I think it does what you want:

import os
import argparse
from subprocess import check_output
from multiprocessing import Pool


parser = argparse.ArgumentParser(description='test')

args = parser.parse_args()

def gather_hosts():
    """ Returns all environments from opsnode and puts them in a dict """
    host_list = []
    url = 'http://test.com/hosts.json'
    opsnode = requests.get(url)
    content = json.loads(opsnode.text)
    for server in content["host"]:
        if server.startswith("ip-10-12") and server.endswith(".va.test.com"):
            host_list.append(str(server))
    return host_list


def try_ping(host):
    try:
        hoststatus = check_output(["ping", "-c 1", "-t 1", host])
        print "Success:", host
        return None
    except:
        print "\033[1;31mPing Failed:\033[0m", host
        return host

def show_dead_hosts(dead_hosts):
    print '\033[1;31m******************* Following Hosts are Unreachable ******************* \n\n\033[0m'
    for x in dead_hosts:
        print '\033[1;31m{0} \033[0m'.format(x)

def main():
    hostnames = gather_hosts()
    pool = Pool(processes=30)              # process per core
    identity = lambda x: x
    dead_hosts = filter(identity, pool.map(try_ping, hostnames))
    show_dead_hosts(dead_hosts)

if __name__ == '__main__':
    main()

The main change that I've made is that try_ping either returns None on success, or the host's name on failure. The pings are done in parallel by your task pool, and the results are aggregated into a new list. I run a filter over the list to get rid of all of the None values (None is "falsey" in Python), leaving only the hostnames that failed the ping test.

You'll probably want to get rid of the print statements in try_ping. I'm assuming you just had those for debugging.

You could also consider using imap and ifilter if you need more asynchrony.

Community
  • 1
  • 1
DaoWen
  • 32,589
  • 6
  • 74
  • 101
  • I just thought that was more readable than putting the identity lambda directly in the filter call. You can of course inline it if you prefer. – DaoWen Apr 11 '16 at 16:30
  • but I mean, I don't understand why this line is necessary – david Apr 11 '16 at 16:44
  • because I could just do like 'dead_hosts = pool.map(try_ping, hostnames)', without filter and identity? – david Apr 11 '16 at 16:45
  • Nevermind, I see that without this line, the list fills up with None (just like you said) – david Apr 11 '16 at 16:58
  • @david - Yes, the filter is to get rid of the `None` entries. There are several other ways that you could accomplish the same thing. Returning `None` and then filtering with [the identity function](https://en.wikipedia.org/wiki/Identity_function) just seemed the most intuitive to me. Anyway, I'm glad it makes sense now! – DaoWen Apr 11 '16 at 18:22
  • Also, just FYI, what you're doing here is really just *[MapReduce](https://en.wikipedia.org/wiki/MapReduce)*. Thinking about the problem in terms of MapReduce instead of serial for-loops might help you better conceptualize this. – DaoWen Apr 11 '16 at 18:33
0

Your try_ping function doesn't actually return anything. If I were you, I wouldn't bother with having dead_hosts outside of the function but inside the try_ping function. And then you should return that list.

I'm not familiar with the modules you're using so I don't know if pool.map can take lists.

TerryA
  • 58,805
  • 11
  • 114
  • 143