
What's going on

I'm collecting data from a few thousand network devices every few minutes in Python 2.7.8 via the netsnmp package. I'm also using fastsnmpy so that I can access the (more efficient) Net-SNMP command snmpbulkwalk.

I'm trying to cut down how much memory my script uses. I'm running three instances of the same script, each of which sleeps for two minutes before re-querying all devices for the data we want. When the original scripts were written in bash they would use less than 500MB when active simultaneously. Now that I've converted this over to Python, however, each instance hogs 4GB, which indicates (to me) that my data structures need to be managed more efficiently. Even when idle they're consuming a total of 4GB.


Code Activity

My script begins by creating a list: I open a file and append the hostname of each target device as a separate value. These files usually contain 80 to 1200 names.

expand = []
with open(self.deviceList, 'r') as f:    # make sure the file is closed afterwards
    for line in f:
        expand.append(line.strip())

From there I set up the SNMP session and execute the requests:

expandsession = SnmpSession ( timeout = 1000000 ,
    retries = 1,            # I slightly modified the original fastsnmpy
    verbose = debug,        # to reduce verbose messages and limit
    oidlist = var,          # the number of attempts to reach devices
    targets = expand,
    community = 'expand'
)
expandresults = expandsession.multiwalk(mode = 'bulkwalk')

Because of how both SNMP packages behave, the device responses are parsed into lists and stored in one giant data structure. For example:

for output in expandresults:
    print output.hostname, output.iid, output.val
#
host1 1 1
host1 2 2
host1 3 3
host2 1 4
host2 2 5
host2 3 6
# The 'output' object itself can't usefully be printed; printing it directly just gives an opaque representation
...

I have to iterate through each response, combine related data, and then output each device's complete response. This is a bit difficult. For example, the output I need looks like:

host1,1,2,3
host2,4,5,6
host3,7,8,9,10,11,12
host4,13,14
host5,15,16,17,18
...

Each device has a varying number of responses, so I can't loop through expecting every device to have a uniform number of values to combine into a string and write out to a CSV.


How I'm handling the data

I believe this is where I'm consuming a lot of memory, but I can't work out how to simplify the process while also discarding the data I've already visited.

expandarrays = dict()
for output in expandresults:
    if output.val is not None:
        if output.hostname in expandarrays:
            expandarrays[output.hostname] += ',' + output.val
        else:
            expandarrays[output.hostname] = ',' + output.val

for key in expandarrays:
    self.WriteOut(key,expandarrays[key])

Currently I'm creating a new dictionary, checking that each device response is not None, then appending the response value to a per-host string that will be used to write out to the CSV file.

The problem with this is that I'm essentially duplicating the data already held in expandresults, meaning I'm using twice as much system memory. I'd like to remove values from expandresults as I move them into expandarrays so that I'm not using so much RAM. Is there an efficient method of doing this? Is there also a better way of reducing the complexity of my code so that it's easier to follow?


The Culprit

Thanks to those who answered. For anyone in the future who stumbles across this thread with similar issues: the fastsnmpy package is the culprit behind the large use of system memory. The multiwalk() function creates a thread for each host, but does so all at once rather than imposing any kind of upper limit. Since each instance of my script would handle up to 1200 devices, that meant 1200 threads were instantiated and queued within just a few seconds. Using the bulkwalk() function instead was slower but still fast enough to suit my needs. The difference between the two was 4GB vs 250MB of system memory use.
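
In case it helps anyone else, the bounded approach looks roughly like the sketch below (not my exact code; POOL_SIZE, CHUNK_SIZE and walk_chunk are illustrative names): split the device list into chunks and walk each chunk from a small multiprocessing pool, so only a handful of walks run at any one time.

from multiprocessing import Pool

POOL_SIZE = 4      # illustrative: how many chunks are walked concurrently
CHUNK_SIZE = 50    # illustrative: how many devices each worker call handles

def walk_chunk(hosts):
    # Must live at module level so multiprocessing can pickle it.
    # 'var' is the same OID list used in the session above.
    session = SnmpSession(timeout=1000000,
                          retries=1,
                          verbose=False,
                          oidlist=var,
                          targets=hosts,
                          community='expand')
    results = session.multiwalk(mode='bulkwalk')
    # Return plain tuples so only small, picklable data crosses processes.
    return [(r.hostname, r.iid, r.val) for r in results if r.val is not None]

chunks = [expand[i:i + CHUNK_SIZE] for i in range(0, len(expand), CHUNK_SIZE)]
merged = {}
pool = Pool(POOL_SIZE)
try:
    for chunk_results in pool.map(walk_chunk, chunks):
        for hostname, _iid, val in chunk_results:
            merged.setdefault(hostname, []).append(val)
finally:
    pool.close()
    pool.join()

for hostname, vals in merged.items():
    self.WriteOut(hostname, ','.join(vals))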

Kamikaze Rusher

3 Answers


If the device responses are in order and are grouped together by host, then you don't need a dictionary, just three lists:

last_host = None
hosts = []                # the list of hosts
host_responses = []       # the list of responses for each host
responses = []
for output in expandresults:
    if output.val is not None:
        if output.hostname != last_host:    # new host
            if last_host:    # nothing to flush before the very first host
                host_responses.append(responses)
            hosts.append(output.hostname)
            responses = [output.val]        # start the new list of responses
            last_host = output.hostname
        else:                               # same host, append the response
            responses.append(output.val)
host_responses.append(responses)

for host, responses in zip(hosts, host_responses):
    self.WriteOut(host, ','.join(responses))
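
The same grouping can also be written with itertools.groupby (a minimal sketch; like the loop above, it assumes the results arrive already ordered by host):

from itertools import groupby
from operator import attrgetter

for hostname, group in groupby(expandresults, key=attrgetter('hostname')):
    vals = [o.val for o in group if o.val is not None]
    if vals:                                   # skip hosts with no usable values
        self.WriteOut(hostname, ','.join(vals))
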
Brent Washburne
  • I'm fairly new to Python and I'm having to learn it the hard way. I haven't seen `zip` used before, but am I safe to assume that it's compressing the stored data in order to reduce the amount of memory the script is using? – Kamikaze Rusher Jul 07 '15 at 13:32
  • No, it's not a memory compressor or file compactor like the desktop program. Zip joins the lists together into a new list, like a zipper joins both sides of a jacket. https://docs.python.org/3/library/functions.html#zip – Brent Washburne Jul 07 '15 at 13:38
  • Thanks, that explains it perfectly. Using your loop I'm definitely reducing the amount of memory, and the data it writes out is correct. However, a large amount of memory is still being used by the system, which I'll have to investigate further. For now, though, your answer definitely answers my question, since the rest of what I need to investigate likely sits outside the bounds of what I'm asking for help with here – Kamikaze Rusher Jul 07 '15 at 14:28

The memory consumption was due to instantiating several workers in an unbounded manner.

I've updated fastsnmpy (the latest is version 1.2.1) and uploaded it to PyPI. You can search PyPI for 'fastsnmpy', or grab it directly from my PyPI page at FastSNMPy.

I've just finished updating the docs and posted them to the project page at fastSNMPy DOCS.

What I basically did here is replace the earlier model of unbounded workers with a process pool from multiprocessing. The pool size can be passed in as an argument, and defaults to 1.

For simplicity you now have just two methods: snmpwalk(processes=n) and snmpbulkwalk(processes=n).
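
Usage looks roughly like this (a sketch reusing the session arguments from the question; check the docs linked above for the exact import path and signatures):

from fastsnmpy import SnmpSession

session = SnmpSession(oidlist=var,
                      targets=expand,
                      community='expand')
# 'processes' caps how many workers run at once, instead of one per host
results = session.snmpbulkwalk(processes=10)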

You shouldn't see the memory issue anymore. If you do, please ping me on github.

ajaysdesk
  • Thanks! A lot has occurred since I've asked this. We actually modified the `fastsnmpy` package ourselves to act similar to the implementations you've made. (However we've since moved on to `easysnmp` for various reasons.) I'm surprised to see you comment here though. How'd you come across this? – Kamikaze Rusher Jan 25 '16 at 20:32
  • I had to reimplement this at a much larger scale (roughly querying 8 million objects every minute), committing to a distributed bus, and eventually sharding to a NoSQL store. Had to start somewhere, so I just googled my old fastsnmpy package, and this was the first post that came up. Glad that someone else used it too. – ajaysdesk Jan 26 '16 at 00:00
  • That's similar to what we're doing, but you're likely doing it on a much larger scale. We're having to query 2,900+ wireless APs for the number of clients connected per radio, parse it for easier search/viewing in Kibana, then send it all to ElasticSearch. We didn't realize that NetSNMP had really bad memory leaks for `SNMPWALK` (~10MB per device we polled) so we had to resort to creating a `multiprocessing.pool` to prevent sucking up all system RAM. Anyways, I'll have to toy around with `fastsnmpy` sometime in the future. – Kamikaze Rusher Jan 27 '16 at 01:05
  • Yeah, netsnmp is pretty villainous. I'd suggest doing an occasional walk (like once every few hours), and finding out the indexes that you actually need to poll (like the ones that are ifOper=up, ifName=~/eth*/ etc), and keeping them as your discovery. Then, every minute poll the interesting indexes from the discovery using pure snmpgets (with pdupacking set to reduce chattiness), and that would make it several orders of magnitude faster. – ajaysdesk Jan 27 '16 at 14:37

You might have an easier time figuring out where the memory is going by using a profiler:

https://pypi.python.org/pypi/memory_profiler
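
For example, decorate whatever function does the collection and you'll get a line-by-line memory report when it's called (a minimal sketch; poll_devices is just a stand-in for your own code):

from memory_profiler import profile

@profile                      # prints line-by-line memory usage when called
def poll_devices(session):
    results = session.multiwalk(mode='bulkwalk')
    return results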

Additionally, if you're already tweaking the fastsnmpy classes, you can just change the implementation to do the dictionary-based results merging for you instead of letting it construct a gigantic list first.
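
Roughly, the idea is this (a hypothetical sketch; the real fastsnmpy internals will look different): wherever the library currently appends every varbind to one big results list, fold it into a per-host dict instead.

from collections import defaultdict

results_by_host = defaultdict(list)

def record(varbind):
    # Call this for each varbind as the walk produces it,
    # in place of results.append(varbind).
    if varbind.val is not None:
        results_by_host[varbind.hostname].append(varbind.val)

Your WriteOut loop can then read results_by_host directly, and the giant flat list never exists.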

How long are you hanging on to the session? The result list will grow indefinitely if you reuse it.

pvg
  • I didn't know I could release sessions. The Python documentation, and the test file included, do not mention a way to do this from within the API. I'm guessing that I would need to `del` each session after storing responses in their results variable? – Kamikaze Rusher Jul 07 '15 at 13:30
  • I meant, are you creating a new session object for each scan or restarting your script, etc. Because looking at the library code, if you keep using the same session object, it hangs on to its result list forever. – pvg Jul 07 '15 at 13:42
  • I used the memory profiler and implemented the loop that Brent provided. Strangely the total amount of memory that my data uses is actually under 33MB. This leads me to believe that either `netsnmp` or `fastsnmpy`, or both, are the culprits here. The Net-SNMP C-language project itself has had a few reported bugs of memory leaks before. Perhaps that's what I'll need to delve into next. (EDIT) I was doing all this in an infinite global loop which created a new session each time. I have it set now to delete the session after results are passed in hopes of it releasing that list and freeing memory – Kamikaze Rusher Jul 07 '15 at 14:21
  • That loop relies on the data being ordered, always, which doesn't seem like a good assumption but I'll let you sort that out. The simplest way to solve your memory problem, since you don't actually seem to have a memory problem is to run the python script from scratch every few minutes. That way you don't have to worry about leaks. If you are creating a new session each scan, the gc will clean up the old ones for you, you shouldn't have to manage them explicitly. – pvg Jul 07 '15 at 14:33
  • If it were possible to run the script from scratch every few minutes I'd certainly do so but this has to run every two minutes on its own since it will be run in three separate instances alongside other SNMP scripts (which are in `bash`). Anyways, yes it seems that *I* don't have a memory problem but the packages do. I'll have to dig deeper in them but all of that seem to lie outside the bounds of my original question so I'll pursue that independent of this thread – Kamikaze Rusher Jul 07 '15 at 14:41
  • Well, you can run it from cron, you can run it from a shell script that does the sleeping, etc, but that's obviously a different question. Restarting simple processes is a pretty standard and sane way to avoid the entire issue of long-term leaks. The memory profiler output will probably help you if you decide to burn time on trying to track this down. – pvg Jul 07 '15 at 14:48
  • Turns out that the large use of system memory comes from the `multiworker` function in `fastsnmpy`, likely from the high number of threads it creates at one time. It looks like it creates ~1000 threads or so. When using just `bulkwalk` the actual amount of system memory is less than 300MB. Thanks for your help and for the package, I'll definitely use this for larger projects – Kamikaze Rusher Jul 07 '15 at 14:53
  • We're looking into setting up Cron on the server these scripts will be hosted on but since it's all been pushed onto me I've had it at the end of my list. Thanks for the reminder, I really need to tap into cron's abilities – Kamikaze Rusher Jul 07 '15 at 15:08