I have written a Python script that performs a Nagios check. The script is simple: it parses a log and matches some information, which is used to construct the Nagios check output. The log is an snmptrapd log that records the traps from other servers in /var/log/snmptrapd.log, and I parse it with the script. To see only the latest traps, I erase the log from Python each time after reading it. To preserve the information, I have a cron job that copies the content of the log into another log at an interval slightly shorter than the Nagios check interval. What I don't understand is why the log grows so much (the messages log, which I guess holds 1000 times more information, is smaller). The log contains a lot of special characters like ^@, and I think this is caused by the way I'm manipulating the file from Python, but with only about three weeks of Python experience I can't figure out the problem.

The script code is the following:

import sys, os, re

validstring = "OK"
filename = "/var/log/snmptrapd.log"

if os.stat(filename)[6] == 0:
        print validstring
        sys.exit()

else:
        f = open(filename,"r")
        sharestring = ""
        line1 = []
        patte0 = re.compile("[0-9]+-[0-9]+-[0-9]+")
        patte2 = re.compile("NG: [a-zA-Z\s=0-9]+.*")
        for line in f:
                line1 = line.split(" ")
                if re.search(patte0,line1[0]):
                        sharestring = sharestring + line1[1] + " "
                        continue
                result2 = re.search(patte2,line)
                if result2:
                        result22 = result2.group()
                        result22 = result22.replace("NG:","")
                        sharestring = sharestring + result22 + " "
        f.close()
        f1 = open(filename,"w")
        f1.close()
        print sharestring
        sys.exit(2)

The log looks like:

2012-07-11 04:17:16 Some IP(via UDP: [this is an ip]:port) TRAP, SNMP v1, community somestring
    SNMPv2-SMI::enterprises.OID Some info which is not necessary
    SNMPv2-MIB::sysDescrOID = STRING: info which i'm matching

I'm pretty sure that it has something to do with my way of erasing the file, but I can't figure it out. If you have any ideas, I would be really interested. Thank you.

For reference on the size: the log has 93 lines (according to Vim) and occupies 161K, which can't be right because the lines are quite short.
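One way to check whether the extra size really comes from the ^@ characters is to count the zero bytes directly. The following is only a minimal sketch: `nul_report` is a hypothetical helper name, and the temporary file stands in for the real log so that the example is self-contained.

```python
import os
import tempfile


def nul_report(path):
    """Return the total size, NUL-byte count and line count of a file."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "size": len(data),
        "nul_bytes": data.count(b"\x00"),  # shown as ^@ in Vim
        "lines": data.count(b"\n"),
    }


# Stand-in for the real log: 40 zero bytes followed by one short line.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"\x00" * 40 + b"2012-07-11 04:17:16 trap\n")
os.close(fd)

report = nul_report(demo_path)
os.remove(demo_path)
print(report)
```

If most of the bytes in the real log turn out to be zero bytes, the growth comes from padding rather than from the trap text itself.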

OK, it has nothing to do with the way I read and erase the file. Something in the snmptrapd daemon does this when I erase its log file. I have modified my code so that it sends SIGSTOP to snmptrapd right before opening the file, makes its modifications to the file, and then sends SIGCONT when done, but I still see the same behavior. The new code looks like this (only the parts that differ):

else:
    command = "pidof snmptrapd"
    p=subprocess.Popen(shlex.split(command),stdout=subprocess.PIPE)
    pidstring = p.stdout.readline()
    patte1 = re.compile("[0-9]+")
    pidnr = re.search(patte1,pidstring)
    pid = pidnr.group()
    os.kill(int(pid), SIGSTOP)
    time.sleep(0.5)
    f = open(filename,"r+")
    sharestring = ""

and

                  sharestring = sharestring + result22 + " "
    f.truncate(0)
    f.close()
    time.sleep(0.5)
    os.kill(int(pid), SIGCONT)
    print sharestring

I'm thinking of stopping the daemon, erasing the file, recreating it with the proper permissions, and then starting the daemon again.

primero
  • @Jarrod That will replace the file with an empty one – GP89 Jul 11 '12 at 08:42
  • I got the idea from here: http://mail.python.org/pipermail/tutor/2010-February/074323.html . I didn't express myself correctly: I have to clear the content of the file, not delete it, because if I delete it I have to restart the daemon, and that is a rather complicated solution if there is a way to do it without deleting the file. – primero Jul 11 '12 at 08:46
  • @primero which `daemon` are you talking about? We don't know what you know. Provide more details than you think you need to. –  Jul 11 '12 at 08:48
  • @Jarrod Sorry about that. I have to restart the snmptrapd daemon, which has an option for the log file. From what I've seen, when I erase the snmptrapd.log file the daemon has to be restarted in order to know which file to write to. – primero Jul 11 '12 at 08:51
  • If something is writing to the file and you truncate it with `open(..., O_TRUNC)` followed by a `close`, I think you will cause the writers to create a file with holes. This would explain the ^@ (the usual presentation of a zero byte), as zero bytes are how holes (unallocated storage) are presented to userspace when reading. If this is the case, I'm afraid that unless the other processes writing the logfile cooperate (for instance, by accepting a SIGHUP to reopen the log file), you cannot avoid the space problem. – fork0 Jul 11 '12 at 10:39
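
The hole effect described in the last comment can be reproduced without snmptrapd. The sketch below is an illustration under assumptions: it targets a POSIX system, the helper name `write_truncate_write` is hypothetical, and a plain file descriptor plays the role of the daemon. Whether snmptrapd actually opens its log without append mode is an assumption, not something the question confirms.

```python
import os
import tempfile


def write_truncate_write(extra_flags):
    """Write a line, truncate the file from 'outside', write another line."""
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "demo.log")
    # The long-lived writer, standing in for the daemon.
    writer = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o600)
    try:
        os.write(writer, b"first trap line\n")   # writer offset is now 16
        open(path, "w").close()                  # the check script truncates
        os.write(writer, b"second trap line\n")  # writer keeps its own offset
    finally:
        os.close(writer)
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    os.rmdir(tmp_dir)
    return data


plain = write_truncate_write(0)               # writer without O_APPEND
appended = write_truncate_write(os.O_APPEND)  # writer with O_APPEND

print(repr(plain))
print(repr(appended))
```

Without `O_APPEND` the second write lands at the writer's old offset, so the file starts with zero bytes. With `O_APPEND` every write goes to the current end of the file, so truncating from another process leaves no padding.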

1 Answer

I don't think you can, but here are some things to try.

Truncating a File

f1 = open(filename, 'w')
f1.close()

is a hacky, side-effect way of deleting a file's contents, and it will probably cause undesired side effects, depending on the underlying OS, if other applications have that file open.

Using the File Object method truncate()

truncate([size])

Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. The current file position is not changed. Note that if a specified size exceeds the file's current size, the result is platform-dependent: possibilities include that the file may remain unchanged, increase to the specified size as if zero-filled, or increase to the specified size with undefined new content. Availability: Windows, many Unix variants.
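
The sentence "The current file position is not changed" matters in practice. The sketch below shows it with a single file handle on a temporary file; the helper name `rewrite` and its `seek_first` parameter are made up for the illustration. In the question the stale position belongs to the daemon's handle rather than to the script's, so this shows only the mechanism, not a fix.

```python
import os
import tempfile


def rewrite(seek_first):
    """Write, truncate to zero, write again; return (position, contents)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "r+b") as f:
        f.write(b"old log line\n")  # 13 bytes, position is now 13
        f.flush()
        f.truncate(0)               # size becomes 0, position stays at 13
        position = f.tell()
        if seek_first:
            f.seek(0)               # move back to the start explicitly
        f.write(b"new\n")
    with open(path, "rb") as f:
        contents = f.read()
    os.remove(path)
    return position, contents


pos_no_seek, data_no_seek = rewrite(seek_first=False)
pos_seek, data_seek = rewrite(seek_first=True)

print(pos_no_seek, repr(data_no_seek))
print(pos_seek, repr(data_seek))
```

Writing after `truncate(0)` without seeking leaves a gap of zero bytes in front of the new data, which is the same padding that shows up as ^@ in the log.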

Probably the only deterministic way to do this is:

Stop the snmptrapd process at the start of the script, delete the file with the proper os module function, remove, then recreate the file and restart the snmptrapd daemon at the end of the script.

os.remove(path)

Remove (delete) the file path. If path is a directory, OSError is raised; see rmdir() below to remove a directory. This is identical to the unlink() function documented below. On Windows, attempting to remove a file that is in use causes an exception to be raised; on Unix, the directory entry is removed but the storage allocated to the file is not made available until the original file is no longer in use.
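
The Unix behaviour described above is also why the daemon has to be restarted (or made to reopen its log) after the file is removed. The following is a minimal sketch on a temporary file, assuming a POSIX system; the ordinary file object stands in for the daemon's open log handle.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.log")

# Long-lived writer, standing in for the daemon.
writer = open(path, "w")
writer.write("before\n")
writer.flush()
old_inode = os.fstat(writer.fileno()).st_ino

os.remove(path)          # only the directory entry goes away
open(path, "w").close()  # recreate an empty file under the same name

writer.write("after\n")  # still goes to the old, now nameless file
writer.flush()

new_file_size = os.path.getsize(path)            # the recreated file
orphan_size = os.fstat(writer.fileno()).st_size  # the file the writer holds
same_file = os.stat(path).st_ino == old_inode

writer.close()
os.remove(path)
os.rmdir(tmp_dir)

print(new_file_size, orphan_size, same_file)
```

The writer keeps appending to the removed file, and the recreated file stays empty until the writer closes its handle and opens the path again.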

Shared resource concern

You still might have problems with two processes fighting over writes to a single file without some kind of locking mechanism, with non-deterministic things happening to the file. I bet you can send a SIGHUP or something similar to your daemon process to get it to reopen the file; check your documentation.

Manipulating shared resources, especially file resources, without exclusive locking is going to be trouble, especially with filesystem caching and application caching of data.

  • Removing the file does not seem to be an option. Instead, @primero wants to only delete the content of the file. – Rodrigue Jul 11 '12 at 08:52
  • So I could use f.truncate([0]) to erase the content of the file without deleting it? – primero Jul 11 '12 at 08:54
  • OK, I tested with the command line interpreter and f.truncate(0) seems to do the trick. I'll try it in the script and let you know how it works. Thanks for the info. – primero Jul 11 '12 at 08:58
  • It's the same thing. The first line fills two screens with `^@` characters, and they use 40K of space. – primero Jul 11 '12 at 09:05
  • I'm thinking of doing something like reading every line and writing back "" for each one. – primero Jul 11 '12 at 09:06
  • I'm not familiar with process signals, but shouldn't I try sending SIGSTOP, waiting a very short time, and then sending SIGCONT, so as to just pause the process and then resume it? – primero Jul 11 '12 at 09:23
  • @primero it depends on if `snmptrapd` even responds to any of those signals and what it does if it does respond to them –  Jul 11 '12 at 17:08
  • @Jarrod The daemon didn't respond to the signals, and the script is run by the Nagios daemon, so it has the nagios user's permissions, which means things like sending signals or shutting down a daemon are not possible. The snmptrapd daemon has access to port udp:162, which is a restricted port that only root can access. I devised another method because this seems to have been a dead end. – primero Jul 12 '12 at 11:06