I have written a Python script that performs a Nagios check. The script is simple: it parses a log and matches some information, which is used to construct the Nagios check output. The log is an snmptrapd log: snmptrapd records the traps from other servers in /var/log/snmptrapd, after which I parse them with the script. To see only the latest traps, I erase the log from Python each time after reading it. To preserve the information, I have a cron job that copies the content of the log into another log at an interval slightly shorter than the Nagios check interval.
What I don't understand is why the log grows so much (the messages log, which I guess holds about 1000 times more information, is smaller). The log contains a lot of special characters like ^@, and I think they come from the way I'm manipulating the file from Python, but with only about three weeks of Python experience I can't figure out the problem.
The script code is the following:
import sys, os, re
validstring = "OK"
filename = "/var/log/snmptrapd.log"
if os.stat(filename)[6] == 0:
    print validstring
    sys.exit()
else:
    f = open(filename,"r")
    sharestring = ""
    line1 = []
    patte0 = re.compile("[0-9]+-[0-9]+-[0-9]+")
    patte2 = re.compile("NG: [a-zA-Z\s=0-9]+.*")
    for line in f:
        line1 = line.split(" ")
        if re.search(patte0,line1[0]):
            sharestring = sharestring + line1[1] + " "
            continue
        result2 = re.search(patte2,line)
        if result2:
            result22 = result2.group()
            result22 = result22.replace("NG:","")
            sharestring = sharestring + result22 + " "
    f.close()
    f1 = open(filename,"w")
    f1.close()
    print sharestring
    sys.exit(2)
The log looks like:
2012-07-11 04:17:16 Some IP(via UDP: [this is an ip]:port) TRAP, SNMP v1, community somestring
SNMPv2-SMI::enterprises.OID Some info which is not necesarry
SNMPv2-MIB::sysDescrOID = STRING: info which i'm matching
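To make the matching concrete, here is a minimal sketch that applies the same two patterns to the sample lines above. The names `summarize_traps`, `DATE_RE` and `VALUE_RE` are made up for this illustration, and unlike the script it strips extra whitespace and joins the pieces with single spaces. It is written to run on Python 3 as well as Python 2.6+.

```python
import re

# Same patterns as in the script, written as raw strings.
DATE_RE = re.compile(r"[0-9]+-[0-9]+-[0-9]+")
VALUE_RE = re.compile(r"NG: [a-zA-Z\s=0-9]+.*")


def summarize_traps(lines):
    """Collect the time of each trap header and the text after 'STRING:'."""
    parts = []
    for line in lines:
        fields = line.split(" ")
        if DATE_RE.search(fields[0]):
            # Header line: the second field is the time of the trap.
            if len(fields) > 1:
                parts.append(fields[1].strip())
            continue
        match = VALUE_RE.search(line)
        if match:
            # "NG:" is the tail of "STRING:"; drop it and keep the value.
            parts.append(match.group().replace("NG:", "").strip())
    return " ".join(parts)


sample = [
    "2012-07-11 04:17:16 Some IP(via UDP: [this is an ip]:port) TRAP, SNMP v1, community somestring\n",
    "SNMPv2-SMI::enterprises.OID Some info which is not necesarry\n",
    "SNMPv2-MIB::sysDescrOID = STRING: info which i'm matching\n",
]
print(summarize_traps(sample))  # prints: 04:17:16 info which i'm matching
```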
I'm pretty sure it has something to do with the way I erase the file, but I can't figure it out. If you have any ideas, I would be really interested. Thank you.
For reference, the log has 93 lines (according to Vim) but occupies 161K, which can't be right because the lines are quite short.
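Vim displays NUL bytes as ^@, so one way to check whether those characters account for the size is to count them directly. The following is only a diagnostic sketch; the helper name `log_stats` is hypothetical, and it reads the whole file into memory, which is fine for a log of this size.

```python
import os


def log_stats(path):
    """Return (size_in_bytes, line_count, nul_byte_count) for a file."""
    # Binary mode so that NUL bytes are read exactly as stored.
    with open(path, "rb") as f:
        data = f.read()
    size = os.stat(path).st_size
    lines = data.count(b"\n")
    nuls = data.count(b"\x00")
    return size, lines, nuls
```

Calling `log_stats("/var/log/snmptrapd.log")` would show how much of the reported size consists of NUL bytes rather than trap text.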
OK, it has nothing to do with the way I read and erased the file. It is something in the snmptrapd daemon that does this when I erase its log file. I modified my code so that it sends SIGSTOP to snmptrapd right before opening the file, makes its modifications to the file, and then sends SIGCONT when done, but I still see the same behavior. The changed parts of the new code look like this:
# additional imports needed: import subprocess, shlex, time
# and: from signal import SIGSTOP, SIGCONT
else:
    command = "pidof snmptrapd"
    p=subprocess.Popen(shlex.split(command),stdout=subprocess.PIPE)
    pidstring = p.stdout.readline()
    patte1 = re.compile("[0-9]+")
    pidnr = re.search(patte1,pidstring)
    pid = pidnr.group()
    os.kill(int(pid), SIGSTOP)
    time.sleep(0.5)
    f = open(filename,"r+")
    sharestring = ""
and
            sharestring = sharestring + result22 + " "
    f.truncate(0)
    f.close()
    time.sleep(0.5)
    os.kill(int(pid), SIGCONT)
    print sharestring
I'm thinking of stopping the daemon, erasing the file, recreating it with the proper permissions, and then starting the daemon again.
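One possible explanation, not confirmed for snmptrapd, is that a writer which keeps the file open without append mode carries on at its old offset after another process truncates the file, and on POSIX systems the gap in front of the new data then reads back as NUL bytes. The experiment below is a sketch of that effect under those assumptions. The function name `size_and_nuls_after_truncate` is made up, and the `os.open` descriptor merely stands in for a long-running daemon.

```python
import os
import tempfile


def size_and_nuls_after_truncate(append):
    """Write, truncate from a second handle, write again.

    Returns (file_size, nul_byte_count) of the resulting file.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Stand-in for the daemon: one descriptor kept open the whole time.
        flags = os.O_WRONLY | (os.O_APPEND if append else 0)
        writer = os.open(path, flags)
        os.write(writer, b"x" * 1000)

        # Stand-in for the check script: empty the file, as open(filename, "w") does.
        open(path, "wb").close()

        # The "daemon" logs again through its old descriptor.
        os.write(writer, b"new entry\n")
        os.close(writer)

        with open(path, "rb") as f:
            data = f.read()
        return len(data), data.count(b"\x00")
    finally:
        os.remove(path)


print(size_and_nuls_after_truncate(append=False))
print(size_and_nuls_after_truncate(append=True))
```

On Linux the first call should report a file that is mostly NUL bytes, while the append-mode call should report only the new entry. If snmptrapd behaves like the first case, pausing it with SIGSTOP would not help, because its file offset is unchanged when it resumes, whereas stopping and restarting the daemon would make it reopen the file.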