
I am new to Python and am trying to write my own syslog program. I have tested this on my own machine and it works fine; I don't notice anything. But when I run it on my Ubuntu VM, the CPU spikes to 80-99% (reported by both `top` and vSphere). I have allocated one core of my i5 (3.1 GHz) processor to the VM. If anything, maybe the repeated file opening and closing is causing this spike, but that just doesn't add up to me. Thanks in advance for any help!

import socket

log = input("Enter full path of file you would like to monitor:\n")
host = input("Enter IP address for remote syslog server:\n")
port = input("Enter syslog service port to send syslogs to:\n")
port = int(port)

with open(log, 'r') as file:
    current_pos = 0
    data = file.read().splitlines()
    old_len = 0
    file.close()

    while True:
        new_len = len(data)
        udp_port = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

        with open(log, 'r') as file:
            data = file.read().splitlines()

            while new_len > old_len and current_pos < new_len:
                msg = data[current_pos]
                print('Sending....', msg, '=====>', host, ':', port)
                udp_port.sendto(bytes(msg, "utf-8"), (host, port))
                current_pos += 1

            file.close()  # Is this necessary here?
            old_len = new_len

        # udp_port.shutdown()  # stay open only during transmission
        udp_port.close()
polyphemus11
  • You are in a busy loop; why wouldn't you use 100% CPU, unless there was an I/O bottleneck? The bigger the files, the more likely CPU is your bottleneck. – Jared Jan 24 '15 at 05:29
  • 1
    You're continually opening the file, splitting it, and then checking to see if you have more lines than you had last time. That's likely to be very expensive (especially if the log isn't really changing much between runs, and is very large). You'd be better off opening the file, and then doing readline until you get EOF, then sleeping for a while, waking up, and doing the readline until EOF again. – Charlie Jan 24 '15 at 05:33
  • 1
    There are APIs in Linux that allow you to receive notifications when a file is written, search for "inotify". Hook into this notification. Also, keep the current position in the file. Lastly, be prepared to handle the case that a file is truncated. Check the `tail` sources, it does what you need. – Ulrich Eckhardt Jan 24 '15 at 08:51
  • I guess I didn't realize how much this would clog the CPU. But then again, the PC I am using to test this has way more processing power than my VM. I'll try the sleep idea. This program is going to be a substitute for rsyslog on my web server, as it is third-party hosted and doesn't come with any syslog programs. This is my way around that, as it does allow Python. – polyphemus11 Jan 24 '15 at 17:02
  • I found this on `tail`: http://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python Once I finish this project, I'll start with a tail one. (As I'm learning, I really want to say to myself I did/learned enough in one project before I move on to the next.) – polyphemus11 Jan 24 '15 at 17:10
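The readline-until-EOF-then-sleep approach suggested in the comments could be sketched roughly like this (`follow` is a hypothetical helper name and the poll interval is arbitrary, not anything from the original code):

```python
import time

def follow(path, poll_interval=0.5):
    """Yield lines appended to the file at *path*, tail -f style."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at end of file; drop this line to replay existing content
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_interval)  # nothing new: yield the CPU
                continue
            yield line.rstrip("\n")
```

Because the generator only re-reads from its saved position and sleeps when there is nothing new, it avoids re-reading and re-splitting the whole file on every pass. Each yielded line could then be handed to `sendto` the same way the original loop does.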

1 Answer


Your code has a while True: block. This means it is going to loop over and over again, continually reading from your file. The only break the CPU gets is from blocking calls (such as network and other I/O) where your thread will yield CPU time until the I/O resources become available.

To avoid thrashing the CPU, put a `time.sleep()` call at the end of your `while` loop. Even a sleep of 10 ms still gives you low-latency forwarding, but eases up on the CPU considerably.

Aaron D
  • I'll try the sleep idea. I guess I don't really need the instantaneous feedback. – polyphemus11 Jan 24 '15 at 17:03
  • I wish I could mark two answers as correct. Excellent point about how to check for smaller files (once the access logs roll over). I'll need to add that in as well, thanks Ulrich Eckhardt. But the `sleep()` call had the best immediate effect. CPU (at its peak) is now down to 12%. Thanks. – polyphemus11 Jan 24 '15 at 17:34