I have a log file that records entries as:

  • Format: DATE TIME VALUE
  • Ex: 2016-07-28 16:54:32,504 -47.7669027319

I want to keep only the logs from the past 24 hours. Logging runs 24/7 at 2 samples/second, i.e. 172800 samples per 24 hours. I found Python code at https://stackoverflow.com/a/28057753/5954600 that works, and I modified it for my requirement as follows:

    import datetime

    # For testing I'm using "seconds=60"; this will be changed to "hours=24".
    before_24h = str(datetime.datetime.now() - datetime.timedelta(seconds=60))

    # Read every line, then rewrite the file in place with only the recent ones.
    with open("meter.log", "r+") as log_file:
        logs = log_file.readlines()
        log_file.seek(0)
        for each_log in logs:
            # Timestamps are zero-padded, so string comparison orders them by time.
            if each_log > before_24h:
                log_file.write(each_log)
        log_file.truncate()

This code deletes all logs older than 24 hours, but writing 172800 lines back to the file will take some time, so I'm looking for a more efficient way to do it, if there is one.
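
The `each_log > before_24h` test in the loop relies on zero-padded `YYYY-MM-DD HH:MM:SS` timestamps sorting the same way as strings and as dates. The snippet below is only a small illustration of that, using the example line from the question; the cutoff value and the variable names are made up for the demonstration. One detail worth checking in real use: `str()` on a `datetime` puts a `.` before the microseconds while the log uses a `,`, so formatting the cutoff to whole seconds with `strftime` may be the safer choice.

```python
import datetime

# Example line from the question; the cutoff is an arbitrary illustrative value.
log_line = "2016-07-28 16:54:32,504 -47.7669027319\n"
cutoff = datetime.datetime(2016, 7, 28, 16, 54, 0)

# String comparison: decided within the timestamp prefix, before the value part.
as_strings = log_line > cutoff.strftime("%Y-%m-%d %H:%M:%S")

# Same check after parsing the first 23 characters (date, time, milliseconds).
parsed = datetime.datetime.strptime(log_line[:23], "%Y-%m-%d %H:%M:%S,%f")
as_datetimes = parsed > cutoff

print(as_strings, as_datetimes)
```

Both checks should agree for any line in this format, which is why the loop can skip parsing entirely.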

Thanks in advance.

  • You should look into logrotate implementations. There are several examples you can look up that should help. – idjaw Jul 28 '16 at 17:37
  • Look into your OS commands; many of them have a simple command to filter lines in a file, optimized for that system. Issuing this through the **os** package would almost certainly out-perform your Python loop. – Prune Jul 28 '16 at 18:04
  • @idjaw can't use logrotate since I'm not generating the logs. – keyur Jul 28 '16 at 22:15
  • @Prune I looked into OS commands and used **sed** to do exactly the same as the Python code on the same data, but it took longer! – keyur Jul 28 '16 at 22:16
  • Really!? I would have expected those two to be close, with **sed** being a little faster. **sed** is a good general-purpose tool, but it's still a character-level stream editor. Maybe the time-format conversion isn't as good as I expected. – Prune Jul 28 '16 at 22:57
  • Are the log files in any sort of time order, such that you could do a faster scan? If you could simply partition the file at the 24-hour point, you could do this much faster. – Prune Jul 28 '16 at 23:01
  • Does the code in your question work? You're comparing a whole line (a string, including the measurement) to a datetime converted to a string. And if you were just comparing the datetime parts, wouldn't you need to convert them to datetime objects instead of strings to perform the comparison? – DocZerø Jul 28 '16 at 23:38
  • @Prune Yes, it took longer. I had a sample file containing 12000 logs and tried both methods to delete 250 logs. **sed** took much longer than the **for loop**. There was no need for date/time format conversion, since the log file is also in string format. Thanks for the help though, I learned something new! – keyur Jul 29 '16 at 05:50
  • @Kristof Yes, it works. The comparison goes character by character, so it never reaches the value part of the string. You may have a look here: http://stackoverflow.com/a/4806946/5954600 – keyur Jul 29 '16 at 05:55
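
Prune's idea of partitioning the file at the 24-hour point could look roughly like the sketch below. It assumes the lines are in chronological order, which the thread does not confirm, and `trim_log` with its `path`, `max_age` and `now` parameters is a hypothetical helper, not something from the question. It replaces the per-line comparison with a binary search via `bisect`, but it still reads the whole file and rewrites the kept tail, so it only removes part of the cost.

```python
import bisect
import datetime


def trim_log(path, max_age, now=None):
    """Drop lines older than now - max_age; return how many were dropped.

    Assumes every line starts with 'YYYY-MM-DD HH:MM:SS' and that the
    file is in chronological order, so the lines are sorted as strings.
    """
    now = now or datetime.datetime.now()
    cutoff = (now - max_age).strftime("%Y-%m-%d %H:%M:%S")

    with open(path, "r+") as log_file:
        lines = log_file.readlines()
        # Binary search for the first line that sorts after the cutoff.
        first_kept = bisect.bisect_right(lines, cutoff)
        if first_kept:
            # Rewrite only when there is something to drop.
            log_file.seek(0)
            log_file.writelines(lines[first_kept:])
            log_file.truncate()
    return first_kept
```

For the file in the question this would be called as `trim_log("meter.log", datetime.timedelta(hours=24))`. If nothing is older than the cutoff the file is left untouched, which avoids the rewrite on most runs when the script is scheduled frequently.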

0 Answers