
I have log files several hundred MB in size. Each line begins with a date-time stamp, like this:

[Tue Oct  4 11:55:19 2016] [hphp] [25376:7f5d57bff700:279809:000001] [] \nFatal error: syntax error, unexpected T_ENCAPSED_AND_WHITESPACE, expecting ')' in /var/cake_1.2.0.6311-beta/app/webroot/openx/www/delivery/postGetAd.php(12479)(62110d90541a84df30dd077ee953e47c) : eval()'d code on line 1

I use a Nagios plugin (check_logwarn) to print only the lines that contain certain error strings. This is the command I run:

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" 

I want to filter the output further by date-time, i.e., keep only the lines logged after, say, 11:55:10.

I am not sure whether to use regex for this. Following is what I have so far:

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161003.log "^.*Fatal error*" | grep "15\:19\:1*"

But this only matches lines logged in the 19th minute of the 15th hour.

Update

I am now able to compare the time part of the date-time.

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'

How do I compare the day part?

Update 2 - opening bounty

I am having to open a bounty because I do not have much expertise with shell and I need a solution soon.

I am stuck at the part of comparing the dates. With the solution at https://stackoverflow.com/a/39856560/351903, I am facing this problem. If that is fixed, I would be happy.

I am also open to some enhancement to this (I don't mind if the output has some jumbled up order of logs) -

/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_20161004.log "^.*Fatal error*" | awk '$4 > "14:22:11"'

I looked for a way to convert a date-time to a timestamp for comparison, but couldn't find anything that works.

I am not able to proceed from what is given in this question. I cannot see the timestamp value using this -

echo date -d '06/12/2012 07:21:22' +"%s"

Not sure what I am missing.

Sandeepan Nath

2 Answers


This converts a reference date to a timestamp and compares the timestamp of each log line to it; if the log line's timestamp is more recent, the line gets printed:

awk -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" -F "[][]" '
    {
        cmd = "date +\047%s\047 -d \"" $2 "\""
        if ((cmd | getline val) > 0) {
            if (val > refdate)
                print
        }
        close(cmd)
    }
' infile

Here is how it works:

  • -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" converts the date given (our reference date) to seconds since the epoch.
  • -F "[][]" sets the field separator to square brackets, so the timestamp we want is simply $2.
  • "date +\047%s\047 -d \"" $2 "\"" is the shell command we'd like to execute; it becomes date +'%s' -d "$2", i.e., it converts the log file timestamp to seconds since the epoch. \047 is a single quote.
  • cmd | getline val evaluates cmd and assigns the result to val, so val now holds the timestamp from the log file in seconds since the epoch.
  • We check the success of getline with (cmd | getline val) > 0.
  • If getline was successful, if (val > refdate) print compares the log file timestamp to the reference date and, if the log file timestamp is more recent, prints the line.
  • close(cmd) closes the pipeline.
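
To make the piping case concrete, here is a minimal sketch of the same approach fed from standard input rather than from a file. `sample_log` and `newer_than` are hypothetical names standing in for the real log-producing command and for the awk filter, and GNU `date` is assumed for `-d`.

```shell
#!/bin/sh
# Hypothetical stand-in for the real log producer; the sample lines only
# mimic the timestamp layout shown in the question.
sample_log() {
    printf '%s\n' \
        '[Mon Oct  3 09:59:59 2016] [hphp] Fatal error: before the reference time' \
        '[Tue Oct  4 11:55:19 2016] [hphp] Fatal error: after the reference time'
}

# Hypothetical wrapper around the awk program above.
# $1 is the reference date in any format GNU date -d understands.
newer_than() {
    # No file name follows the closing quote: awk reads standard input.
    awk -v refdate="$(date +'%s' -d "$1")" -F "[][]" '
        {
            # Convert the bracketed timestamp ($2) to seconds since the epoch.
            cmd = "date +\047%s\047 -d \"" $2 "\""
            if ((cmd | getline val) > 0) {
                if (val > refdate)
                    print
            }
            close(cmd)
        }
    '
}

sample_log | newer_than 'Mon Oct 3 10:00:00 2016'
```

With these two sample lines, only the Oct 4 line should be printed. In a real pipeline, the log-producing command takes the place of `sample_log`.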

References

  • date -d is very flexible and understands a lot of formats in the date string, see the date manual.
  • getline in the gawk user manual and on freeshell.org (hat tip Ed Morton, who also pointed out how to properly use getline in his helpful comment)
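
Starting one `date` process per line can get slow on logs of several hundred MB. As a possible alternative, not part of the original answer, the sketch below builds a sortable `YYYYMMDDHHMMSS` key from the fields of each line inside awk and compares it as a string. It assumes the exact timestamp layout from the question; `filter_after` and the cutoff format are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: print only lines stamped strictly after a cutoff.
# $1 is the cutoff written as YYYYMMDDHHMMSS (an illustrative convention).
filter_after() {
    awk -v cutoff="$1" '
        BEGIN {
            # Map month abbreviations to two-digit numbers: Jan -> 01 ... Dec -> 12.
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++)
                month[names[i]] = sprintf("%02d", i)
        }
        # Assumed layout: [Tue Oct  4 11:55:19 2016] ...
        # so $2 = month, $3 = day, $4 = time, $5 = year followed by "]".
        /^\[/ && ($2 in month) {
            year = substr($5, 1, 4)
            hms = $4
            gsub(/:/, "", hms)                       # 11:55:19 -> 115519
            key = year month[$2] sprintf("%02d", $3) hms
            # Appending "" forces a string comparison of the equal-length keys.
            if ((key "") > (cutoff ""))
                print
        }
    '
}

printf '%s\n' \
    '[Tue Oct  4 11:55:09 2016] [hphp] Fatal error: too early' \
    '[Tue Oct  4 11:55:19 2016] [hphp] Fatal error: same day, later' \
    '[Wed Oct  5 00:00:01 2016] [hphp] Fatal error: next day' |
    filter_after 20161004115510
```

The sample run should print the second and third lines. In the question's pipeline, `filter_after` would take the place of the `awk '$4 > "14:22:11"'` stage, since the key covers the date as well as the time.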
Benjamin W.
  • Thanks for answering. However, if I run exactly what you have mentioned, I get `awk: cmd. line:10: fatal: cannot open file `/tmp/logwarn' for reading (Success)`. Do you mean running the command as such, or replacing the awk in my command with it, like this - – Sandeepan Nath Oct 05 '16 at 07:42
  • `/usr/local/nagios/libexec/check_logwarn -d /tmp/logwarn -p /mnt/log/hiphop/error_`(date +'%Y%m%d')`.log "^.*Fatal*" | awk -v refdate="$(date +'%s' -d 'Mon Oct 3 10:00:00 2016')" -F "[][]" ' { cmd = "date +\047%s\047 -d \"" $2 "\"" if ((cmd | getline val) > 0) { if (val > refdate) print } close(cmd) } ' /tmp/logwarn` – Sandeepan Nath Oct 05 '16 at 07:42
  • I'm sorry, I am not an expert with shell commands/bash. The way I visualized the original command that I posted is that the entire thing before the pipe `|` produces an output of log lines, which the `awk` part processes further. I did not understand what the `/tmp/logwarn` was doing. – Sandeepan Nath Oct 05 '16 at 07:44
  • @SandeepanNath I assumed that was the name of your input file, but it's apparently just a parameter for `check_logwarn`. You can pipe to the command, but then you must not specify a file name after the last single quote, so either `awk -v [...] 'command' input_file` (where `[...]` stands for omitted code and `command` for everything between single quotes) or `other_command | awk -v [...] 'command'`, where `other_command` is the command the output of which you'd like to pipe to awk. – Benjamin W. Oct 05 '16 at 13:50
  • I didn't understand what you meant by omitted code and 'command'. Could you please modify the original answer? I checked that you modified the command to change `/tmp/logwarn` to `infile`. There is something that I am missing about the change. – Sandeepan Nath Oct 06 '16 at 07:52
  • @SandeepanNath Awk, like many unix programs, accepts input either from a file or from standard input through a pipe. If you pipe to it, as you want to, you must not specify a file name, so in your case, you want to pipe to the command in my answer and _not_ add `infile` at the end. `infile` is just a generic name for an input file, but you don't have one, you process the output of another command, not the contents of a file. – Benjamin W. Oct 06 '16 at 14:09
  • There seems to be an issue, not sure whether with the `awk` part. When I test the complete command (`other_command | < your command with the awk part>`) with a file which contains all dates earlier than the reference date, the process runs without an end. It doesn't output anything. I can see the process if I do a `ps -ef`. – Sandeepan Nath Oct 07 '16 at 07:26
  • @SandeepanNath You have to show the output you have before piping to awk, otherwise it's just blind guessing. – Benjamin W. Oct 07 '16 at 14:36
  • The output is huge, so not sure how to send it here. Anyway, I found that using nagios plugin, I do not need to check based on dates anymore. Will add more details later. Thanks for your efforts by the way. Awarded you bounty. – Sandeepan Nath Oct 14 '16 at 06:35

You Need Comparable Date Representations

Regular expressions are okay for extracting data, but a terrible way to compare dates to one another. You actually need to convert your timestamps to something comparable, such as Epoch time or DateTime objects. If you want to find all the lines that contain a timestamp greater than some other timestamp, you need to parse out the timestamp in each line for comparison.
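
As a rough shell illustration of the Epoch-time idea, the sketch below converts two timestamps to seconds since the epoch and compares them as integers. It assumes GNU `date` (for `-d`), and `to_epoch` is a hypothetical helper name. Note the command substitution `$(...)`, which runs `date` and captures its output; `echo date -d ...` would only print the words of the command.

```shell
#!/bin/sh
# Hypothetical helper: print the given date string as seconds since the epoch.
# Assumes GNU date, whose -d option parses free-form date strings.
to_epoch() {
    date -d "$1" +%s
}

earlier="$(to_epoch 'Tue Oct  4 11:55:10 2016')"
later="$(to_epoch 'Tue Oct  4 11:55:19 2016')"

# Epoch seconds are plain integers, so an arithmetic test is enough.
if [ "$later" -gt "$earlier" ]; then
    echo "the second timestamp is $((later - earlier)) seconds later"
fi
```

Both values are interpreted in the local time zone, so the comparison stays consistent as long as the reference time and the log times come from the same machine.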

A Ruby Example

#!/usr/bin/env ruby

require 'date'

# Convert your given timestamp to something comparable.
timestamp = DateTime.parse ARGV.first

# Loop over each line of your logfile.
File.open(ARGV.last).each_line do |line|
  # Use a rather naive regex to extract the timestamp from each line.
  next if line !~ /^\[.*?\]/

  # Print lines that contain a later timestamp than your target.
  puts line if DateTime.parse($&) > timestamp
end

The script takes two positional arguments:

  1. A timestamp that resembles RFC 2822, with or without a time zone offset.
  2. A file to parse.

The script then compares the timestamp on each line with the one passed as an argument, and prints only the lines whose timestamp is later. You can change the comparison from > to >= if you really mean "later than or equal to" your given timestamp, which may be more intuitive.

For example:

ruby /tmp/parse_log_dates.rb "Tue Oct  4 11:55:18 2016" /path/to/logfile

works just fine on the very limited corpus you provided. Your real-world results may vary, especially if your log files don't actually contain a timestamp on each line.

Adam Katz
Todd A. Jacobs
  • Thanks for your response. However, I did not use PHP (I am an expert in it) for this task, because I doubted it would be wise to write something of my own when there is an existing nagios plugin that looks up error-hinting keywords (Fatal Error etc). So, I would prefer a bash command solution for the same. – Sandeepan Nath Oct 07 '16 at 12:46