Hi, I have the following log file structure:

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>
####<20-Jan-2015 07:16:43 o'clock UTC> <Notice> <Stdout> <example2.com>
####<21-Jan-2015 07:16:48 o'clock UTC> <Notice> <Stdout> <example3.com>

How can I filter this file by a date interval, for example show all entries between the 19th and the 20th of January 2015?

I tried to use awk, but I have trouble converting 19-Jan-2015 to 2015-01-19 so that the dates can be compared.

Arnab Nandy
  • `man gawk` and check the function `mktime()` for converting. Or use the external `date` command to give you awk-comparable values. – Kent Feb 02 '15 at 10:51
  • mktime(datespec): datespec is a string of the form "YYYY MM DD HH MM SS [DST]", but in my case I have no month number, just Jan :( – Gheorghe Frunza Feb 02 '15 at 10:58
  • You can make a table like `m["Jan"]=1` in the BEGIN block. – Kent Feb 02 '15 at 11:08
  • @triplee the date is in a different format so I don't think that duplicate is applicable, although I'm sure I've seen a similar format question before... –  Feb 02 '15 at 11:44
  • Parsing the date into a standard format may well be a legitimate question in its own right, but beyond that, this is a very common FAQ. – tripleee Feb 02 '15 at 12:55
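Kent's month-table suggestion can be sketched as follows. This is a minimal illustration, not code from the thread: the shell function name `to_datespec` is hypothetical, and it assumes the date and time have already been isolated as the first two whitespace-separated fields of each input line. It builds `m["Jan"]=1` ... `m["Dec"]=12` in BEGIN and prints the "YYYY MM DD HH MM SS" string that gawk's mktime() expects. It only builds the string, so it runs in any POSIX awk.

```shell
# to_datespec: read "DD-Mon-YYYY HH:MM:SS" lines on stdin and print
# "YYYY MM DD HH MM SS". Hypothetical helper; portable POSIX awk.
to_datespec() {
  awk '
    BEGIN {
      # Build m["Jan"]=1 ... m["Dec"]=12 from a single string.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) m[names[i]] = i
    }
    {
      split($1, d, "-")   # d[1]=day, d[2]=month name, d[3]=year
      split($2, t, ":")   # t[1]=hour, t[2]=minute, t[3]=second
      printf "%s %02d %02d %s %s %s\n", d[3], m[d[2]], d[1], t[1], t[2], t[3]
    }'
}
```

For example, `echo '19-Jan-2015 07:16:47' | to_datespec` prints `2015 01 19 07 16 47`, which can be passed to mktime() in gawk or compared as a string after removing the spaces.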

4 Answers


For an oddball date format like that, I'd outsource the date parsing to the date utility (GNU date, since `-d` is a GNU extension).

#!/usr/bin/awk -f

# Formats the timestamp as a number, so that higher numbers represent
# a later timestamp. This will not handle the time zone because date
# can't handle the o'clock notation. I hope all your timestamps use the
# same time zone, otherwise you'll have to hack support for it in here.
function datefmt(d,    command, result) {
  # make d compatible with singly-quoted shell strings
  gsub(/'/, "'\\''", d)

  # then run the date command and get its output; result is reset first
  # so that a failed conversion does not reuse the previous line's value
  command = "date -d '" d "' +%Y%m%d%H%M%S"
  result = ""
  command | getline result
  close(command)

  # that's our result.
  return result
}

BEGIN {
  # Field separator, so the part of the timestamp we'll parse is in $2 and $3
  FS = "[< >]+"

  # start, end set here.
  start = datefmt("19-Jan-2015 00:00:00")
  end   = datefmt("20-Jan-2015 23:59:59")
}

{
  # convert the timestamp into an easily comparable format
  stamp = datefmt($2 " " $3)

  # then print only lines in which the time stamp is in the range.
  if(stamp >= start && stamp <= end) {
    print
  }
}
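The script above starts one `date` process per input line. A minimal sketch of the same technique, assuming GNU date, that remembers each converted timestamp in an awk array so a repeated timestamp is converted only once, and takes the bounds as arguments via `-v` instead of editing the script. The shell function name `filter_by_date` is hypothetical; as a simplification it builds the command with double quotes and only passes strings made of digits, letters, spaces, `-` and `:` to the shell.

```shell
# filter_by_date FROM TO: print the stdin lines whose timestamp lies in
# [FROM, TO]. Hypothetical helper; assumes GNU date for "date -d".
filter_by_date() {
  awk -v from="$1" -v to="$2" '
    # Convert "19-Jan-2015 07:16:47" to 20150119071647 with date(1),
    # caching results so a repeated timestamp starts date only once.
    function datefmt(d,    cmd, res) {
      if (d in cache) return cache[d]
      # Only hand the shell strings made of safe characters.
      if (d !~ /^[0-9][0-9A-Za-z: -]*$/) return ""
      cmd = "date -d \"" d "\" +%Y%m%d%H%M%S"
      res = ""                 # stays empty if date fails
      cmd | getline res
      close(cmd)
      return cache[d] = res
    }
    BEGIN {
      FS = "[< >]+"            # date in $2, time in $3
      start = datefmt(from)
      stop = datefmt(to)
    }
    {
      stamp = datefmt($2 " " $3)
      if (stamp != "" && stamp >= start && stamp <= stop) print
    }'
}
```

Usage: `filter_by_date "19-Jan-2015 00:00:00" "20-Jan-2015 23:59:59" < file`. All values are 14-digit YYYYMMDDHHMMSS strings, so comparing them directly orders them chronologically.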
Wintermute

If the file is named example.txt, the script below should work (it needs bash and GNU date):

for i in $(awk -F'<' '{print $2}' example.txt | awk '{print $1"_"$2}'); do
  date=$(echo "$i" | sed 's/_/ /g')
  dunix=$(date -d "$date" +%s)
  if (( dunix >= 1421605800 && dunix <= 1421778599 )); then
    grep "$date" example.txt
  fi
done

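The script converts each date and time in the file into a Unix timestamp (seconds since the epoch), compares it with the two bounds and prints the matching lines from the file. The bounds are hard-coded: 1421605800 and 1421778599 are 19-Jan-2015 00:00:00 and 20-Jan-2015 23:59:59 in UTC+05:30, so they have to be recomputed (for example with `date -d '19-Jan-2015 00:00:00' +%s`) for other dates or time zones.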
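A minimal sketch of the same epoch comparison, assuming GNU date, that computes both bounds with `date -d ... +%s` instead of hard-coding them and reads the file in a single pass. Because each line is printed directly, a line is printed once even when several lines share a timestamp, whereas `grep "$date"` in the loop prints every line with that timestamp once per occurrence. The function name `filter_epoch` is hypothetical; it is written in plain POSIX sh, so its variables are not local.

```shell
# filter_epoch FROM TO FILE: print the lines of FILE whose timestamp is in
# [FROM, TO]. Hypothetical helper; assumes GNU date. Plain POSIX sh, so
# its variables are not local.
filter_epoch() {
  from=$(date -d "$1" +%s) || return 1
  to=$(date -d "$2" +%s) || return 1
  while IFS= read -r line; do
    rest=${line#*<}            # drop everything up to the first "<"
    day=${rest%% *}            # "19-Jan-2015"
    rest=${rest#* }
    clock=${rest%% *}          # "07:16:47"
    case $day in [0-9]*) ;; *) continue ;; esac   # no date on this line
    t=$(date -d "$day $clock" +%s 2>/dev/null) || continue
    if [ "$t" -ge "$from" ] && [ "$t" -le "$to" ]; then
      printf '%s\n' "$line"
    fi
  done < "$3"
}
```

Usage: `filter_epoch "19-Jan-2015 00:00:00" "20-Jan-2015 23:59:59" example.txt`. Both the bounds and the log timestamps are interpreted in the local time zone, so they stay consistent with each other.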

LogicIO
  • The idea has some merit, but using two Awks to print the output in a format you cannot directly use is incredibly clumsy. – tripleee Feb 02 '15 at 12:56

Using plain string comparisons will be faster than creating date objects:

awk -F '<' '
    {split($2, d, /[- ]/)} 
    d[3]=="2015" && d[2]=="Jan" && 19<=d[1] && d[1]<=20
' file
glenn jackman
  • What if it was between Feb and Jan? –  Feb 02 '15 at 11:33
  • How does this have two upvotes when it is extremely limited in what it can do and has no explanation, but Wintermute's has 0? –  Feb 02 '15 at 12:00
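One way to extend the string-comparison idea across month boundaries, sketched under assumptions: turn each date into a YYYYMMDD key with a month-name table, then compare keys as strings. This is an illustration, not glenn jackman's code; the function name `filter_days` is hypothetical, the bounds are inclusive YYYYMMDD values, and the time of day is ignored.

```shell
# filter_days FROM TO FILE: print lines of FILE whose date is in [FROM, TO],
# with FROM and TO given as YYYYMMDD. Hypothetical helper; POSIX awk only.
filter_days() {
  awk -F '<' -v from="$1" -v to="$2" '
    BEGIN {
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) m[names[i]] = sprintf("%02d", i)
    }
    {
      split($2, d, /[- ]/)          # d[1]=day, d[2]=month, d[3]=year
      if (!(d[2] in m)) next        # skip lines without a valid date
      key = d[3] m[d[2]] sprintf("%02d", d[1])   # e.g. "20150119"
      if (key >= from && key <= to) print
    }' "$3"
}
```

For example, `filter_days 20150125 20150210 file` keeps lines dated from 25-Jan-2015 to 10-Feb-2015. Because every key has the same fixed width, string order matches chronological order.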

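Another way, doing everything in awk with mktime(). This needs GNU awk (gawk), because mktime() and the three-argument form of match() are gawk extensions.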

gawk '

BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}
{Time=0}
match($0,/<([^ ]+) ([^ ]+)/,a){
        split(a[1],b,"-")
        split(a[2],c,":")
        b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3
        Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])
}
Time<To&&Time>From

' file

Output

####<19-Jan-2015 07:16:47 o'clock UTC> <Notice> <Stdout> <example.com>

How it works

BEGIN{
        From=mktime("2015 01 19 00 00 00")
        To=mktime("2015 01 20 00 00 00")
}

Before any lines are processed, set the bounds From and To; the lines we want have timestamps between the two.
mktime() requires its argument in the format YYYY MM DD HH MM SS.

{Time=0}

Reset Time on every line, so that lines that don't match the regex are not printed.

match($0,/<([^ ]+) ([^ ]+)/,a)

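Matches the first two space-separated words after the first < (the date and the time) and stores the captured groups in the array a, so a[1] is the date and a[2] the time. The following block runs only if the match succeeds.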
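The three-argument match() used here only exists in gawk. A minimal sketch of a portable alternative, assuming a POSIX awk such as mawk: match() still sets RSTART and RLENGTH, and substr() extracts the matched text. The function name `stamp_of` is hypothetical.

```shell
# stamp_of: print the "DD-Mon-YYYY HH:MM:SS" part of each stdin line using
# POSIX match() with RSTART/RLENGTH (no gawk array argument).
# Hypothetical helper name.
stamp_of() {
  awk '
    match($0, /<[^ ]+ [^ ]+/) {
      # RSTART points at the "<"; skip it and keep the rest of the match.
      print substr($0, RSTART + 1, RLENGTH - 1)
    }'
}
```

The printed string can then be split on the space to recover what gawk stores in a[1] and a[2].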

    split(a[1],b,"-")
    split(a[2],c,":")

Splits the date into day, month name and year (array b) and the time into hours, minutes and seconds (array c).

b[2]=(index("JanFebMarAprMayJunJulAugSepOctNovDec",b[2])+2)/3

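Converts the month name to a number. Every name is three characters long, so index() returns 1 for Jan, 4 for Feb, 7 for Mar and so on up to 34 for Dec; adding 2 and dividing by 3 maps these positions onto 1 to 12.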
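The arithmetic can be checked in isolation with a small sketch. The function name `month_num` is hypothetical; it runs the same index() expression in a BEGIN block. Note that an unrecognized name makes index() return 0, which yields the non-integer 2/3, so real code may want to test for index() == 0 first.

```shell
# month_num NAME: print the month number for a three-letter English month
# name using the index() arithmetic. Hypothetical helper name.
month_num() {
  awk -v mon="$1" 'BEGIN {
    # Names start at positions 1, 4, 7, ..., 34 in this string, so
    # (position + 2) / 3 gives 1, 2, 3, ..., 12.
    n = (index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) + 2) / 3
    print n
  }'
}
```

For example, `month_num Jun` prints 6, because "Jun" starts at position 16 and (16 + 2) / 3 = 6.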

 Time=mktime(b[3]" "b[2]" "b[1]" "c[1]" "c[2]" "c[3])

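Builds the timestamp (seconds since the epoch) by passing the collected values to mktime() in the YYYY MM DD HH MM SS order it expects.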

Time<To&&Time>From

If Time is greater than From and less than To, the line is inside the desired range. The pattern has no action, so awk applies its default action and prints the line.


Resources

https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html

Community