0

I have a log whose entries span multiple lines. Each entry always starts with a date in the form 2019-04-05 09:32:58,543. The only indicator that the next log entry has started is that a date appears again. The first line of each entry also contains a unique identifier (XKcEpaUgg3QvsUTsQSuaIwAAATT in the example below).

With the help of https://stackoverflow.com/a/17988834/55070 I came up with an awk command that is pretty close. The command awk 'flag;/2019.*\| XKcEpaUgg3QvsUTsQSuaIwAAATT \|.*/{flag=1;next}/2019.*/{flag=0}' logfile nearly works. The problem is that it does not display the first line of the log entry, but instead displays the first line of the entry that follows it.

Because the second pattern in the awk command also matches every line that the first pattern matches, a command without the next would only return the first line.

One example of a log entry is:

2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |
leo

2 Answers

5
$ cat tst.awk
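BEGIN { FS=" [|] " }    # fields are separated by " | "
# A line starting with a timestamp begins a new record: print the
# previous record if it matched, then start buffering the new one.
/^[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2},[0-9]{3} / { prt(); rec=$0; next }
{ rec = rec ORS $0 }    # any other line is appended to the current record
END { prt() }           # handle the last record in the file

# Print the buffered record if its 3rd field equals tgt.
function prt(   flds) {
    split(rec,flds)
    if ( flds[3] == tgt ) {
        print rec
    }
}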

$ awk -v tgt='XKcEpaUgg3QvsUTsQSuaIwAAATT' -f tst.awk file
2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

$ awk -v tgt='OTHER_ID' -f tst.awk file
2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |
Ed Morton
3

You can make it simpler:

date_ptn='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9],[0-9]{3}'
myid="XKcEpaUgg3QvsUTsQSuaIwAAATT"
awk -v id="$myid" -v date_ptn="$date_ptn" -F' \\| ' '$0 ~ date_ptn{p = $3 == id ? 1 : 0}p' file.txt
#2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
#first body line
#
#second body line
#some more information
#

or just use $0 ~ date_ptn{ p=id==$3 }p as the awk program.
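To spell out how the flag works, here is a minimal sketch of the same idea wrapped in a shell function. The name print_entry is hypothetical, and the timestamp pattern is written without {n} repetition counts on the assumption that some older awk implementations do not support them.

```shell
# Hypothetical helper: print every log entry whose third " | "-separated
# field on the header line equals the given id.
print_entry() {
    # $1 = id to look for, $2 = log file
    awk -v id="$1" -F' [|] ' '
        # Header line: starts with a timestamp such as 2019-04-05 09:32:58,543
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] / {
            p = ($3 == id)    # 1 when this entry is wanted, else 0
        }
        p                     # print header and body lines while p is set
    ' "$2"
}
```

Called as print_entry XKcEpaUgg3QvsUTsQSuaIwAAATT logfile, it should print the header line of the matching entry plus every line up to the next timestamp. The flag is decided on the header line itself, so no next is needed and the header is not lost.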

jxc
  • Good idea. The only downside is you can't apply that same approach if you want to test for some value in the body lines (or if the 3rd field isn't on the first line of each record but the OP hasn't shown us that as a possibility so it may not be). – Ed Morton Apr 08 '19 at 19:38
  • As for now, there is no plan to test for some value in the body lines. – leo Apr 10 '19 at 06:32
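
For completeness, the case raised in the first comment (selecting an entry by something in its body lines) needs the whole entry buffered before it is tested, as in the first answer. A rough sketch under that assumption follows; the function name print_entries_containing and the literal-substring match via index() are illustrative choices, not something taken from either answer.

```shell
# Hypothetical helper: print every log entry that contains the given
# literal text anywhere in its header or body lines.
print_entries_containing() {
    # $1 = literal text to search for, $2 = log file
    # Note: awk -v interprets backslash escapes in the assigned value.
    awk -v needle="$1" '
        # Print the buffered entry if it contains the text.
        function emit() {
            if (rec != "" && index(rec, needle) > 0) print rec
        }
        # A timestamp at the start of a line begins a new entry.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] / {
            emit(); rec = $0; next
        }
        # Any other line is appended to the current entry.
        { rec = (rec == "" ? $0 : rec ORS $0) }
        END { emit() }    # test the last entry in the file too
    ' "$2"
}
```

The trade-off against the flag approach is memory: each entry is held in full until the next timestamp is seen, which only matters for very large entries.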