
My Tomcat logs are built in this format:

[<DATE>] [<COMPONENT>] ERROR_TYPE <ERROR_NAME> - <Rest of line>

Where ERROR_TYPE is a log4j value like DEBUG or ERROR.

e.g.,

[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...
[18/Jul/2012:09:54:32 +0000] [component2] DEBUG IPNotFound - ...
[18/Jul/2012:09:54:32 +0000] [component1] TRACE Connected - ...
[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...

I would like to create a map from the tuple (ERROR_TYPE, ERROR_NAME) to the number of occurrences, e.g.

ERROR ConnectionTimeOut       2
DEBUG IPNotFound              1
TRACE Connected               1

How do I match something like:

_anything_ ((ERROR|DEBUG|TRACE|WARN|FATAL) _spaces_ _another_word_) _anything_

in AWK, and return only the part in parentheses?

Adam Matan

1 Answer

awk '/ERROR|DEBUG|TRACE|WARN|FATAL/ {count[$4,$5]++} END {for (i in count) {split(i, a, SUBSEP); print a[1], a[2], count[i]}}' inputfile

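The pattern selects lines that contain one of the error types. For each such line, a count array element is incremented, indexed by the type ($4) and the name ($5) taken together. The comma in the subscript joins the two values with the contents of the SUBSEP variable, which defaults to \034. The END block iterates over the count array, splits each index on SUBSEP to recover the type and name, and prints the type, name and count. The order of a for-in loop is unspecified, so the output lines may appear in any order.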
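The same logic can be laid out over several lines to make the composite key easier to see. This is only a sketch: the function name count_pairs and the sample lines are illustrative (the sample lines are copied from the question), and it still assumes the structured layout where the type is field 4 and the name is field 5.

```shell
# count_pairs is a hypothetical wrapper name, not part of the original answer.
count_pairs() {
  awk '
    # Select lines that mention a log level; the key is $4 SUBSEP $5.
    /ERROR|DEBUG|TRACE|WARN|FATAL/ { count[$4, $5]++ }
    END {
      for (key in count) {
        split(key, part, SUBSEP)   # recover type and name from the joined key
        print part[1], part[2], count[key]
      }
    }
  ' "$@" | sort                    # for-in order is unspecified, so sort for stable output
}

# Sample input taken from the question.
printf '%s\n' \
  '[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...' \
  '[18/Jul/2012:09:54:32 +0000] [component2] DEBUG IPNotFound - ...' \
  '[18/Jul/2012:09:54:32 +0000] [component1] TRACE Connected - ...' \
  '[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...' |
  count_pairs
```

Piping through sort is a choice made here for readable output; it is not needed for the counting itself.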

Edit:

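This version uses a regex to handle unstructured log entries. match() sets RSTART and RLENGTH to the position and length of the first match on the line, so substr() can extract just the error type and the word that follows it: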

awk 'match($0, /(ERROR|DEBUG|TRACE|WARN|FATAL) +[^ ]+/) {s = substr($0, RSTART, RLENGTH); split(s, a); count[a[1],a[2]]++} END {for (i in count) {split(i, a, SUBSEP); print a[1], a[2], count[i]}}' inputfile
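To see the extraction step on its own, the sketch below prints only the matched pair for each line and leaves the counting out. The function name extract_pair and the sample lines are made up for illustration; they are not taken from real logs.

```shell
# extract_pair is a hypothetical helper name used only for this illustration.
extract_pair() {
  awk '
    # match() returns 0 (false) when the line has no level followed by a word.
    match($0, /(ERROR|DEBUG|TRACE|WARN|FATAL) +[^ ]+/) {
      pair = substr($0, RSTART, RLENGTH)   # just the matched text
      split(pair, word)                    # default split on runs of whitespace
      print word[1], word[2]
    }
  ' "$@"
}

# Invented unstructured lines: the level is not in a fixed field position.
printf '%s\n' \
  'some prefix without brackets ERROR   ConnectionTimeOut - ...' \
  'a line with no level at all' \
  '[component2] DEBUG IPNotFound trailing text' |
  extract_pair
```

Note that the regex is not anchored to word boundaries, so a level name embedded in a longer word would also match; whether that matters depends on the log contents.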
Dennis Williamson
  • The logs are often unstructured, so I can't use expressions like `$4`. I want to match only two words, regardless of what comes before or after. – Adam Matan Jul 18 '12 at 10:55
  • The `split` function should be `split(i, a, SUBSEP)` to get indexes, not values. – Birei Jul 18 '12 at 11:04
  • @AdamMatan: With unstructured data, it's a bit more complicated. See [this answer](http://stackoverflow.com/questions/2957684/awk-access-captured-group-from-line-pattern). Depends on the AWK flavor. You might be better off with Ruby or Perl, if you can. – DarkDust Jul 18 '12 at 11:10
  • Thanks! The log is indeed unstructured, so I turned to Python. – Adam Matan Jul 18 '12 at 11:19