I do this sort of thing with sendmail logs, from time to time.
Given:
Jan 15 22:34:39 mail sm-mta[36383]: r0B8xkuT048547: to=<www@web3>, delay=4+18:34:53, xdelay=00:00:00, mailer=esmtp, pri=21092363, relay=web3., dsn=4.0.0, stat=Deferred: Operation timed out with web3.
Jan 15 22:34:39 mail sm-mta[36383]: r0B8hpoV047895: to=<www@web3>, delay=4+18:49:22, xdelay=00:00:00, mailer=esmtp, pri=21092556, relay=web3., dsn=4.0.0, stat=Deferred: Operation timed out with web3.
Jan 15 22:34:51 mail sm-mta[36719]: r0G3Youh036719: from=<obfTaIX3@nickhearn.com>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=IPv4, relay=[50.71.152.178]
Jan 15 22:35:04 mail sm-mta[36722]: r0G3Z2SF036722: lost input channel from [190.107.98.82] to IPv4 after rcpt
Jan 15 22:35:04 mail sm-mta[36722]: r0G3Z2SF036722: from=<amahrroc@europe.com>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=[190.107.98.82]
Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: lost input channel from ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged) to IPv4 after rcpt
Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: from=<clunch.hilarymas@javagame.ru>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged)
I use a script something like this:
#!/usr/bin/awk -f
BEGIN {
search=ARGV[1]; # Grab the first command line option
delete ARGV[1]; # Delete it so it won't be considered a file
}
# First, store every line in an array keyed on the Queue ID.
# Obviously, this only works for smallish log segments, as it uses up memory.
{
line[$6]=sprintf("%s\n%s", line[$6], $0);
}
# Next, keep a record of Queue IDs with substrings that match our search string.
index($0, search) {
show[$6];
}
# Finally, once we've processed all input data, walk through our array of "found"
# Queue IDs, and print the corresponding records from the storage array.
END {
for(qid in show) {
print line[qid];
}
}
to get the following output:
$ mqsearch airtel /var/log/maillog
Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: lost input channel from ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged) to IPv4 after rcpt
Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: from=<clunch.hilarymas@javagame.ru>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged)
The idea here is that I'm printing all lines that match the Sendmail Queue ID of the string I want to search for. The structure of the code is of course a product of the structure of the log file, so you'll need to customize your solution for the data you're trying to analyse and extract.