The following awk script parses the logfile and calculates the time deltas (intervals) between the 'get' and 'release' events:
#!/usr/bin/awk -f

# converts milliseconds to HH:MM:SS,mmm format
function msecs2date(msc) {
    h = msc / 3600000
    m = (msc % 3600000) / 60000
    s = (msc % 60000) / 1000
    ms = msc % 1000
    printf("%02d:%02d:%02d,%03d\n", h, m, s, ms)
}
# converts HH:MM:SS,mmm to milliseconds
function date2msecs(dat) {
    split(dat, d, ":")      # d[1]=HH, d[2]=MM, d[3]=SS,mmm
    split(d[3], sx, ",")    # sx[1]=SS, sx[2]=mmm
    return d[1]*3600000 + d[2]*60000 + sx[1]*1000 + sx[2]
}
# parses the logfile and generates 2 space-separated columns:
# Col.1 > displays the get/release pair REQUEST_CODE
# Col.2 > displays the time delta between the 'get' and 'release'
#         events for the respective REQUEST_CODE in Col.1
BEGIN { FS = " " }
{
    one = $1      # timestamp of the 'get' line
    ten = $NF     # REQUEST_CODE of the 'get' line
    getline       # read the matching 'release' line
    if ($NF == ten) {
        printf("%s ", ten)
        msecs2date(date2msecs($1) - date2msecs(one))
    }
}
You can use it by saving it as, for example, logdelta, then making it executable and running it:
$ chmod +x logdelta
$ ./logdelta logfile > outputfile
This is the output when your logfile extract (saved as log) is fed into the script:
$ ./logdelta log
wCv4cbch 00:00:00,006
sZhegruu 00:00:00,007
What this script does, essentially, is quite simple (check also the comments in the script):
It parses the logfile line by line, using blank space as the delimiter (FS=" "), grabs the appropriate tokens from the two lines that are relevant to a specific code (using the getline trick), and then checks whether the request codes from both lines are equal. If they are, it first calculates the time delta in milliseconds by applying the date2msecs function to both timestamps, and then converts that delta back to HH:MM:SS,mmm format using the msecs2date function, and so on until the end of the logfile.
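One caveat with the getline trick: getline returns 1 on success and 0 at end of input, leaving the previous fields untouched at EOF, so a logfile ending in an unpaired 'get' line would compare that line's code against itself and emit a bogus zero delta. A minimal defensive variant of the main block (a sketch; the pairing logic is otherwise unchanged):
{
    one = $1
    ten = $NF
    # getline returns 1 on success and 0 at EOF; bail out on an unpaired line
    if (getline <= 0) exit
    if ($NF == ten) {
        printf("%s ", ten)
        msecs2date(date2msecs($1) - date2msecs(one))
    }
}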
The two converter functions are pretty straightforward; you can find more info on the split() function here and also here.
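As a quick sanity check of the conversion, you can run date2msecs straight from the shell (the function body is the one from the script; 01:02:03,456 should come out as 1*3600000 + 2*60000 + 3*1000 + 456 = 3723456 ms):
$ awk 'function date2msecs(dat){split(dat,d,":");split(d[3],sx,",");return d[1]*3600000+d[2]*60000+sx[1]*1000+sx[2]}
       BEGIN{print date2msecs("01:02:03,456")}'
3723456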
Now, if you are going to use this script for production server logfiles, there are some things worth noting:
A - Never completely trust code found on the internet (and this also applies to this very script), which means one thing: test again and again, against corner cases like corrupted logfiles and other malformed or abnormal input, before adopting any solution (see NOTE).
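For instance, one cheap safeguard (a sketch, not something the script above does) is a format guard at the top of the main block that skips any line whose first field is not a HH:MM:SS,mmm timestamp, so a corrupted line cannot silently poison a get/release pair:
# skip lines that do not start with a HH:MM:SS,mmm timestamp
$1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]$/ { next }
In the paired getline scheme you would want a similar check on the line read by getline as well.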
B - Performance matters
This is why I chose awk to implement this script. To test its performance, I created a full-day logfile based on your provided extract, using the following C++ (well, C, actually) program:
#include <cstdio>
#include <cstdlib>   // for rand()

// creates len-sized random alphanumeric codes
void gen_random(char* s, const int len) {
    static const char alphanum[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i < len; ++i) s[i] = alphanum[rand() % (sizeof(alphanum) - 1)];
    s[len] = 0;
}

int main() {
    const char* filler = "INFO [...] (http--0.0.0.0-8443-17) User xxx";
    const char* coda   = "REQUEST_CODE =";
    const char* action;
    char gc[9];
    for (int i = 0; i < 24; i++) {          // hours
        for (int j = 0; j < 60; j++) {      // minutes
            for (int k = 0; k < 60; k++) {  // seconds
                // alternate 'get' (new code) and 'release' (same code) every 7 ms
                for (int l = 0; l < 1001; l += 7) {
                    if (l % 2 == 0) { action = "get"; gen_random(gc, 8); }
                    else            { action = "release"; }
                    printf("%02d:%02d:%02d,%03d %s %s %s %s\n", i, j, k, l, filler, action, coda, gc);
                }
                // release the last 'get' of this second
                printf("%02d:%02d:%02d,999 %s release %s %s\n", i, j, k, filler, coda, gc);
            }
        }
    }
    return 0;
}
Build it like this:
$ g++ -o logen logen.cpp
and run it:
$ ./logen > logfile
It creates a 1.1 GB (12,441,600 lines) faux logfile:
$ head -n 4 logfile
00:00:00,000 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = fa37JncC
00:00:00,007 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = fa37JncC
00:00:00,014 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = HryDsbza
00:00:00,021 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = HryDsbza
...
$ tail -n 4 logfile
23:59:59,980 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = AI9xRoPQ
23:59:59,987 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = AI9xRoPQ
23:59:59,994 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = LEAeMTva
23:59:59,999 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = LEAeMTva
which is (more or less) of a size representative of a real logfile.
When fed into the script, it produced the following performance results:
$ time ./logdelta logfile > outputfile
real 0m35.776s
user 0m30.364s
sys 0m2.312s
i.e. ~35 seconds for a 1.1 GB logfile, which is rather satisfactory performance (tested on a single-core, 2 GB VM running Xubuntu 14.04).
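If you want to experiment with squeezing out a bit more speed, one knob worth trying (an assumption on my part, not something I benchmarked on this file) is forcing the C locale, which spares awk any multibyte character handling:
$ time LC_ALL=C ./logdelta logfile > outputfile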
Here is also some sample output:
$ head -n 2 outputfile
fa37JncC 00:00:00,007
HryDsbza 00:00:00,007
...
$ tail -n 2 outputfile
AI9xRoPQ 00:00:00,007
LEAeMTva 00:00:00,005
NOTE: The reason I posted the code for the faux logfile generator is to encourage you to modify it and incorporate various artificial errors into the generated logfile, so that you can test how this script (as-is or with your modifications) copes with various corner cases.
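As a hypothetical starting point, here is an awk one-liner (so it can be piped straight after the generator) that truncates roughly one line in 100,000 to simulate corrupted writes; you can then check whether logdelta still pairs everything else correctly:
$ ./logen | awk 'BEGIN { srand() }
                 rand() < 0.00001 { print substr($0, 1, int(rand()*20)); next }
                 { print }' > logfile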