The following awk script parses the logfile and calculates the time deltas (intervals) between the 'get' and 'release' events:
#!/usr/bin/awk -f

# converts milliseconds to HH:MM:SS,mmm format
function msecs2date(msc) {
    h = msc / 3600000
    m = (msc % 3600000) / 60000
    s = (msc % 60000) / 1000
    ms = msc % 1000
    printf("%02d:%02d:%02d,%03d\n", h, m, s, ms)
}
# converts HH:MM:SS,mmm to milliseconds
function date2msecs(dat) {
    split(dat, d, ":")      # d[1]=HH, d[2]=MM, d[3]=SS,mmm
    split(d[3], sx, ",")    # sx[1]=SS, sx[2]=mmm
    return d[1]*3600000 + d[2]*60000 + sx[1]*1000 + sx[2]
}
# parses the logfile and generates 2 space-separated columns:
# Col.1 > displays the get/release pair REQUEST_CODE
# Col.2 > displays the time delta between the 'get' and 'release'
#         events for the respective REQUEST_CODE in Col.1
BEGIN { FS = " " }
{
    one = $1      # timestamp of the 'get' line
    ten = $NF     # REQUEST_CODE of the 'get' line
    getline       # read the matching 'release' line
    if ($NF == ten) {
        printf("%s ", ten)
        msecs2date(date2msecs($1) - date2msecs(one))
    }
}
You can use it by saving it as, for example, logdelta, then making it executable and running it:
$ chmod +x logdelta
$ ./logdelta logfile > outputfile
This is the output when your logfile extract (saved as log) is fed into the script:
$ ./logdelta log
wCv4cbch 00:00:00,006
sZhegruu 00:00:00,007
What this script does, essentially, is quite simple (check also the comments in the script):
It parses the logfile line by line, using blank space as the delimiter (FS=" "), grabs the appropriate tokens from the two lines that are relevant to a specific code (using the getline trick), and then checks whether the request codes from both lines are equal. If they are, it first calculates the time delta in milliseconds by applying the date2msecs function to both timestamps, and then converts that delta back to HH:MM:SS,mmm format using the msecs2date function, and so on until the end of the logfile.
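One caveat with the getline trick: getline returns 1 on success and 0 at end of input, leaving the previous fields untouched at EOF, so a logfile ending in an unpaired 'get' line would compare that line's code against itself and emit a bogus zero delta. A minimal defensive variant of the main block (a sketch; the pairing logic is otherwise unchanged):
{
    one = $1
    ten = $NF
    # getline returns 1 on success and 0 at EOF; bail out on an unpaired line
    if (getline <= 0) exit
    if ($NF == ten) {
        printf("%s ", ten)
        msecs2date(date2msecs($1) - date2msecs(one))
    }
}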
The two converter functions are pretty straightforward; you can find more info on the split() function here and also here.
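As a quick sanity check of the conversion, you can run date2msecs straight from the shell (the function body is the one from the script; 01:02:03,456 should come out as 1*3600000 + 2*60000 + 3*1000 + 456 = 3723456 ms):
$ awk 'function date2msecs(dat){split(dat,d,":");split(d[3],sx,",");return d[1]*3600000+d[2]*60000+sx[1]*1000+sx[2]}
       BEGIN{print date2msecs("01:02:03,456")}'
3723456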
Now, if you are going to use this script for production server logfiles, there are some things worth noting:
A - Never completely trust code found on the internet (and this also applies to this very script), which means one thing: test again and again, against corner cases like corrupted logfiles and other malformed or abnormal input, before adopting any solution (see NOTE).
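For instance, one cheap safeguard (a sketch, not something the script above does) is a format guard at the top of the main block that skips any line whose first field is not a HH:MM:SS,mmm timestamp, so a corrupted line cannot silently poison a get/release pair:
# skip lines that do not start with a HH:MM:SS,mmm timestamp
$1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]$/ { next }
In the paired getline scheme you would want a similar check on the line read by getline as well.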
B - Performance matters
This is why I chose awk to implement this script. To test its performance, I created a full-day logfile based on your provided extract, using the following C++ (well, C, actually) program:
#include <cstdio>
#include <cstdlib>   // for rand()

// creates len-sized random alphanumeric codes
void gen_random(char* s, const int len) {
    static const char alphanum[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i < len; ++i) s[i] = alphanum[rand() % (sizeof(alphanum) - 1)];
    s[len] = 0;
}

int main() {
    const char* filler = "INFO [...] (http--0.0.0.0-8443-17) User xxx";
    const char* coda   = "REQUEST_CODE =";
    const char* action;
    char gc[9];
    for (int i = 0; i < 24; i++) {          // hours
        for (int j = 0; j < 60; j++) {      // minutes
            for (int k = 0; k < 60; k++) {  // seconds
                // alternate 'get' (new code) and 'release' (same code) every 7 ms
                for (int l = 0; l < 1001; l += 7) {
                    if (l % 2 == 0) { action = "get"; gen_random(gc, 8); }
                    else            { action = "release"; }
                    printf("%02d:%02d:%02d,%03d %s %s %s %s\n", i, j, k, l, filler, action, coda, gc);
                }
                // release the last 'get' of this second
                printf("%02d:%02d:%02d,999 %s release %s %s\n", i, j, k, filler, coda, gc);
            }
        }
    }
    return 0;
}
Build it like this:
$ g++ -o logen logen.cpp
and run it:
$ ./logen > logfile
It creates a 1.1 GB (12,441,600 lines) faux logfile:
$ head -n 4 logfile
00:00:00,000 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = fa37JncC
00:00:00,007 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = fa37JncC
00:00:00,014 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = HryDsbza
00:00:00,021 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = HryDsbza
...
$ tail -n 4 logfile
23:59:59,980 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = AI9xRoPQ
23:59:59,987 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = AI9xRoPQ
23:59:59,994 INFO [...] (http--0.0.0.0-8443-17) User xxx get REQUEST_CODE = LEAeMTva
23:59:59,999 INFO [...] (http--0.0.0.0-8443-17) User xxx release REQUEST_CODE = LEAeMTva
which is (more or less) of a size representative of a real logfile.
When fed into the script, it produced the following performance results:
$ time ./logdelta logfile > outputfile
real 0m35.776s
user 0m30.364s
sys 0m2.312s
i.e. ~35 seconds for a 1.1 GB logfile, which is rather satisfactory performance (tested on a single-core, 2 GB VM running Xubuntu 14.04).
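If you want to experiment with squeezing out a bit more speed, one knob worth trying (an assumption on my part, not something I benchmarked on this file) is forcing the C locale, which spares awk any multibyte character handling:
$ time LC_ALL=C ./logdelta logfile > outputfile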
Here is also some sample output:
$ head -n 2 outputfile
fa37JncC 00:00:00,007
HryDsbza 00:00:00,007
...
$ tail -n 2 outputfile
AI9xRoPQ 00:00:00,007
LEAeMTva 00:00:00,005
NOTE: The reason I posted the code for the faux logfile generator is to encourage you to modify it and incorporate various artificial errors into the generated logfile, so that you can test how this script (as-is or with your modifications) copes with various corner cases.
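As a hypothetical starting point, here is an awk one-liner (so it can be piped straight after the generator) that truncates roughly one line in 100,000 to simulate corrupted writes; you can then check whether logdelta still pairs everything else correctly:
$ ./logen | awk 'BEGIN { srand() }
                 rand() < 0.00001 { print substr($0, 1, int(rand()*20)); next }
                 { print }' > logfile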