sorting by inconsistently formatted elapsed time field ( k8s events by actual time since event )

Question

I find myself frequently looking to see if something has stopped happening. To do this, it helps to see events in chronological order...

This solution seems to work, but the formatting still drives me insane...

The solution I have "sort of" working -

kubectl get events |
  sed -E '/^[6789][0-9]s/{h; s/^(.).*/\1/; y/6789/0123/; s/^(.)/01m\1/;
                          x; s/^.(.*)/\1/; H;
                          x; s/\n//; };
          s/^10([0-9]s)/01m4\1/; s/^11([0-9]s)/01m5\1/; s/^([0-9]s)/00m0\1/; s/^([0-9]+s)/00m\1/;
          s/^([0-9]m)/0\1/; s/^([0-9]+m)([0-9]s)/\10\2/;
          s/^L/_L/;' | sort -r

...this seems a bit like overkill to me.

The first field is whitespace-delimited and left-justified, with no leading zeroes. It reports seconds only ([0-9]+s) up to 2m, then [0-9]+m[0-9]+s up to 5m, after which it seems to report only [0-9]+m.
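Since the field only ever takes the three shapes described above, a native-bash parser stays small. A minimal sketch, assuming bash 3.2+ for `=~` and `BASH_REMATCH`; the function name `age_secs` is hypothetical:

```shell
#!/usr/bin/env bash
# age_secs (hypothetical helper): print the number of seconds for one age
# string in the forms Ns, NmNs or Nm; return 1 for anything else (e.g. LAST).
age_secs() {
  local re='^(([0-9]+)m)?(([0-9]+)s)?$'
  # Both groups are optional, so require a non-empty string as well.
  [[ -n $1 && $1 =~ $re ]] || return 1
  # 10# forces base 10; :-0 treats a missing minutes/seconds part as zero.
  echo $(( 10#${BASH_REMATCH[2]:-0} * 60 + 10#${BASH_REMATCH[4]:-0} ))
}
```

For example, `age_secs 2m22s` prints 142, while `age_secs LAST` simply fails, which makes it easy to skip the header.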

Anyone have a short, maybe even simple-ish, easier to read solution that works?
No preference as to tool (sed, awk, perl, native bash, etc.), as long as it works and is likely to be already installed anywhere I need to work.

It's not a high priority, but seemed like a fun little challenge I thought I'd share.

My test data:

$: cat sample
LAST ...
28s ...
2m22s ...
46m ...
7s ...
75s ...
119s ...

Result with desired output -

$: sed -E '/^[6789][0-9]s/{h; s/^(.).*/\1/; y/6789/0123/; s/^(.)/01m\1/;
                           x; s/^.(.*)/\1/; H;
                           x; s/\n//; };
           s/^10([0-9]s)/01m4\1/; s/^11([0-9]s)/01m5\1/; s/^([0-9]s)/00m0\1/; s/^([0-9]+s)/00m\1/;
           s/^([0-9]m)/0\1/; s/^([0-9]+m)([0-9]s)/\10\2/;
           s/^L/_L/;' sample | sort -r
_LAST ...
46m ...
02m22s ...
01m59s ...
01m15s ...
00m28s ...
00m07s ...

I've arbitrarily converted to a standardized version of the existing general output format just to keep it easily transferable to other members of the team. Either way, it's only being used for "eyeballing" the data, so other formats are not a problem as long as it's easy to read.

While the field could theoretically include hours and days, events that old are usually not reported by this tool and are out of scope for this problem; if needed, I can likely extrapolate from whatever solutions are presented. Since I can already get the order from this approach, I'm really only looking for elegant formatting options.

A clumsy adaptation of Daweo's awk solution with formatting -

$: awk '/^[0-9]/{ if($1!~/m/){$1="0m" $1}; split($1,arr,/m/);
        t=arr[1]*60+arr[2]; m=(t-(t%60))/60; s=t-(m*60);
        m=sprintf("%02dm",m); if(s){ s=sprintf("%02ds",s) } else s="";
        $1=sprintf("%s%s",m,s); print; } /^L/{print "_"$0}' sample |
   sort -r
_LAST ...
46m ...
02m22s ...
01m59s ...
01m15s ...
00m28s ...
00m07s ...

Others still appreciated.

My OCD wants minutes on every recent-ish line, formatted for consistent width, and seconds as well if they are meaningful (which they aren't above 5m, as they get rounded out.) The main point of the question is to learn algorithms, though; I can make it *do* what I want several ways, but good sorting and/or formatting methods are always handy. You guys almost always present me with something better than my first few attempts. — Paul Hodges, Dec 13 '22 at 22:02
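The general algorithm behind most of the answers here is decorate-sort-undecorate: compute a sortable key once per line, sort on that key, then throw the key away. A minimal sketch in bash plus POSIX awk, assuming the first line is the header and only the s/m forms from the question appear; `sort_by_age` is a hypothetical name, and assigning to `$1` collapses the original column spacing to single spaces:

```shell
#!/usr/bin/env bash
# sort_by_age (hypothetical helper): decorate-sort-undecorate.
# Assumes line 1 is the header and ages use only the s/m forms from the question.
sort_by_age() {
  local header
  # Keep the header on top, prefixed with "_" like the question's _LAST.
  IFS= read -r header && printf '_%s\n' "$header"
  awk '
    {
      t = $1; secs = 0
      if (match(t, /^[0-9]+m/)) { secs = 60 * substr(t, 1, RLENGTH - 1); t = substr(t, RLENGTH + 1) }
      if (match(t, /^[0-9]+s/)) secs += substr(t, 1, RLENGTH - 1)
      m = int(secs / 60); s = secs % 60
      # Fixed-width minutes; seconds only when non-zero (46m stays 46m).
      $1 = sprintf("%02dm", m) (s ? sprintf("%02ds", s) : "")
      print secs "\t" $0                  # decorate: numeric key, TAB, line
    }' |
    sort -t "$(printf '\t')" -k1,1nr |    # sort on the key, oldest first
    cut -f2-                              # undecorate: drop the key
}
```

Used as `kubectl get events | sort_by_age`; on the question's sample it produces the same `_LAST`, `46m`, `02m22s`, `01m59s`, `01m15s`, `00m28s`, `00m07s` ordering as the sed version.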

Ed Morton · Answer 1 · 2022-12-14T17:31:09.770

I'd convert everything to seconds first and then print it as HH:MM:SS, e.g.:

$ cat tst.awk
BEGIN {
    split("h m s",denoms)
    fmts["s"] = fmts["m"] = fmts["h"] = "%02d"
    mults["s"] = 1
    mults["m"] = 60
    mults["h"] = 60 * 60
}
sub(/^L/,"_L") {
    print
    next
}
{
    time = $1

    secs = 0
    while ( match(time,/[0-9]+./) ) {
        value = substr(time,1,RLENGTH-1)
        denom = substr(time,RLENGTH,1)
        time  = substr(time,RLENGTH+1)
        secs += value * mults[denom]
    }

    for ( i=1; i in denoms; i++ ) {
        denom = denoms[i]
        out = (i>1 ? out ":" : "") sprintf(fmts[denom],int(secs/mults[denom]))
        secs %= mults[denom]
    }

    $1 = out

    print | "sort -r"
}

$ awk -f tst.awk sample
_LAST ...
00:46:00 ...
00:02:22 ...
00:01:59 ...
00:01:15 ...
00:00:28 ...
00:00:07 ...

Obviously add the definitions for "d" in the BEGIN section if you want to include days and similarly for other longer durations.
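For example, here is a sketch of the same division/remainder loop with a "d" entry added to the BEGIN section (and to the list of denominations). It assumes the input is already a bare count of seconds per line; the `secs_to_dhms` wrapper is a hypothetical name that just makes it easy to call from bash:

```shell
#!/usr/bin/env bash
# secs_to_dhms (hypothetical helper): seconds -> DD:HH:MM:SS, same loop as tst.awk
# with a "d" unit added. Assumes each input line is a bare number of seconds.
secs_to_dhms() {
  awk '
    BEGIN {
      n = split("d h m s", denoms)
      mults["s"] = 1; mults["m"] = 60; mults["h"] = 60 * 60; mults["d"] = 24 * 60 * 60
    }
    {
      secs = $1; out = ""
      for (i = 1; i <= n; i++) {
        denom = denoms[i]
        if (i > 1) out = out ":"
        out = out sprintf("%02d", int(secs / mults[denom]))
        secs %= mults[denom]          # carry the remainder to the next unit
      }
      print out
    }'
}
```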

score 2 · Answer 2 · answered Dec 13 '22 at 20:41

I would harness GNU AWK for this task in the following way. Let file.txt content be

2m36s ...
2m9s ...
28s ...
2m22s ...
2m6s ...
46m ...
7s ...
45m ...
3m9s ...
31m ...
16m ...
75s ...
74s ...
67s ...
46m ...
63s ...
2m15s ...
119s ...
16m ...
75s ...
74s ...
69s ...
46m ...
31m ...
16m ...
75s ...
62s ...

then

awk '$1!~/m/{$1="0m" $1}{split($1,arr,/m/);$1=arr[1]*60+arr[2];print}' file.txt

gives output

156 ...
129 ...
28 ...
142 ...
126 ...
2760 ...
7 ...
2700 ...
189 ...
1860 ...
960 ...
75 ...
74 ...
67 ...
2760 ...
63 ...
135 ...
119 ...
960 ...
75 ...
74 ...
69 ...
2760 ...
1860 ...
960 ...
75 ...
62 ...

Explanation: if there is no m in the 1st field, I prepend 0m. Then I split the field at the m character and compute the value: I multiply what is before the m by 60 to convert it to seconds and add what is after it (e.g. 22s, which awk reads as 22) to get the total in seconds. For rows without a seconds part, that part is an empty string, which is treated as zero in arithmetic. This output can then be sorted numerically, that is

awk '$1!~/m/{$1="0m" $1}{split($1,arr,/m/);$1=arr[1]*60+arr[2];print}' file.txt | sort -n

which gives output

7 ...
28 ...
62 ...
63 ...
67 ...
69 ...
74 ...
74 ...
75 ...
75 ...
75 ...
119 ...
126 ...
129 ...
135 ...
142 ...
156 ...
189 ...
960 ...
960 ...
960 ...
1860 ...
1860 ...
2700 ...
2760 ...
2760 ...
2760 ...

(tested in GNU Awk 5.0.1 and sort (GNU coreutils) 8.30)

Not bad. Could `printf` to get leading zeroes for consistency of format. Converting everything to seconds does let us get the right order, though appending an s at the end is probably inaccurate, as delays >5m round them off... (I think?) I'd *rather* still see a `##m##s` format, but that's just me being pissy, and it's easy enough to do with modulo/remainder math if needed. Will try one and see if it looks better or worse than my `sed` monstrosity, lol - thanks, worth an upvote. — Paul Hodges, Dec 13 '22 at 20:51
Just FYI - input has a header, which is reporting as a zero, as is a blank line in the sample file. — Paul Hodges, Dec 13 '22 at 20:53
@PaulHodges you might use `sprintf`, replacing `$1=arr[1]*60+arr[2]` with, say, `$1=sprintf("%04d",arr[1]*60+arr[2])`, as long as the specified width is greater than log10(maximal_value_in_seconds) — Daweo, Dec 14 '22 at 09:55
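A sketch of that suggestion, assuming a width of 5 (enough for up to 99999 seconds, roughly 27 hours) and skipping non-numeric lines so the header is not turned into 0; `padded_secs` is a hypothetical name. With fixed-width keys a plain `sort -r` orders the lines correctly, no `-n` needed:

```shell
#!/usr/bin/env bash
# padded_secs (hypothetical helper): replace the age with zero-padded seconds
# so that a plain lexical sort works. Width 5 is an assumption (<= 99999 s).
padded_secs() {
  awk '
    $1 ~ /^[0-9]/ {
      if ($1 !~ /m/) $1 = "0m" $1          # 28s -> 0m28s
      split($1, arr, "m")                  # 2m22s -> "2","22s"; 46m -> "46",""
      $1 = sprintf("%05d", arr[1] * 60 + arr[2])
    }
    { print }'                             # header and other lines pass through
}
```

For example, `kubectl get events | padded_secs | sort -r`.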

score 2 · Answer 3 · answered Dec 13 '22 at 20:45

If all possible time formats are shown in the example, this might work. It shows the file, the output and the final sort, pasted together for clarity.

It looks for m, multiplies by 60 and adds any existing seconds. If no m is found it simply prints the seconds.

$ paste sample <(awk '$1~/m/{split($1,ar,"m"); print ar[1] * 60 + ar[2]}
                 $1!~/m/{print $1 * 1}' sample) | sort -nk 3
7s ...  7
28s ... 28
62s ... 62
63s ... 63
67s ... 67
69s ... 69
74s ... 74
74s ... 74
75s ... 75
75s ... 75
75s ... 75
119s ...    119
2m6s ...    126
2m9s ...    129
2m15s ...   135
2m22s ...   142
2m36s ...   156
3m9s ...    189
16m ... 960
16m ... 960
16m ... 960
31m ... 1860
31m ... 1860
45m ... 2700
46m ... 2760
46m ... 2760
46m ... 2760
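One caveat: `sort -nk 3` relies on the event text being exactly one word (the `...` placeholder). Real event messages contain spaces, so the key would no longer be field 3. Because paste joins with a TAB, sorting on the second TAB-separated field avoids that. A sketch, assuming the input contains no TABs of its own; `sort_with_key` is a hypothetical name, and it checks `$1` rather than the whole line for an m, since messages such as "Job completed" contain one:

```shell
#!/usr/bin/env bash
# sort_with_key (hypothetical helper): paste a seconds column onto each line,
# then sort on that TAB-separated column. Assumes the file has no TABs itself.
sort_with_key() {
  local file=$1
  paste "$file" <(awk '$1 ~ /m/ { split($1, ar, "m"); print ar[1] * 60 + ar[2]; next }
                       { print $1 * 1 }' "$file") |
    sort -t "$(printf '\t')" -k2,2n
}
```

Append `| cut -f1` to drop the helper column again.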

Andre Wildberg

Sweet - very concise. Still doesn't format the output, though that's just easing my OCD tendencies... – Paul Hodges Dec 14 '22 at 14:42
@PaulHodges isn't formatting the output the main point of your question? You said in it `I'm really only looking for elegant formatting options`. Don't you also need that header line of `LAST` changed to `_LAST` and included in the output? If you don't need formatting and that LAST mapping and all you want is a way to sort the timestamped output, that'd be a much easier problem to solve. – Ed Morton Dec 16 '22 at 15:06
Yes, but I still like seeing a variety of ways to handle pieces I can compose...and I appreciate most any effort... – Paul Hodges Dec 21 '22 at 04:40

Arkadiusz Drabczyk · Answer 4 · 2022-12-13T21:31:25.163

Using only GNU awk:

awk '{match($1,  /[0-9]+m/, m); match($1, /[0-9]+s/, s)
    arr[m[0]*60 + s[0]] = $0
}
    END {
    n = asorti(arr, sorted, "@ind_num_asc")
    for(i = 1; i <= n; i++)
          print arr[sorted[i]]
}
' sample

Looks a bit cleaner than a bunch of chained seds. It prints:

LAST ...
7s ...
28s ...
62s ...
63s ...
67s ...
69s ...
74s ...
75s ...
119s ...
2m6s ...
2m9s ...
2m15s ...
2m22s ...
2m36s ...
3m9s ...
16m ...
31m ...
45m ...
46m ...

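I sorted in ascending order because I think you actually want the newest entries shown near the top; if not, just change @ind_num_asc to @ind_num_desc. Note that because each line is stored under its age in seconds, lines with the same age overwrite one another, so only one line per distinct age is printed.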
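If duplicates matter, one option is to append to the bucket instead of overwriting it. A portable sketch, assuming only s/m ages; it swaps gawk's asorti for a small insertion sort over the distinct keys, so it should also run in non-GNU awks, and `sort_by_age_keep_dups` is a hypothetical name:

```shell
#!/usr/bin/env bash
# sort_by_age_keep_dups (hypothetical helper): bucket lines by age in seconds,
# appending so identical ages are all kept, then print buckets in ascending order.
sort_by_age_keep_dups() {
  awk '
    {
      t = $1; secs = 0
      if (match(t, /^[0-9]+m/)) { secs = 60 * substr(t, 1, RLENGTH - 1); t = substr(t, RLENGTH + 1) }
      if (match(t, /^[0-9]+s/)) secs += substr(t, 1, RLENGTH - 1)
      if (secs in bucket) bucket[secs] = bucket[secs] ORS $0   # append, do not overwrite
      else { bucket[secs] = $0; keys[++n] = secs }
    }
    END {
      for (i = 2; i <= n; i++) {            # insertion sort on the distinct keys
        k = keys[i]
        for (j = i - 1; j > 0 && keys[j] > k; j--) keys[j + 1] = keys[j]
        keys[j + 1] = k
      }
      for (i = 1; i <= n; i++) print bucket[keys[i]]
    }'
}
```

The header has no digits, so it gets key 0 and stays on top, as in the gawk version.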

edited Dec 13 '22 at 21:31 · answered Dec 13 '22 at 21:24

I agree Nice work/logic! Learn every day. – Eric Marceau Dec 14 '22 at 17:33

score 1 · Answer 5 · answered Dec 13 '22 at 21:52

Using sed

$ cat script.sed
1! {
        s/^[0-9]s/0m0&/
        /^[0-9]{2,}s/ {
                s/^6/01m0/
                s/^7/01m1/
                s/^8/01m2/
                s/^9/01m3/
                s/^10([0-9]s)/01m4\1/
                s/^11([0-9]s)/01m5\1/
                s/^[0-5][0-9]s/0m&/
        }
        s/^([0-9]m)([0-9]s)/0\10\2/
        s/^([0-9]+m)([0-9]s)/\10\2/
        s/^[0-9]m/0&/
}
1s/^/_/

To run

$ sed -Ef script.sed input_file | sort -r
_LAST ...
46m ...
46m ...
46m ...
45m ...
31m ...
31m ...
16m ...
16m ...
16m ...
03m09s ...
02m36s ...
02m22s ...
02m15s ...
02m09s ...
02m06s ...
01m59s ...
01m15s ...
01m15s ...
01m15s ...
01m14s ...
01m14s ...
01m09s ...
01m07s ...
01m03s ...
01m02s ...
00m28s ...
00m07s ...

score 1 · Answer 6 · answered Dec 13 '22 at 23:07

If you're looking for a generic utility to "normalize" the reported values, here is a "hack" demonstrating that capability.

#!/bin/bash

DBG=1

INPUT=`basename "$0" ".sh" `.input

cat >"${INPUT}" <<"EnDoFiNpUt"
2m36s ...
2m9s ...
28s ...
2m22s ...
2m6s ...
46m ...
7s ...
45m ...
3m9s ...
31m ...
16m ...
75s ...
74s ...
67s ...
46m ...
63s ...
2m15s ...
119s ...
16m ...
75s ...
74s ...
69s ...
46m ...
31m ...
16m ...
75s ...
62s ...
EnDoFiNpUt

#cat >"${INPUT}" <<"EnDoFiNpUt"
#119s ...
#EnDoFiNpUt

awk -v dbg="${DBG}" 'BEGIN{
    split("", times) ;
    items=0 ;
}{
    if( $0 == "" ){
        exit ;
    }else{
        if( dbg == 1 ){ print "\n"$0 | "cat >&2" ; } ;
        rem=$1 ;
        items++ ;

        posH=index( rem, "h" ) ;
        if( posH == 0 ){
            hr=0 ;
            if( dbg == 1 ){ print "\thr = "hr | "cat >&2" ; } ;

            posM=index( rem, "m" ) ;
            if( posM == 0 ){
                min=0 ;
                if( dbg == 1 ){ print "\tmin = "min | "cat >&2" ; } ;

                posS=index( rem, "s" ) ;
                if( posS == 0 ){
                    if( rem == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", rem/60 ) ;
                        sec=rem-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                }else{
                    beg=substr( rem, 1, posS-1) ;
                    if( beg == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", beg/60 ) ;
                        sec=beg-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                } ;
            }else{
                min=substr( rem, 1, posM-1) ;
                rem=substr( rem, posM+1 ) ;
                if( dbg == 1 ){ print "\tmin = "min | "cat >&2" ; } ;

                posS=index( rem, "s" ) ;
                if( posS == 0 ){
                    if( rem == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", rem/60 ) ;
                        sec=rem-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                }else{
                    beg=substr( rem, 1, posS-1) ;
                    if( beg == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", beg/60 ) ;
                        sec=beg-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                } ;
            } ;
        }else{
            hr=substr( rem, 1, posH-1) ;
            rem=substr( rem, posH+1 ) ;
            if( dbg == 1 ){ print "\thr = "hr | "cat >&2" ; } ;

            posM=index( rem, "m" ) ;
            if( posM == 0 ){
                min=0 ;
                if( dbg == 1 ){ print "\tmin = "min | "cat >&2" ; } ;

                posS=index( rem, "s" ) ;
                if( posS == 0 ){
                    if( rem == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", rem/60 ) ;
                        sec=rem-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                }else{
                    beg=substr( rem, 1, posS-1) ;
                    if( beg == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", beg/60 ) ;
                        sec=beg-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                } ;
            }else{
                min=substr( rem, 1, posM-1) ;
                rem=substr( rem, posM+1 ) ;
                if( dbg == 1 ){ print "\tmin = "min | "cat >&2" ; } ;

                posS=index( rem, "s" ) ;
                if( posS == 0 ){
                    if( rem == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", rem/60 ) ;
                        sec=rem-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                }else{
                    beg=substr( rem, 1, posS-1) ;
                    if( beg == "" ){
                        sec=0 ;
                    }else{
                        minX=sprintf("%d", beg/60 ) ;
                        sec=beg-minX*60 ;
                        min=min+minX ;
                        if( dbg == 1 && minX > 0 ){ print "\t\tmin = "min | "cat >&2" ; } ;
                    } ;
                } ;
            } ;
            if( dbg == 1 ){ print "\tsec = "sec | "cat >&2" ; } ;
        } ;
        times[items]=sprintf("%02dh%02dm%02ds", hr, min, sec ) ;
        if( dbg == 1 ){ print "\t"times[items] | "cat >&2" ; } ;
    } ;
}END{
    if( dbg == 1 ){ print "Normalized Values:" } ; 
    for( i=1 ; i <= items ; i++ ){
        print times[i] ;
    } ;
}' "${INPUT}" > "${INPUT}.out"

echo ""
cat "${INPUT}.out"

echo ""
echo "Sorted Values:"
grep -v 'Normalized' "${INPUT}.out" | sort -n

Outputs the header as `00h00m00s`, doesn't print the rest of the record, and is kinda big... though if you store it somewhere that doesn't matter. I like the built-in debugging. — Paul Hodges, Dec 14 '22 at 14:48
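A much shorter sketch of the same normalisation that keeps the rest of each record and leaves the header alone. It assumes only h/m/s units (no days), and `normalize_ages` is a hypothetical name:

```shell
#!/usr/bin/env bash
# normalize_ages (hypothetical helper): rewrite the age field as 00h00m00s,
# keeping the rest of the record. Handles any mix of h/m/s chunks, not days.
normalize_ages() {
  awk '
    $1 !~ /^[0-9]/ { print; next }          # header: leave as-is
    {
      t = $1; secs = 0
      while (match(t, /^[0-9]+[hms]/)) {    # consume one "<digits><unit>" chunk
        u = substr(t, RLENGTH, 1)
        secs += substr(t, 1, RLENGTH - 1) * (u == "h" ? 3600 : (u == "m" ? 60 : 1))
        t = substr(t, RLENGTH + 1)
      }
      $1 = sprintf("%02dh%02dm%02ds", int(secs / 3600), int(secs % 3600 / 60), secs % 60)
      print
    }'
}
```

Because the output is fixed-width, `normalize_ages | sort -r` orders it without further work.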

score 0 · Answer 7 · answered Dec 14 '22 at 17:42

I'd like to thank everyone for their time and contributions.
As expected, I have learned a few things. :)

A coworker saw this and privately sent me a solution VERY similar to this solution from the same page as I referenced in the question, which helped me understand both my problem and a built-in way to solve it. I repost here with some explanation in the hope that someone will find it useful, and maybe help me refine my own understanding.

My main problems with the typical output are that the time field's formatting is (imo) horribly inconsistent - I assume for the sake of brevity, which can be good - and that the sorting is apparently by object instead of time (which also makes sense in many cases) and THEN on the object's .lastTimestamp.

For the record, kubectl get --help lists (among much else)

--sort-by='': If non-empty, sort list types using this field specification.

noting that "The field specification is expressed as a JSONPath expression from the k8s object definition", and also

-o, --output='': Output format.

which has a long list of options including -o custom-columns=... which lets you "roll your own".

Accordingly, I rebuilt the normal output, replacing the offending first column with an actual, consistent timestamp field, and changed the default sort order.

kubectl get events -o custom-columns="TIMESTAMP:{.lastTimestamp},REASON:{.reason},TYPE:{.type},OBJ_NAME:{.involvedObject.name},MESSAGE:{.message}" --sort-by=.lastTimestamp
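A pitfall worth checking with `--sort-by`: an unquoted `{a,b,c}` on the command line is bash brace expansion, so a multi-key form such as `--sort-by={.lastTimestamp,.type,.reason}` reaches kubectl as three separate flags, and with a repeated string flag the last value normally wins. A quick way to see exactly what bash passes, using a hypothetical `show_args` helper:

```shell
#!/usr/bin/env bash
# show_args (hypothetical helper): print each argument it receives in <...>.
show_args() { printf '<%s>\n' "$@"; }

# Unquoted: brace expansion turns one word into three --sort-by flags.
show_args --sort-by={.lastTimestamp,.type,.reason}
# <--sort-by=.lastTimestamp>
# <--sort-by=.type>
# <--sort-by=.reason>

# Quoted: a single literal argument, braces and commas included.
show_args '--sort-by={.lastTimestamp,.type,.reason}'
# <--sort-by={.lastTimestamp,.type,.reason}>
```

As far as I know, `--sort-by` takes a single JSONPath expression, so a single key such as `--sort-by=.lastTimestamp` is the safe form.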

Even better, the help text references the relevant documentation directly, so I was quickly able to convert this to use a template file:

$: cat $HOME/.kube/custCol.txt
TIMESTAMP      REASON  TYPE  OBJ_NAME             MESSAGE
lastTimestamp  reason  type  involvedObject.name  message

$: kubectl get events -o custom-columns-file=$HOME/.kube/custCol.txt --sort-by=.metadata.creationTimestamp

Still long, but doesn't need to be piped to another process where I could fumble-finger the logic. To make it concise and easier to type and read, I made an alias -

alias events='kubectl get events -o custom-columns-file=$HOME/.kube/custCol.txt --sort-by=.metadata.creationTimestamp'

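Now I can specify cluster and namespace by adding them at the end. (The sample below is grouped by REASON rather than ordered by time: it was captured with `--sort-by={.metadata.creationTimestamp,.type,.reason}`, which bash brace-expands into three separate `--sort-by` flags, and only the last one, `.reason`, takes effect.) -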

$: events --context bulk -n bulk-sit1 | head -6
TIMESTAMP             REASON                 TYPE      OBJ_NAME                                     MESSAGE
2022-12-14T16:15:18Z   BackOff                Warning   seasonalsuspends-cron-job-1671034500-7c7md   Back-off restarting failed container
2022-12-14T16:00:29Z   BackOff                Warning   nonpays-cron-job-1671033600-7xhl7            Back-off restarting failed container
2022-12-14T16:06:51Z   BackoffLimitExceeded   Warning   nonpays-cron-job-1671033600                  Job has reached the specified backoff limit
2022-12-14T16:22:07Z   BackoffLimitExceeded   Warning   seasonalsuspends-cron-job-1671034500         Job has reached the specified backoff limit
2022-12-14T15:45:20Z   Completed              Normal    nonpays-cron-job-1671032700                  Job completed

Have yet to figure out built-in filters, so for now I have a function with grep for just warnings -

warnings() { local args=(get events --sort-by=.lastTimestamp "$@"
   -o 'custom-columns=TIMESTAMP:{.lastTimestamp},REASON:{.reason},TYPE:{.type},OBJ_NAME:{.involvedObject.name},MESSAGE:{.message}');
  kubectl "${args[@]}" | grep ' Warning '
}
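For the built-in filter, `kubectl get events` accepts `--field-selector`, and `type` is one of the event fields it can select on, so the grep can be dropped. A sketch of a variant of the warnings function, assuming the same columns as above; extra arguments such as `--context` or `-n` are passed through:

```shell
#!/usr/bin/env bash
# Variant of warnings(): let the API server filter on type=Warning instead of grep.
warnings() {
  kubectl get events \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp \
    -o 'custom-columns=TIMESTAMP:{.lastTimestamp},REASON:{.reason},TYPE:{.type},OBJ_NAME:{.involvedObject.name},MESSAGE:{.message}' \
    "$@"
}
```

Usage stays the same, e.g. `warnings --context bulk -n bulk-sit1`.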

Hope someone gets some use out of that.
