
This is an example of my data:

ip=1.2.3.4, setup_time=05:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 16 2021, foo=moshe2, bar=haim2

I would like to sort these lines by the setup_time column in bash. As far as I know, I can't use sort directly, because sort compares strings and the timestamp is not in a format such as YYYY-MM-DD HH:mm:ss, so plain string sorting does not give chronological order.

Any ideas would be greatly appreciated.

Thank you.

#update

OK, to better explain what I'm trying to achieve, I created the following file named 1:

ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2

Then I executed this:

cat 1 | sed 's/, /!/g' | sort -t '!' -k2,2

What I did here is replace ", " with "!" so that I can use it as a delimiter in sort. The problem is that sort does string sorting rather than timestamp sorting, so the output is:

ip=2.3.4.5!setup_time=05:59:30.260 GMT Tue Mar 17 2021!foo=moshe2!bar=haim2
ip=2.3.4.5!setup_time=06:50:30.260 GMT Tue Mar 18 2021!foo=moshe2!bar=haim2
ip=1.2.3.4!setup_time=06:58:38.617 GMT Tue Mar 16 2021!foo=moshe!bar=haim
ufk
  • I suggest using a [Schwartzian Transform](https://en.wikipedia.org/wiki/Schwartzian_transform). – Cyrus May 19 '21 at 15:59
  • @Cyrus - added info. So in general I should use a programming language for that? It can't be done in pure bash? – ufk May 19 '21 at 16:08
  • Please add your desired output (no description, no images, no links) for that sample input to your question (no comment). – Cyrus May 19 '21 at 16:43

3 Answers


sort can deal with month names, thanks to its M option.

There is no need to change , into !. Use white space as the delimiter and just issue:

LC_ALL=en sort -k7nr -k5Mr -k6nr -k2r sample

If you use this as the content of the file sample:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2

you will get this as output:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2

Specifying -k7 means to sort on the seventh field (strictly speaking, on the text from the seventh field to the end of the line). The r option reverses the sort order to descending. The M option sorts by month name. The n option sorts numerically. To sort on the time, just treat the whole second field (beginning with the string setup_time=) as a fixed-length string using -k2.

LC_ALL=en at the beginning of the command line tells the system to use the English names of the months. If your system has no locale named en, LC_ALL=C should give the same English month abbreviations.
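
As a side note, here is a minimal sketch of the same idea with every key limited to a single field (-k7,7 rather than -k7) and in ascending order. The function name sort_by_setup_time is only an illustration, LC_ALL=C is assumed to provide the English month abbreviations, and a sort with -M support (such as GNU sort) is assumed:

```shell
# Hypothetical helper: sort lines chronologically by the setup_time field.
# Assumes a sort that supports -M (month names), e.g. GNU sort.
# Usage: sort_by_setup_time sample     (or pipe the lines in on stdin)
sort_by_setup_time() {
    # Keys, in order: year (field 7), month name (field 5), day (field 6),
    # then the fixed-length "setup_time=HH:MM:SS.mmm" string (field 2).
    LC_ALL=C sort -k7,7n -k5,5M -k6,6n -k2,2 "$@"
}
```

Restricting each key to one field (N,N) keeps a later part of the line from influencing an earlier key; add r to each key to get the descending order shown above.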

Pierre François

A solution involving awk:

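awk '
{
    # $7 is "2021," so strip the trailing comma
    year = substr($7, 1, length($7)-1)
    # only the date part ($3..$6 and the year) is passed to date, not the time of day
    cmd = "date --date=\"" $3 " " $4 " " $5 " " $6 " " year "\" +%s"
    cmd | getline var
    close(cmd)
    # prepend the seconds since the epoch as a sort key
    print var, $0
}' file | sort -n -k1,1 | cut -d' ' -f2-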

The trick is that date --date="GMT Tue Mar 18 2021" parses the date heuristically (so it also works with date --date="GMT Tue 18 Mar 2021"), and you can then print just the seconds since the epoch.

awk outputs the seconds as the first column; you sort by it, then remove the first column from the result.

The biggest advantage of this solution is that it also works for other date formats (within reason, of course).

Note 1: for this to work you need GNU date (on macOS it is available as gdate, for example).

Note 2: instead of awk you could also use bash with while/read (as in Read a file line by line assigning the value to a variable), but awk is fairly standard, so I'm not sure it makes a big difference for you.
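
For what it's worth, the while/read variant mentioned in the second note might look roughly like this. It is only a sketch: the function name sort_by_epoch is made up, GNU date is assumed (date -d), and every line is assumed to have exactly the layout shown in the question. Unlike the awk version, it also passes the time of day to date and keeps the milliseconds in the sort key:

```shell
# Hypothetical helper: reads lines on stdin, writes them sorted by setup_time.
# Assumes bash, GNU date (gdate on macOS) and the line layout from the question.
sort_by_epoch() {
    local line time year epoch
    local -a f
    while IFS= read -r line; do
        # Split on whitespace:
        # f[1]=setup_time=HH:MM:SS.mmm  f[4]=Mar  f[5]=16  f[6]=2021,
        read -r -a f <<< "$line"
        time=${f[1]#setup_time=}
        year=${f[6]%,}
        # e.g. date -u -d "16 Mar 2021 06:58:38" +%s ; skip the line if date fails
        epoch=$(date -u -d "${f[5]} ${f[4]} $year ${time%.*}" +%s) || continue
        # Decorate: epoch seconds plus milliseconds, a tab, then the original line
        printf '%s.%s\t%s\n' "$epoch" "${time##*.}" "$line"
    done | LC_ALL=C sort -t $'\t' -k1,1n | cut -f2-
}
```

It would be called as sort_by_epoch < file. Spawning date once per line is slow for big files, so this is mostly useful for small inputs.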

vladmihaisima

If your sort supports month names, use that. Pierre's solution is elegant!

If you don't, convert the date to ISO 8601 (which sorts lexicographically) and use a Schwartzian transform or a Decorate / Sort / Undecorate pattern.

Since the date format you have is nonstandard, the easiest approach is to use Perl to decorate, sort to sort on the first field, and then cut to undecorate (remove the added field):

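perl -lnE '
BEGIN{
%m2n = qw(Jan 01  Feb 02  Mar 03  Apr 04  May 05  Jun 06
          Jul 07  Aug 08  Sep 09  Oct 10  Nov 11  Dec 12
);}

# capture time ($1), month name ($2), day ($3) and year ($4); skip lines that do not match
m/setup_time=([\d:.]+).*?(\w\w\w) (\d\d?) (\d\d\d\d),/ or next;
# zero-pad the day so that single-digit days sort correctly as strings
$key = sprintf("%s%s%02d%s", $4, $m2n{$2}, $3, $1);
say "$key\t$_"' YourFile | sort -t $'\t' -r -k1,1 | cut -d $'\t' -f2-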

Using Pierre's data, this prints:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2
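
If Perl is not available, the same decorate / sort / undecorate idea can be sketched with awk, sort and cut. This is only an illustration: sort_by_iso_key is a made-up name, the field positions are assumed to match the sample lines exactly, and the order is ascending rather than descending:

```shell
# Hypothetical helper: decorate each line with an ISO 8601-like key,
# sort on that key as a plain string, then strip the key again.
# Usage: sort_by_iso_key YourFile     (or pipe the lines in on stdin)
sort_by_iso_key() {
    awk '
    BEGIN {
        # map month abbreviations to their numbers (Jan -> 1 ... Dec -> 12)
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    {
        # $2=setup_time=HH:MM:SS.mmm  $5=Mar  $6=16  $7=2021,
        time = substr($2, index($2, "=") + 1)
        # "2021," + 0 evaluates to 2021; %02d zero-pads month and day
        key = sprintf("%04d-%02d-%02dT%s", $7 + 0, month[$5], $6 + 0, time)
        print key "\t" $0
    }' "$@" | LC_ALL=C sort -t $'\t' -k1,1 | cut -f2-
}
```

Add -r to the sort call to get the newest-first order shown above.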
dawg