1

I have a script that outputs file paths (via find), which I want to sort based on very specific custom logic:

  • 1st sort key: I want the 2nd and, if present, the 3rd `-`-separated field to be sorted using custom ordering based on a list of keys I supply, excluding the numerical suffix.
    With the sample input below, the list of keys is:
    rp,alpha,beta-ri,beta-rs,RC

  • 2nd sort key: numeric sorting by the trailing number on each line.

Given the following sample input (note that the /foo/bar/test/example/8.2.4.0 prefix of each line is incidental):

/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-rp2

I expect:

/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
mklement0
Daniel

3 Answers

0

Using a variant of my answer to your original question:

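./your-script | awk -v keysInOrder='rp,alpha,beta-ri,beta-rs,RC' '
    BEGIN {
      FS=OFS="-"
      # Map each key to its 1-based position in the list.
      keyCount = split(keysInOrder, a, ",")
      for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
    }
    {
      # Build the sort key from field 2 (and field 3, if present), minus the numeric suffix.
      sortKey = $2
      if (NF == 3) sortKey = sortKey FS $3
      sub(/[0-9]+$/, "", sortKey)
      # Split the number off the last field, padding so that, once the
      # ordinal is prepended, the number always lands in field 5.
      auxFieldPrefix = "|" FS
      if (NF == 2) auxFieldPrefix = auxFieldPrefix FS
      sub(/[0-9]/, auxFieldPrefix "&", $NF)
      # Keys missing from the list get an ordinal past the end of the list.
      sortOrdinal = sortKey in keysToOrdinal ? keysToOrdinal[sortKey] : keyCount + 1
      print sortOrdinal, $0
    }
'  | sort -t- -k1,1n -k3,3 -k5,5n | sed 's/^[^-]*-//; s/|-\{1,2\}//'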

./your-script represents whatever command produces the output you want to sort.

Note that an aux. character, |, is used to facilitate sorting, and the assumption is that this character doesn't appear in the input, which should be reasonably safe, given that filesystem paths usually don't contain pipe characters.

Any field 2 values (sans numeric suffix) that aren't in the list of sort keys sort after the field 2/3 values that are, using alphabetic sorting among them.
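For comparison, the same decorate-sort-undecorate idea can be sketched with tab-separated helper columns instead of the aux. | character. This is only a sketch, not a replacement for the command above: the function name custom_sort is hypothetical, and it assumes that paths contain no tab characters and that the first - in each line is the one that precedes the key.

```shell
# Hypothetical helper: reads paths on stdin, takes the comma-separated key list as $1.
custom_sort() {
  awk -v keysInOrder="$1" '
    BEGIN {
      # Map each key to its 1-based position in the list.
      keyCount = split(keysInOrder, a, ",")
      for (i = 1; i <= keyCount; ++i) ord[a[i]] = i
    }
    {
      # Everything after the first "-" is the key plus its numeric suffix.
      tag = $0
      sub(/^[^-]*-/, "", tag)
      # Trailing digits only (empty if there are none; printed as 0 below).
      num = tag
      sub(/^.*[^0-9]/, "", num)
      # Key without the numeric suffix.
      key = tag
      sub(/[0-9]+$/, "", key)
      # Keys missing from the list sort after all listed keys.
      o = (key in ord) ? ord[key] : keyCount + 1
      # Decorate: ordinal, key, number, then the original line.
      printf "%d\t%s\t%d\t%s\n", o, key, num, $0
    }
  ' | sort -t "$(printf '\t')" -k1,1n -k2,2 -k3,3n | cut -f4-
}
```

Usage would be ./your-script | custom_sort 'rp,alpha,beta-ri,beta-rs,RC'. The final cut removes the three helper columns again, so the original lines pass through unmodified.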

mklement0
0

While this does not match what the OP is looking for, it is worth pointing out that the sort command has a -V option for version sorting. It compares embedded numbers numerically and otherwise follows the order of characters in the ASCII table (i.e., UPPERCASE letters first, lowercase letters next).

For example:

cat test.sort.txt 
/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-rp2

And sorting:

 % sort -V test.sort.txt              
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10

So it is useful to be aware of this when choosing version names.

That said, if you insist, here is a one-liner that uses sed to enforce the custom ordering:

cat test.sort.txt|sed -e 's/-rp/-x1xrp/;s/-alpha/-x2xalpha/;s/-beta-ri/-x3xbeta-ri/;s/-beta-rs/-x4xbeta-rs/;s/-RC/-x5xRC/'|sort -V|sed -e 's/x.x//'
/foo/bar/test/example/8.2.4.0-rp2
/foo/bar/test/example/8.2.4.0-rp10
/foo/bar/test/example/8.2.4.0-alpha2
/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-ri10
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs10
/foo/bar/test/example/8.2.4.0-RC1
/foo/bar/test/example/8.2.4.0-RC2
/foo/bar/test/example/8.2.4.0-RC10
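
The sed expressions above hard-code one substitution per key. As a rough sketch, the same tagging could be generated from the comma-separated key list instead. The function name version_sort_with_keys is hypothetical, and the sketch assumes a sort that supports -V (such as GNU sort), keys that contain no characters special to sed, and that no key is a prefix of another.

```shell
# Hypothetical helper: reads paths on stdin, takes the comma-separated key list as $1.
version_sort_with_keys() {
  # Generate one "s/-KEY/-xNxKEY/" command per key, N being the key's position.
  tag_script=$(printf '%s\n' "$1" | tr ',' '\n' |
    awk '{ printf "s/-%s/-x%dx%s/\n", $0, NR, $0 }')
  # Tag, version-sort, then strip the tag again.
  sed -e "$tag_script" | sort -V | sed -e 's/-x[0-9]*x/-/'
}
```

It would be called as version_sort_with_keys 'rp,alpha,beta-ri,beta-rs,RC' < test.sort.txt. Stripping -x[0-9]*x rather than x.x also covers lists with more than nine keys.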
biocyberman
-1

I found a solution totally different from what @mklement0 suggested.

#!/bin/bash

echo "Enter a version :"
read VERSION

while read line; 
do

  find $line -type d | grep $VERSION | sort -n >> outfile.txt

  grep '.*-alpha[0-9]' outfile.txt | sort -n >> outfile2.txt 
  grep '.*-beta-ri[0-9]' outfile.txt | sort -n >> outfile2.txt 
  grep '.*-beta-rs[0-9]' outfile.txt | sort -n >> outfile2.txt 
  grep '.*-RC[0-9]' outfile.txt | sort -n >> outfile2.txt   
  rm outfile.txt 

done <whatever.txt

Content of outfile2.txt :

/foo/bar/test/example/8.2.4.0-alpha10
/foo/bar/test/example/8.2.4.0-alpha8
/foo/bar/test/example/8.2.4.0-alpha9
/foo/bar/test/example/8.2.4.0-beta-ri1
/foo/bar/test/example/8.2.4.0-beta-ri2
/foo/bar/test/example/8.2.4.0-beta-rs1
/foo/bar/test/example/8.2.4.0-beta-rs2
/foo/bar/test/example/8.2.4.0-beta-rs3
/foo/bar/test/example/8.2.4.0-RC1

The only thing wrong with this is that alpha10 comes before alpha8.

Any clue?

Daniel
  • You won't get correct numerical sorting if your sort keys start with non-numbers (try `sort -n <<<$'abc10\nabc2'`). Aside from that, your solution is quite inefficient. – mklement0 Jan 25 '17 at 17:58
  • Your sort doesn't work. Can you explain what `<<<$abc10\nabc2` means? – Daniel Jan 26 '17 at 09:13
  • Which sort doesn't work? `<<<` is a [here-string](http://mywiki.wooledge.org/HereDocument?action=show&redirect=HereString) that sends its argument via _stdin_ (a single-line variation of a here-_doc_); `$'...'` is an [ANSI C-quoted string](http://www.gnu.org/software/bash/manual/bash.html#ANSI_002dC-Quoting), in which control-character escapes such as `\n` are expanded. Not all shells support these constructs, however; here's the POSIX-compliant equivalent: `printf 'abc10\nabc2\n' | sort -n` - as you'll see, `10` sorts before `2`, i.e., the sorting is not numerical. – mklement0 Jan 26 '17 at 13:13
  • Re `sort -n <<<$'abc10\nabc2'`: I think I now understand where your confusion was: My intent was to use a simple example to demonstrate why your approach does _not_ work, not an attempt to show you a fix. Making your approach work - which I recommend against, due to its inefficiency - would be nontrivial, because of the varying number of fields in your input, which prevents you from using a simple field-_index_-based `sort` command. My answer already handles all these intricacies. – mklement0 Jan 26 '17 at 16:19
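
To illustrate the point made in these comments: `sort -n` only parses a number at the very start of each sort key, so lines such as `abc10` and `abc2` all compare as 0 and fall back to plain byte-wise ordering. A minimal sketch of one way to sort a single group of lines by its trailing number is shown below; the function name `sort_by_trailing_number` is hypothetical, and the sketch assumes the input contains no tab characters.

```shell
# Hypothetical helper: sorts stdin lines numerically by their trailing number.
sort_by_trailing_number() {
  # Prepend the trailing digits as a helper column, sort on it, then drop it.
  awk '{
    n = $0
    sub(/^.*[^0-9]/, "", n)        # keep only the trailing digits
    printf "%d\t%s\n", n, $0
  }' | sort -t "$(printf '\t')" -k1,1n | cut -f2-
}
```

For example, `printf 'abc10\nabc2\n' | sort_by_trailing_number` puts `abc2` before `abc10`, whereas plain `sort -n` yields the reverse. This only handles the numeric part within one group; ordering the groups themselves still requires something like the key list used in the first answer.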