
Sometimes data needs to be sorted. Unfortunately, gnuplot (as far as I know) doesn't offer a sorting function. Of course, you can use external tools like awk, Perl, Python, etc. However, for maximum platform independence, to avoid installing additional programs and the complications that come with them, and also out of curiosity, I wanted to know whether gnuplot can nevertheless sort on its own. I would be grateful for comments on improvements and limitations.

Does anybody have ideas how to sort alphanumerical data with gnuplot only?

### Sorting with gnuplot
reset session

# generate some random example data
N = 10
set samples N
RandomNo(n) = sprintf("%.02f",rand(0)*n)
set table $Data
    plot '+' u (RandomNo(10)):(RandomNo(10)):(RandomNo(10)) w table
unset table
print $Data

# Settings for sorting
ColNo = 2   # ColumnNo for sorting
stats $Data nooutput      # get the number of rows if data is from file
RowCount = STATS_records  # with the example data above, of course RowCount=N

# create the sortkey and put it into an array
array SortKey[RowCount]
set table $Dummy
    plot $Data u (SortKey[$0+1] = sprintf("%.06f%02d",column(ColNo),$0+1)) w table
unset table
# print $Dummy

# get lines as whole into array
set datafile separator "\n"
array DataSeq[RowCount]
set table $Dummy2
    plot $Data u (SortKey[$0+1]):(DataSeq[$0+1] = stringcolumn(1)) with table
unset table
print $Dummy2
set datafile separator whitespace

# do the actual sorting with 'smooth unique'
set table $Dummy3
    plot $Dummy2 u 1:0 smooth unique
unset table
# print $Dummy3

# extract the sorted sortkeys
set table $Dummy4
    plot $Dummy3 u (SortKey[$0+1]=$2) with table
unset table
# print $Dummy4

# create the table with sorted lines
set table $DataSorted
    plot $Data u (DataSeq[SortKey[$0+1]+1]) with table
unset table
print $DataSorted
### end of code
  • First datablock: the unsorted data
  • Second datablock: intermediate data with the sort keys in the first column
  • Third datablock: the data sorted by the second column

Output:

 5.24    6.68    3.09   
 1.64    1.27    9.82   
 6.44    9.23    7.03   
 8.14    8.87    3.82   
 4.27    5.98    0.93   
 7.96    3.64    6.15   
 6.21    6.28    6.17   
 1.52    3.17    3.58   
 4.24    2.16    8.99   
 8.73    6.54    1.13   

 6.68000001      5.24    6.68    3.09
 1.27000002      1.64    1.27    9.82
 9.23000003      6.44    9.23    7.03
 8.87000004      8.14    8.87    3.82
 5.98000005      4.27    5.98    0.93
 3.64000006      7.96    3.64    6.15
 6.28000007      6.21    6.28    6.17
 3.17000008      1.52    3.17    3.58
 2.16000009      4.24    2.16    8.99
 6.54000010      8.73    6.54    1.13

 1.64    1.27    9.82
 4.24    2.16    8.99
 1.52    3.17    3.58
 7.96    3.64    6.15
 4.27    5.98    0.93
 6.21    6.28    6.17
 8.73    6.54    1.13
 5.24    6.68    3.09
 8.14    8.87    3.82
 6.44    9.23    7.03 
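
The row number appended to each sort key presumably serves to keep the keys distinct: `smooth unique` orders the points by x, but it also replaces points that share the same x by a single point with the averaged y. Below is a minimal sketch of that behaviour, assuming a gnuplot version that supports the constructs used in the script above. The three-row datablock `$In` and the helper `MakeKey` are made up for illustration and are not part of the original data.

```gnuplot
# hypothetical mini data: column 1 = value to sort by, column 2 = row index
$In <<EOD
3.0  0
1.0  1
3.0  2
EOD

# identical x-values (3.0) are merged into one point with averaged y
set table $Merged
    plot $In u 1:2 smooth unique
unset table
print $Merged

# same key format as in the script above: value with 6 decimals + 2-digit row number
MakeKey(value,row) = sprintf("%.06f%02d", value, row)
set table $Keyed
    plot $In u (MakeKey($1,$2+1)):2 w table
unset table

# with distinct keys every row should survive the sorting step
set table $Kept
    plot $Keyed u 1:2 smooth unique
unset table
print $Kept
```

`$Merged` should contain only two data points, while `$Kept` should keep all three rows, ordered by key. Since the ordering relies on numerical x-values, this approach is a numerical sort rather than an alphanumerical one.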
theozh
  • "awk, perl, python, etc" sounds overly complicated for sorting. The 'sort' utility was designed for this. For alphanumeric sort on column 2: "plot ' – Ethan Jan 12 '19 at 04:42
  • unfortunately, I am on Windows. There is a `sort` but I don't think it does the sorting we want. Then I am back to awk, Perl, Python,.... or changing to Linux ;-) – theozh Jan 12 '19 at 05:08
  • You can download `sort` (or all supported utilities) from the GNUWin32 project page: http://gnuwin32.sourceforge.net/ . `sort` is included in the `CoreUtils` package: http://gnuwin32.sourceforge.net/packages/coreutils.htm – Michael Jan 12 '19 at 11:19
  • @Michael O., thank you for this hint. I didn't know about these `CoreUtils` for Windows; I thought `sort`, etc. were Linux built-in utilities. At first, error messages occurred: `libintl3.dll` missing and later `libiconv2.dll` missing. After downloading them separately, it finally seems to work. Now I am trying to understand the options and documentation. – theozh Jan 12 '19 at 20:03
  • Glad to know that it works. As I assume (maybe incorrectly), the options should be the same as these in GNU coreutils, i.e. in a regular Linux distribution. – Michael Jan 12 '19 at 20:08
  • Well, exactly because of this uncertainty and possible cross-platform incompatibilities which might (or might not) pop up at some point in some (rare) special cases, I would prefer a gnuplot-native solution. But the risk for `sort` is probably rather low, I hope. By the way, how would I do it on macOS? – theozh Jan 13 '19 at 08:02
  • Never worked with this system, don't know. – Michael Jan 13 '19 at 09:37

1 Answer

Out of curiosity, I wanted to know whether an alphanumerical sort could be implemented with gnuplot code only. This avoids the need for external tools and ensures maximum platform compatibility. I haven't yet heard of an external tool that could assist gnuplot and works under Windows, Linux, and macOS alike. I am happy to take comments and suggestions about bugs, simplifications, improvements, performance comparisons, and limitations.

For an alphanumerical sort, the first stage is alphanumerical string comparison, which to my knowledge does not exist in gnuplot directly. So the first part, Compare.plt, implements the comparison of strings.

### compare function for strings 
# Compare.plt
# function cmp(a,b,cs) returns a<b:-1, a==b:0, a>b:+1
# cs=0: case-insensitive, cs=1: case-sensitive
reset session

ASCII =  ' !"' . "#$%&'()*+,-./0123456789:;<=>?@".\
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\`".\
         "abcdefghijklmnopqrstuvwxyz{|}~"

ord(c) = strstrt(ASCII,c)>0 ? strstrt(ASCII,c)+31 : 0

# comparing char: case-sensitive
cmpcharcs(c1,c2) = sgn(ord(c1)-ord(c2))

# comparing char: case-insensitive (lowercase a-z is mapped to uppercase before comparing)
cmpcharci(c1,c2) = sgn(( cmpcharci_o1=ord(c1), ((cmpcharci_o1>96) && (cmpcharci_o1<123)) ?\
    cmpcharci_o1-32 : cmpcharci_o1) - \
    ( cmpcharci_o2=ord(c2), ((cmpcharci_o2>96) && (cmpcharci_o2<123)) ?\
    cmpcharci_o2-32 : cmpcharci_o2) )

# function cmp returns a<b:-1, a==b:0, a>b:+1
# cs=0: case-insensitive, cs=1: case-sensitive
cmp(a,b,cs) = ((cmp_r=0, cmp_flag=0, cmp_maxlen=strlen(a)>strlen(b) ? strlen(a) : strlen(b)),\
    (sum[cmp_i=1:cmp_maxlen] \
      ((cmp_flag==0 && (cmp_c1 = substr(a,cmp_i,cmp_i), cmp_c2 = substr(b,cmp_i,cmp_i), \
        (cmp_r = (cs==0 ?  cmpcharci(cmp_c1,cmp_c2) : cmpcharcs(cmp_c1,cmp_c2) ) )!=0 ? \
        (cmp_flag=1, cmp_r) : 0)), 1 )), cmp_r)

cmpsymb(a,b,cs) = (cmpsymb_r = cmp(a,b,cs))<0 ? "<" : cmpsymb_r>0 ? ">" : "="
### end of code

Example:

### example compare strings
load "Compare.plt"

a="Alligator"
b="Tiger"
print sprintf("% 2d: % 9s% 2s% 6s", cmp(a,b,0), a, cmpsymb(a,b,0), b)

a="Tiger"
print sprintf("% 2d: % 9s% 2s% 6s", cmp(a,b,0), a, cmpsymb(a,b,0), b)

a="Zebra"
print sprintf("% 2d: % 9s% 2s% 6s", cmp(a,b,0), a, cmpsymb(a,b,0), b)
### end of code

Result:

-1: Alligator < Tiger
 0:     Tiger = Tiger
 1:     Zebra > Tiger
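
The third argument of `cmp` switches between case-insensitive and case-sensitive comparison. Here is a small sketch of the difference, assuming `Compare.plt` from above is saved in the working directory. The strings "apple" and "Apple" are chosen here only for illustration.

```gnuplot
### sketch: effect of the cs argument of cmp()
load "Compare.plt"    # note: this also runs 'reset session'

a = "apple"
b = "Apple"
print cmp(a,b,0)      # cs=0: case-insensitive, both strings count as equal
print cmp(a,b,1)      # cs=1: case-sensitive, "a" (ASCII 97) comes after "A" (ASCII 65)
### end of code
```

With the functions as defined above, this should print 0 and then 1, which means a case-sensitive sort would place all uppercase-initial words before the lowercase-initial ones.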

The second part makes use of the comparison for sorting.

### alpha-numerical sort with gnuplot
reset session
load "Compare.plt"

$Data <<EOD
1   0.123   Orange
2   0.456   Apple
3   0.789   Peach
4   0.987   Pineapple
5   0.654   Banana
6   0.321   Raspberry
7   0.111   Lemon
EOD

stats $Data u 0 nooutput
RowCount = STATS_records
ColSort = 3

array Key[RowCount]
array Index[RowCount]

set table $Dummy
    plot $Data u (Key[$0+1]=stringcolumn(ColSort),Index[$0+1]=$0+1) w table
unset table

# Bubblesort
do for [n=RowCount:2:-1] {
    do for [i=1:n-1] {
        if ( cmp(Key[i],Key[i+1],0) > 0) { 
            tmp=Key[i]; Key[i]=Key[i+1]; Key[i+1]=tmp
            tmp2=Index[i]; Index[i]=Index[i+1]; Index[i+1]=tmp2
        }
    }
}

set datafile separator "\n"
set table $Dummy    # and reuse Key-array
    plot $Data u (Key[$0+1]=stringcolumn(1)) with table
unset table
set datafile separator whitespace

set table $DataSorted
    plot $Data u (Key[Index[$0+1]]) with table
unset table

print $DataSorted
set grid xtics,ytics
plot [-0.5:RowCount-0.5][0:1.1] $DataSorted u 0:2:xtic(3) w lp lt 7 lc rgb "red"
### end of code

Input:

1   0.123   Orange
2   0.456   Apple
3   0.789   Peach
4   0.987   Pineapple
5   0.654   Banana
6   0.321   Raspberry
7   0.111   Lemon

Output:

 2      0.456   Apple   
 5      0.654   Banana  
 7      0.111   Lemon   
 1      0.123   Orange  
 3      0.789   Peach   
 4      0.987   Pineapple       
 6      0.321   Raspberry  

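The output graph (image not included here) plots the second column against the row index, with the sorted names from the third column as x-axis tick labels.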
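
The sort direction is determined solely by the comparison in the `if` condition of the bubble sort. As a sketch, the following variation sorts a small array in descending order by swapping whenever the left element is smaller. The array `Names` and its content are hypothetical, and `Compare.plt` is assumed to be in the working directory.

```gnuplot
### sketch: descending case-insensitive bubble sort of a string array
load "Compare.plt"    # provides cmp(a,b,cs); note it also runs 'reset session'

N = 4
array Names[N] = ["pear", "Apple", "fig", "Banana"]

do for [n=N:2:-1] {
    do for [i=1:n-1] {
        # '< 0' instead of '> 0': swap if the left element is the smaller one
        if ( cmp(Names[i],Names[i+1],0) < 0 ) {
            tmp=Names[i]; Names[i]=Names[i+1]; Names[i+1]=tmp
        }
    }
}

do for [i=1:N] { print Names[i] }
### end of code
```

This should print pear, fig, Banana, Apple, one per line. Bubble sort needs on the order of N² comparisons, so this is likely to become slow for large datasets.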

theozh