151

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

Chris Kloberdanz
  • 4,436
  • 4
  • 30
  • 31
  • One or two lines of example data would be really helpful for creating an example command line. Also, does "1-n" keys mean that you need to sort by a variable number of keys? Doing that without scripting is gonna be fun... – Ken Gentle Dec 10 '08 at 20:58
  • I have a PHP wrapper around the sort command to enable the 1-n feature. – Chris Kloberdanz Dec 10 '08 at 21:28

7 Answers

352

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile, which sorts the file by the string running from the beginning of field 3 to the end of the line. That string is potentially unique, in which case the second key is never consulted.

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)
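
To make the difference concrete, here is a minimal sketch. The two-line sample data and the variable names are made up for illustration, and LC_ALL=C is set only so the result does not depend on the local collation rules.

```shell
# Hypothetical sample data: four whitespace-separated fields per line.
data='a 1 x zzz
b 2 x aaa'

# Key 1 is field 3 only; both lines tie on "x", so field 2 decides.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3 -k 2,2)

# Key 1 runs from field 3 to the end of the line, so "x aaa" sorts
# before "x zzz" and the second key is never consulted.
unbounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 -k 2)

printf 'bounded:\n%s\n\nunbounded:\n%s\n' "$bounded" "$unbounded"
```

The bounded form keeps the "a 1" line first, while the unbounded form puts the "b 2" line first because of the trailing "aaa".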
ndemou
  • 4,691
  • 2
  • 30
  • 33
andras
  • 6,339
  • 6
  • 26
  • 22
  • 1
    Nice! Now, what if I want field 3 to be sorted numerically in reverse, and field 2 non-numerically in normal (ascending) order? :) – Arun Feb 16 '17 at 13:46
  • 5
    @Arun POS is explained at the end of the man page. You just append the ordering options to the field number like this: `sort -k 3,3nr -k 2,2` – andras Aug 04 '17 at 15:03
  • 5
    Aargh. What a counterintuitive interface: `-k2` should be `-k2,2` and a trailing comma `-k2,` should be 'magical default end of line or whatever'. – android.weasel Nov 21 '17 at 08:35
  • why the angle bracket `<`? Should `sort -k3,3 -k2,2 inputfile` not do the job? – HongboZhu Sep 18 '20 at 13:00
103

The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

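Sorts the directory listing numerically by the two digits at positions 4-5 (`10` in the sample line, which is the day rather than the month if the date is MM/DD/YYYY), and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.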
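
The same idea on a small fixed-width file, as a rough sketch. The record layout (a 3-character code, a 2-digit number, a 3-character name) and the data are invented for illustration, and the records deliberately contain no blanks so that the whole line is one field.

```shell
# Hypothetical fixed-width records: chars 1-3 code, 4-5 number, 6-8 name.
records='abc07zed
abc12amy
abc07bob'

# First key: characters 4-5, compared numerically.
# Second key: characters 6-8, compared as text in reverse order.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -k 1.4,1.5n -k 1.6,1.8r)

printf '%s\n' "$sorted"
```

The two "07" records come first, with "zed" ahead of "bob" because the second key is reversed, and the "12" record comes last.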

Andy
  • 17,423
  • 9
  • 52
  • 69
Clinton Pierce
  • 12,859
  • 15
  • 62
  • 90
  • It is only one field if there are no blanks in the input data. Nevertheless, your example is useful. – Jonathan Leffler Dec 11 '08 at 06:05
  • Correction: if there are no /tabs/ in the input data. In DOS's 'dir' command output, there are no tabs. – Clinton Pierce Dec 11 '08 at 16:24
  • 2
    The examples on how to use the options (numeric, reverse) are extremely helpful, as it's nearly impossible to find out how to use just from the man page and the other answers didn't mention it. I wish I could +2 for this. ;) – msb Oct 21 '13 at 18:46
73

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for numeric sort) in place of the global ones.
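
A small sketch of per-key options, using made-up three-field data: field 3 is sorted numerically in descending order, and ties are broken by field 2 as plain text in ascending order.

```shell
# Hypothetical sample data: three whitespace-separated fields per line.
data='x b 10
y a 9
z a 10'

# n and r apply to the first key only; the second key is a plain text key.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3,3nr -k 2,2)

printf '%s\n' "$sorted"
```

Both lines with 10 sort ahead of the line with 9, and "z a 10" precedes "x b 10" because "a" sorts before "b" in field 2.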

Ken Gentle
  • 13,277
  • 2
  • 41
  • 49
24

Here is one that sorts the columns of a CSV file in a mix of numeric and dictionary order, with column 5 through the end of the line in dictionary order:

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note that -k1,1n means a numeric key starting at column 1 and ending at column 1. If I had used -k1,2n as below, the key would have spanned columns 1 and 2, so 1,10 is sorted as 110 (this happens when the locale treats the comma as a thousands separator):

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga
Patryk
  • 22,602
  • 44
  • 128
  • 244
JayS
  • 2,057
  • 24
  • 16
  • 3
    This is the best answer because it shows how to use different switches for different columns – xaxa Jan 13 '16 at 15:58
12

I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator; make sure it is a character that appears nowhere in the data. Then your input is treated as a single column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.
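
As an illustration only, here is how that might look on fixed-width data that contains blanks. The layout (a 4-character name padded with spaces, then a 3-character right-aligned number) and the data are assumptions, not taken from the question.

```shell
# Hypothetical fixed-width records: chars 1-4 name, chars 5-7 number.
records='bob  12
amy   7
bob   3'

# "@" never occurs in the data, so every line is a single field and the
# character offsets count from the start of the line, blanks included.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t@ -k1.1,1.4 -k1.5,1.7n)

printf '%s\n' "$sorted"
```

The records come out ordered by name first and, for the two "bob" lines, by number (3 before 12).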

Dong Hoon
  • 879
  • 1
  • 8
  • 13
  • Even though the default separator according to the docs https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html is space, sometimes the field count is not what you'd expect. Perhaps, as others have said here, because of the LC_CTYPE locale setting. When in doubt, count from the beginning of the line! – Brad Dre Nov 12 '19 at 19:10
8

Note that it may also be desirable to stabilize the sort with the -s switch, so that equally ranked lines keep their original relative order in the output.
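
A quick sketch of the effect, on made-up data sorted by the first field only. Without -s, lines whose keys tie are ordered by a last-resort comparison of the whole line; with -s they stay in input order.

```shell
# Hypothetical sample data: two whitespace-separated fields per line.
data='b 3
a 4
b 1
a 2'

# Stable: lines with equal keys keep their input order.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 1,1)

# Default: ties on the key are broken by comparing the entire line.
unstable=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1)

printf 'stable:\n%s\n\ndefault:\n%s\n' "$stable" "$unstable"
```

With -s the "a" lines appear as 4 then 2, exactly as in the input; without it they appear as 2 then 4.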

ron
  • 9,262
  • 4
  • 40
  • 73
2

I just want to add a tip: when you use sort, be careful about your locale, because it affects the order in which keys are compared. I usually set LC_ALL=C explicitly to get the ordering I want.
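
For example, a minimal sketch with invented data: under LC_ALL=C the comparison is by byte value, so all uppercase ASCII letters sort before all lowercase ones. Other locales may interleave upper and lower case, which is why no second locale is shown here.

```shell
# Hypothetical sample data: one word per line.
words='b
A
a
B'

# Setting LC_ALL for this one command forces plain byte-order comparison.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$c_order"
```

Prefixing only the sort invocation with LC_ALL=C leaves the rest of the shell session's locale untouched.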

jianpx
  • 3,190
  • 1
  • 30
  • 26