77

I've often wanted to sort strings with numbers in them so that, when sorting e.g. abc_2, abc_1, abc_10 the result is abc_1, abc_2, abc_10. Every sort mechanism I've seen sorts as abc_1, abc_10, abc_2, that is character by character from the left.

Is there any efficient way to sort to get the result I want? The idea of looking at every character, determining if it's a numeral, building a substring out of subsequent numerals and sorting on that as a number is too appalling to contemplate in bash.

Has no bearded *nix guru implemented an alternative version of sort with a --sensible_numerical option?

jww
hardcode57

3 Answers

140

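Run this:

sort -t _ -k 2 -g data.file
  • -t sets the field separator (here _)
  • -k selects the sort key (here the key starts at field 2, the part after the underscore)
  • -g compares the key as a general numeric value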
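To try the command above without a data file, here is a minimal sketch that feeds it the three names from the question through printf. The variable name `sorted` is only illustrative.

```shell
# Sample input from the question: split each line on "_" and
# compare the part after the underscore as a number.
sorted=$(printf '%s\n' abc_2 abc_1 abc_10 | sort -t _ -k 2 -g)
printf '%s\n' "$sorted"
```

This relies on every line sharing the same prefix, because only the text after the underscore is used as the key. For plain integers, the POSIX option -n should behave the same as -g here.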
tripleee
Grzegorz Żur
54

I think this is a GNU extension to sort, but you're looking for the --version-sort (or -V) option:

$ printf "prefix%d\n" $(seq 10 -3 1)
prefix10
prefix7
prefix4
prefix1

$ printf "prefix%d\n" $(seq 10 -3 1) | sort
prefix1
prefix10
prefix4
prefix7

$ printf "prefix%d\n" $(seq 10 -3 1) | sort --version-sort
prefix1
prefix4
prefix7
prefix10

https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
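If --version-sort is not available in your sort, a possible fallback for names of the form prefix_number is to sort on two keys: the prefix as text, then the number numerically. This is a sketch under that assumption about the input format. The sample names and the variable `sorted_portable` are made up for illustration.

```shell
# Assumes every line looks like <prefix>_<number>.
# Key 1 (the prefix) is compared as text, key 2 (the number) numerically.
# LC_ALL=C keeps the text comparison byte-wise and predictable.
sorted_portable=$(printf '%s\n' abc_10 abd_2 abc_2 abc_1 |
  LC_ALL=C sort -t _ -k 1,1 -k 2,2n)
printf '%s\n' "$sorted_portable"
```

Unlike version sort, this handles only one numeric field in a fixed position, so it is not a general replacement.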

glenn jackman
45

You can use version sort: just pass the -V (or --version-sort) option to sort.

# without (version-sort)
$ cat a.txt
abc_1
abc_4
abc_2
abc_10
abc_5

# with (version-sort)
$ sort -V a.txt
abc_1
abc_2
abc_4
abc_5
abc_10
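Version sort is not limited to a single trailing number. As a small sketch, assuming a sort that supports -V (GNU coreutils, for example), names with more than one run of digits are also ordered numerically run by run. The sample names and the variable `sorted_v` are invented for the example.

```shell
# Each run of digits is compared as a number, left to right,
# so a2 sorts before a10, and within a2 the suffix 9 sorts before 10.
sorted_v=$(printf '%s\n' a10_1 a2_10 a2_9 | sort -V)
printf '%s\n' "$sorted_v"
```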
Ahmed Nabil
Bill