180

I have an array in Bash, for example:

array=(a c b f 3 5)

I need to sort the array. Not just displaying the content in a sorted way, but to get a new array with the sorted elements. The new sorted array can be a completely new one or the old one.

codeforester
  • 39,467
  • 16
  • 112
  • 140
user32004
  • 2,203
  • 3
  • 17
  • 17

20 Answers20

293

You don't really need all that much code:

IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS

Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.

e.g.:

$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[@]}"
[3 5]
[a c]
[b]
[f]

Note: @sorontar has pointed out that care is required if elements contain wildcards such as * or ?:

The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

What's happening:

The result is a culmination six things that happen in this order:

  1. IFS=$'\n'
  2. "${array[*]}"
  3. <<<
  4. sort
  5. sorted=($(...))
  6. unset IFS

First, the IFS=$'\n'

This is an important part of our operation that affects the outcome of 2 and 5 in the following way:

Given:

  • "${array[*]}" expands to every element delimited by the first character of IFS
  • sorted=() creates elements by splitting on every character of IFS

IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)

Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not-as-important, but is needed preserve elements that contain spaces or tabs.

The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.

Next, the sort <<<"${array[*]}" part

<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.

With our example, sort is fed this following string:

a c
b
f
3 5

Since sort sorts, it produces:

3 5
a c
b
f

Next, the sorted=($(...)) part

The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}) to run as a normal command, while taking the resulting standard output as the literal that goes where ever $(...) was.

In our example, this produces something similar to simply writing:

sorted=(3 5
a c
b
f
)

sorted then becomes an array that's created by splitting this literal on every new line.

Finally, the unset IFS

This resets the value of IFS to the default value, and is just good practice.

It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)

antak
  • 19,481
  • 9
  • 72
  • 80
  • very nice. Why the IFS tho? I dont think this is necessary – Frido Emans Oct 19 '15 at 08:32
  • 4
    @xxor without the `IFS`, it'll split your elements into little pieces if they have whitespaces in them. Try the *e.g.* with `IFS=$'\n' ` omitted and see! – antak Oct 19 '15 at 09:04
  • ahh, good solution indeed. I only tried on my own sort without whitespaces in it. now i see – Frido Emans Oct 19 '15 at 11:17
  • 3
    Very nice. Could you explain for the average bash user how this solution works? – user32004 Oct 21 '15 at 07:46
  • 2
    Now, with the `IFS`, it splits your elements into little pieces if they have only one particular kind of whitespace in it. Good; not perfect :-) – lmat - Reinstate Monica Jan 29 '16 at 14:56
  • 16
    Is `unset IFS` necessary? I thought prepending `IFS=` to a command scoped the change to that command only, returning to its previous value automatically afterwards. – Mark H Sep 24 '16 at 11:06
  • 18
    @MarkH It's necessary because `sorted=()` is not a command but rather a second variable assignment. – antak Sep 26 '16 at 04:01
  • I'd suggest adding that to the sample code, rather than as an English-language comment someone may or may not read or adopt for their own implementation. (Actually didn't see that provisio in the answer at all, and almost duplicated sorontar's comment before seeing it had already been made). – Charles Duffy Nov 29 '16 at 05:52
  • 3
    BTW, if we're on a GNU system (or somewhere else with `sort -z`), it'd be safer to NUL-delimit the entries -- that way we're not splitting entries with newline literals into multiple entries in the newly created array. `out=( ); while IFS= read -r -d '' entry; do out+=( "$entry" ); done < <(printf '%s\0' "${array[@]}" | sort -z)` – Charles Duffy Nov 29 '16 at 05:53
  • 1
    I wonder why `"${array[@]}"` does not work as here string? It is supposed to handle spaces in elements better than `"${array[*]}"` like it does in your example in the `printf` command. `array=("a c" b f "3 5"); IFS=$'\n' printf '[%s]\n' "${array[*]}"; unset -v IFS` does not work, even when IFS is set. – jarno Dec 12 '16 at 05:57
  • 1
    @jarno: It's explained in the answer, but just to summarize it: `"${array[*]}"` expands to a _single string_ formed by joining the array elements with the _1st char. of `$IFS`_. By contrast, `"${array[@]}"` expands to _multiple arguments_, which, when used in a string context, are _always_ joined with a _space_, irrespective of the value of `$IFS`. – mklement0 Feb 15 '17 at 20:33
  • 1
    Cute answer, but you're already showing `printf`s use, which can be used for an easier solution that doesn't depend on IFS for generating output: `IFS=$'\n' sorted=($(printf '%s\n' "${array[@]}" | sort))`. – Eli Barzilay Jun 16 '17 at 23:38
  • @EliBarzilay I don't get what's easier. What you wrote is longer than what's in the answer and sets `IFS` anyhow. – antak Jan 29 '18 at 01:06
  • @antak, Sorry, I *think* that I actually meant to say that it doesn't depend on `<<<"..."` and how that interacts with `IFS`. – Eli Barzilay Feb 02 '18 at 20:22
  • 1
    Instead of performing an `unset IFS`, first at the beginning save `IFS` into something like `OLD_IFS`, and then at the end restore `IFS` to the original value. – user1404316 Mar 22 '18 at 23:02
  • @user1404316 I think you should explain why that's useful. Maybe if you're writing a script that reads `IFS`? Though the question then is why.. Or maybe it's for modularity within the script? In which case `IFS=$OLD_IFS` wouldn't restore the old value if it was previously unset, but instead sets it to blank which can do more damage than good... I think a script implicitly has a certain level of interdependency, so we design and make rules on how our scripts should behave. To this end, using `OLD_IFS` seems no more beneficial than just sticking with `unset IFS`. – antak Mar 23 '18 at 02:20
  • Why not set IFS inside the subshell so that it doesn't need to unset or saved and restored? `sorted=($(IFS=$'\n' ; sort <<<"${array[*]}"))` – haridsv Mar 27 '19 at 06:23
  • @haridsv Good observation. However, our `IFS` is also required by the `sorted=(...)` operation, which is outside the subshell. – antak Mar 31 '19 at 07:04
  • @antak I don't think so, at least not for the sake of `sorted` array. – haridsv Mar 31 '19 at 16:26
  • 1
    @haridsv For example, if you run the example starting `array=("a c" b f "3 5")` but move `IFS=` inside the subshell, the elements `a c` and `3 5` will be broken up into four elements `a`, `c`, `3` and `5`. – antak Apr 01 '19 at 03:09
  • @antak I see what you are saying, thanks for pointing it out! – haridsv Apr 01 '19 at 07:09
  • 1
    Do you even need to mess with IFS? `sort` has a field separator built-in: `sort --field-separator=' ' <<<"${array[*]}"` – peterRepeater Jul 14 '19 at 18:32
  • @peterRepeater See [this comment](https://stackoverflow.com/questions/7442417/how-to-sort-an-array-in-bash/11789688?noredirect=1#comment54225267_11789688), because the same thing will happen with your idea. You should try it by changing the example I posted and see. – antak Jul 16 '19 at 09:07
  • Wouldn't `{ local IFS=$'\n'; sorted=($(sort <<<"${array[*]}")); }` be better because it doesn't presume anything about the prior value of `IFS`? – Kyle Rose Aug 08 '20 at 13:07
  • @KyleRose That gives me `local: can only be used in a function` in my bash 4. – antak Aug 11 '20 at 01:16
  • @antak Huh. I don't get an error, probably because I only tried it within a function, in which case it isn't actually doing what I think. Thanks. – Kyle Rose Aug 12 '20 at 10:41
  • In this case, I would fall back to the `OLDIFS=$IFS` ... `IFS=$OLDIFS` pattern. – Kyle Rose Aug 12 '20 at 10:42
  • While this is a very old thread - worth mentioning is that if you want to avoid `unset IFS` you can wrap the line into eval, e.g. `IFS=$'\n' eval 'sorted=( $(sort <<<"${array[*]}") )'` doesn't need later `unset`, as `eval` serves as a command. – Michal Soltys Nov 29 '22 at 17:03
45

Original response:

array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[@]}"; do echo "$a"; done | sort)

output:

$ for a in "${sorted[@]}"; do echo "$a"; done
3
5
a
b
c
f f

Note this version copes with values that contains special characters or whitespace (except newlines)

Note readarray is supported in bash 4+.


Edit Based on the suggestion by @Dimitre I had updated it to:

readarray -t sorted < <(printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)

which has the benefit of even understanding sorting elements with newline characters embedded correctly. Unfortunately, as correctly signaled by @ruakh this didn't mean the the result of readarray would be correct, because readarray has no option to use NUL instead of regular newlines as line-separators.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 7
    Nice, it should be also noted that readarray is available since version 4 of bash. It could be shortened a bit: `readarray -t sorted < <(printf '%s\n' "${array[@]}" | sort)` – Dimitre Radoulov Sep 16 '11 at 09:35
  • 1
    @Dimitre: I took your suggestion and fixed the whitespace handling to work with anything (using nullchar-delimiters internally). Cheers – sehe Sep 16 '11 at 09:42
  • 1
    Yes the `sort -z` is a useful improvement, I suppose the `-z` option is a GNU sort extention. – Dimitre Radoulov Sep 16 '11 at 09:46
  • 1
    I don't know if I agree that that "understand[s] elements with newline characters embedded". If the filename is `$'z\na'`, then it will successfully keep the `z` and `a` together during sorting, but they'll still end up in separate array-elements, with the `$'\n'` being swallowed. It's unfortunate that there's no way to tell `readarray` to use `$'\0'` instead of `$'\n'`. – ruakh Feb 15 '12 at 14:39
  • @ruakh: Very good spot. Fixed the answer, thanks for the heads up – sehe Feb 15 '12 at 15:49
  • 3
    If you want to handle embedded newlines, you can roll your own readarray. For example: `sorted=(); while read -d $'\0' elem; do sorted[${#sorted[@]}]=$elem; done < <(printf '%s\0' "${array[@]}" | sort -z)`. This also works in you are using bash v3 instead of bash v4, because readarray isn't available in bash v3. – Bob Bell Mar 28 '12 at 18:00
  • Can someone please explain why there are two `< <` ? Why isn't it just: `readarray -t sorted < (printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)`. Thank you. – user1527227 Jul 24 '14 at 15:17
  • 1
    @user1527227 It's input redirection (`<`) combined with _process substitution_ `<(...)`. Or to put it intuitively: because `(printf "bla")` is not a file. – sehe Jul 24 '14 at 18:18
  • 1
    New on bash 4.4 readarray could use null delimiter with `-d ''`. –  Oct 24 '16 at 22:33
  • @sorontar I can't find any documentation to suggest that `-d ''` would use the NUL character as delimiter. Did you find that anywhere? – sehe Oct 25 '16 at 08:01
  • 2
    Start a bash 4.4 and write `help readarray` (mapfile) to find that it accepts a `-d` option to select the `delimiter`. Otherwise, [read here](http://wiki.bash-hackers.org/scripting/bashchanges); `"mapfile" new option "-d" 4.4-alpha`. Also at [the end of this answer](http://stackoverflow.com/a/24426608/6843677). –  Oct 25 '16 at 17:25
  • 1
    And [this is even more specific](https://transnum.blogspot.com/2008/11/bashs-read-built-in-supports-0-as.html) `BASH’s ‘read’ built-in supports '\0' as delimiter`. –  Oct 25 '16 at 19:36
  • @sorontar Thanks for those links. I've tried things, but apparently I'm doing something wrong, or it's not quite working as expected/required for this answer http://i.imgur.com/sMEOWwk.png - Am I missing something? – sehe Oct 25 '16 at 21:29
  • 1
    Please try: `readarray -d '' -t arr < <(printf 'a\0b\0c\0def\0'); printf '%s\n' "${arr[@]}"`. –  Oct 25 '16 at 22:43
  • @sorontar COOL. It's abusing the NUL-terminator for that empty string `''`. Clever. Sounds a bit brittle, but definitely clever, and if even Gentoo ebuilds are relying on the behaviour, let's assume it will be kept :) – sehe Oct 25 '16 at 23:08
  • This algorithm doesn't work at all times. If you give as input numbers "21 1 4 2 18" it sorts them using the 1st digit only and not all digits. Hence the result is "1 18 2 21 4" – HelloIT Sep 08 '19 at 14:05
  • @HelloIT so it works. It just doesn't sort numbers. That's expected and by design. Luckily, you can just clue `sort` in: `sort -n` – sehe Sep 08 '19 at 14:23
39

If you don't need to handle special shell characters in the array elements:

array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[@]}"|sort))

With bash you'll need an external sorting program anyway.

With zsh no external programs are needed and special shell characters are easily handled:

% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[@]}" 
3
5
a a
b
c
f

ksh has set -s to sort ASCIIbetically.

Dimitre Radoulov
  • 27,252
  • 4
  • 40
  • 48
  • Very nice background info. I would almost ask for a demo on how ksh would use the set -s flag... but then again, the question is on bash, so that would be rather off-topic – sehe Sep 16 '11 at 10:01
  • This should work with most *KornShell* implementations (for example *ksh88* and *pdksh*): `set -A array x 'a a' d; set -s -- "${array[@]}"; set -A sorted "$@"` And, of course, the set command will reset the current positional parameters, if any. – Dimitre Radoulov Sep 16 '11 at 10:07
  • You are a veritable fountain of shell knowledge. I'm sure you must have photographics memory or something, because this kind of subtle differences elude most of the other members of the human species :), +1 for the complete package of info – sehe Sep 16 '11 at 10:08
38

Here's a pure Bash quicksort implementation:

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
   local pivot i smaller=() larger=()
   qsort_ret=()
   (($#==0)) && return 0
   pivot=$1
   shift
   for i; do
      # This sorts strings lexicographically.
      if [[ $i < $pivot ]]; then
         smaller+=( "$i" )
      else
         larger+=( "$i" )
      fi
   done
   qsort "${smaller[@]}"
   smaller=( "${qsort_ret[@]}" )
   qsort "${larger[@]}"
   larger=( "${qsort_ret[@]}" )
   qsort_ret=( "${smaller[@]}" "$pivot" "${larger[@]}" )
}

Use as, e.g.,

$ array=(a c b f 3 5)
$ qsort "${array[@]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'

This implementation is recursive… so here's an iterative quicksort:

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
   (($#==0)) && return 0
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}

In both cases, you can change the order you use: I used string comparisons, but you can use arithmetic comparisons, compare wrt file modification time, etc. just use the appropriate test; you can even make it more generic and have it use a first argument that is the test function use, e.g.,

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
   (($#<=1)) && return 0
   local compare_fun=$1
   shift
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}

Then you can have this comparison function:

compare_mtime() { [[ $1 -nt $2 ]]; }

and use:

$ qsort compare_mtime *
$ declare -p qsort_ret

to have the files in current folder sorted by modification time (newest first).

NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).

NOTE2. The test [[ $i < $pivot ]] is correct. It uses the lexicographical string comparison. If your array only contains integers and you want to sort numerically, use ((i < pivot)) instead. Please don't edit this answer to change that. It has already been edited (and rolled back) a couple of times. The test I gave here is correct and corresponds to the output given in the example: the example uses both strings and numbers, and the purpose is to sort it in lexicographical order. Using ((i < pivot)) in this case is wrong.

gniourf_gniourf
  • 44,650
  • 9
  • 93
  • 104
  • 1
    Kudos for impressive Bashing that offers great flexibility with respect to input elements and sort criteria. If line-based sorting with the sort options that `sort` offers is sufficient, a `sort` + `read -a` solution will be faster starting at around, say, 20 items, and increasingly and significantly faster the more elements you're dealing with. E.g., on my late-2012 iMac running OSX 10.11.1 with a Fusion Drive: 100-element array: ca. 0.03s secs. (`qsort()`) vs. ca. 0.005 secs. (`sort` + `read -a`); 1000-element array: ca. 0.375 secs. (`qsort()`) vs. ca. 0.014 secs (`sort` + `read -a`). – mklement0 Oct 27 '15 at 02:50
  • Nice. I remember quick sort from college days but will also research bubble sort. For my sorting needs I have first and second elements forming key followed by one data element (which I may expand later). Your code could be improved with number of key elements (parm1) and number of data elements (parm2). For OP the parameters would be 1 and 0. For me the parameters would be 2 and 1. In any respect your answer has most promise. – WinEunuuchs2Unix Apr 23 '17 at 22:54
  • 1
    With a dataset of uncast string integers I found `if [ "$i" -lt "$pivot" ]; then` was required otherwise the resolved "2" < "10" returned true. I believe this to be POSIX vs. Lexicographical; or perhaps [Inline Link](https://stackoverflow.com/a/11989609/1452451). – Page2PagePro Apr 16 '18 at 21:07
20

tl;dr:

Sort array a_in and store the result in a_out (elements must not have embedded newlines[1] ):

Bash v4+:

readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)

Bash v3:

IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)

Advantages over antak's solution:

  • You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).

  • You needn't worry about resetting IFS with unset IFS.[2]


Optional reading: explanation and sample code

The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):

  • Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
    (The exact thresholds will depend on your specific input, machine, and platform.)

    • The reason it is fast is that it avoids Bash loops.
  • printf '%s\n' "${a_in[@]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):

    • "${a_in[@]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).

    • printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.

  • Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).

  • Reading sort's output into an array variable:

    • Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).

    • Bash v3: readarray doesn't exist, so read must be used:
      IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
      (-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)

Annotated sample code:

#!/usr/bin/env bash

# Define input array `a_in`:
# Note the element with embedded whitespace ('a c')and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )

# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)

# Print sorted output array, line by line:
printf '%s\n' "${a_out[@]}"

Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):

*
10
5
a c
b
f

If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):

*
a c
b
f
5
10

[1] To handle elements with embedded newlines, use the following variant (Bash v4+, with GNU sort):
readarray -d '' -t a_out < <(printf '%s\0' "${a_in[@]}" | sort -z).
Michał Górny's helpful answer has a Bash v3 solution.

[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.
By contrast, what follows IFS=$'\n'  in antak's answer is an assignment rather than a command, in which case the IFS change is global.

Community
  • 1
  • 1
mklement0
  • 382,024
  • 64
  • 607
  • 775
8

Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).

local e new_array=()
while IFS= read -r -d '' e; do
    new_array+=( "${e}" )
done < <(printf "%s\0" "${array[@]}" | LC_ALL=C sort -z)

First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:

for e in "${array[@]}"; do
    printf "%s\0" "${e}"
done

The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.

The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).

Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.

As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).

Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).

Michał Górny
  • 18,713
  • 5
  • 53
  • 76
  • Great solution and very helpful explanation, thanks. One extension: Without setting IFS to empty, leading whitespace will also be eliminated - even if otherwise no word splitting was done. – Dirk Herrmann Jan 09 '16 at 00:06
  • Instead of introducing local variable `e` and setting empty IFS, use the REPLY variable. – Robin A. Meade Jun 06 '18 at 22:58
8

In the 3-hour train trip from Munich to Frankfurt (which I had trouble to reach because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitary strings (newlines, blanks etc.):

declare BSORT=()
function bubble_sort()
{   #
    # @param [ARGUMENTS]...
    #
    # Sort all positional arguments and store them in global array BSORT.
    # Without arguments sort this array. Return the number of iterations made.
    #
    # Bubble sorting lets the heaviest element sink to the bottom.
    #
    (($# > 0)) && BSORT=("$@")
    local j=0 ubound=$((${#BSORT[*]} - 1))
    while ((ubound > 0))
    do
        local i=0
        while ((i < ubound))
        do
            if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
            then
                local t="${BSORT[$i]}"
                BSORT[$i]="${BSORT[$((i + 1))]}"
                BSORT[$((i + 1))]="$t"
            fi
            ((++i))
        done
        ((++j))
        ((--ubound))
    done
    echo $j
}

bubble_sort a c b 'z y' 3 5
echo ${BSORT[@]}

This prints:

3 5 a b c z y

The same output is created from

BSORT=(a c b 'z y' 3 5) 
bubble_sort
echo ${BSORT[@]}

Note that probably Bash internally uses smart-pointers, so the swap-operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also in the reach of the shell language.

Andreas Spindler
  • 7,568
  • 4
  • 43
  • 34
  • 5
    Bubble sort? Wow.. Obama says "bubble sort would be the wrong way to go" -> http://www.youtube.com/watch?v=k4RRi_ntQc8 – Robottinosino Apr 06 '13 at 05:20
  • 1
    Well, it seems while the O-guy wanted to be smart he hadn't sensed that this is not a 50/50 chance question. A predecessor in the position of O-guy, let's tell him the B-guy, once did much better (Reynoldsburg, Ohio, Oct 2000): "I think if you know what you believe, it makes it a lot easier to answer questions. I can't answer your question." So this B-guy really knows something about Boolean logic. The O-guy doesn't. – Andreas Spindler Apr 06 '13 at 14:19
  • The function could be made more easily portable by making BSORT a local array with a nameref to whatever array is to be sorted. ie `local -n BSORT="$1"` at the start of the function. Then you can run `bubble_sort myarray` to sort _myarray_. – johnraff Oct 30 '17 at 02:13
4

Keep it simple ;)

In the following example, the array b is the sorted version of the array a!

The second line echos each item of the array a, then pipes them to the sort command, and the output is used to initiate the array b.

a=(2 3 1)
b=( $( for x in ${a[@]}; do echo $x; done | sort ) )
echo ${b[@]} # output: 1 2 3
Yas
  • 4,957
  • 2
  • 41
  • 24
  • 1
    This answer was flagged as low-quality because of its length and content. Consider adding a short description – Zach Jensz Jun 06 '22 at 01:39
  • 1
    @ZachJensz's point is especially important when there are 18 additional answers, with an accepted answer that has been validated by more than 250 other contributors. Why is this answer preferred over the existing answers? What is it doing differently? Under what circumstances might it be a better choice? – Jeremy Caney Jun 06 '22 at 05:01
  • 1
    @JeremyCaney I read all of the answers on this page, and I consider this answer to be the clearest and the best answer. It's too bad the author didn't post it earlier. I'll leave it to the author to add an explanation. – karel Jun 06 '22 at 07:06
  • This does not handle whitespace in array elements properly. They retain their correct "sorted" order but get split into separate array elements. So for example if `a=("4" "3 2" "1")` then `b=("1" "3" "2" "4")`. It is only correct if you intend to consume the result as `${b[*]}` but it is wrong if you consume it as `"${b[@]}"` – Alcamtar Jul 23 '22 at 00:54
  • 1
    Wrapping IFS around it fixes it though: `IFS=$'\n' ; b=( $( for x in ${a[@]}; do echo $x; done | sort ) ) ; unset IFS` That's a tad unwieldy, though. – Alcamtar Jul 23 '22 at 01:02
  • This is shorter, but along the same lines: `a=(3 "2 a" 1); IFS=$'\n' b=( $(printf "%s\n" "${a[@]}" | sort) ); printf "'%s' " "${b[@]}";` outputs `'1' '2 a' '3'`. Note that the setting of the IFS is restricted to the 2nd statement. Thx @Alcamtar, I was trying to figure out how to set the `IFS` to the LF character. – Adrian Nov 26 '22 at 16:19
3

min sort:

#!/bin/bash
array=(.....)
index_of_element1=0

while (( ${index_of_element1} < ${#array[@]} )); do

    element_1="${array[${index_of_element1}]}"

    index_of_element2=$((index_of_element1 + 1))
    index_of_min=${index_of_element1}

    min_element="${element_1}"

        for element_2 in "${array[@]:$((index_of_element1 + 1))}"; do
            min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n+1`"      
            if [[ "${min_element}" == "${element_2}" ]]; then
                index_of_min=${index_of_element2}
            fi
            let index_of_element2++
        done

        array[${index_of_element1}]="${min_element}"
        array[${index_of_min}]="${element_1}"

    let index_of_element1++
done
MathQues
  • 129
  • 1
  • 10
2

If you can compute a unique integer for each element in the array, like this:

tab='0123456789abcdefghijklmnopqrstuvwxyz'

# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
    declare -g ord_${tab:i:1}=$i
done

function sexy_int() {
    local sum=0
    local i ch ref
    for ((i = 0; i < ${#1}; i++)); do
        ch="${1:i:1}"
        ref="ord_$ch"
        (( sum += ${!ref} ))
    done
    return $sum
}

sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"

then, you can use these integers as array indexes, because Bash always use sparse array, so no need to worry about unused indexes:

array=(a c b f 3 5)
for el in "${array[@]}"; do
    sexy_int "$el"
    sorted[$?]="$el"
done

echo "${sorted[@]}"
  • Pros. Fast.
  • Cons. Duplicated elements are merged, and it can be impossible to map contents to 32-bit unique integers.
Lenik
  • 13,946
  • 17
  • 75
  • 103
  • Interesting technique, I've used a variant for finding max/min values without explicit compare/sort. *But*, un-weighted addition without regard to length won't work: "z" sorts before "aaaa", so you can't use this for words as you show above. – mr.spuratic Mar 02 '20 at 13:17
2

try this:

echo ${array[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort

Output will be:

3
5
a
b
c
f

Problem solved.

DaveShaw
  • 52,123
  • 16
  • 112
  • 141
rsingh
  • 37
  • 1
1
array=(a c b f 3 5)
new_array=($(echo "${array[@]}" | sed 's/ /\n/g' | sort))    
echo ${new_array[@]}

echo contents of new_array will be:

3 5 a b c f
Kara
  • 6,115
  • 16
  • 50
  • 57
blp
  • 37
  • 2
1

There is a workaround for the usual problem of spaces and newlines:

Use a character that is not in the original array (like $'\1' or $'\4' or similar).

This function gets the job done:

# Sort an Array may have spaces or newlines with a workaround (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
             if [[ $* =~ [$wa] ]]; then
                 echo "$0: error: array contains the workaround char" >&2
                 exit 1
             fi

             set -f; local IFS=$'\n' x nl=$'\n'
             set -- $(printf '%s\n' "${@//$nl/$wa}" | sort -n)
             for    x
             do     sorted+=("${x//$wa/$nl}")
             done
       }

This will sort the array:

$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[@]}"
$ printf '<%s>\n' "${sorted[@]}"
<a>
<b>
<c d>
<e
f>
<gh>

This will complain that the source array contains the workaround character:

$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[@]}"
./script: error: array contains the workaround char

description

  • We set two local variables wa (workaround char) and a null IFS
  • Then (with ifs null) we test that the whole array $*.
  • Does not contain any woraround char [[ $* =~ [$wa] ]].
  • If it does, raise a message and signal an error: exit 1
  • Avoid filename expansions: set -f
  • Set a new value of IFS (IFS=$'\n') a loop variable x and a newline var (nl=$'\n').
  • We print all values of the arguments received (the input array $@).
  • but we replace any new line by the workaround char "${@//$nl/$wa}".
  • send those values to be sorted sort -n.
  • and place back all the sorted values in the positional arguments set --.
  • Then we assign each argument one by one (to preserve newlines).
  • in a loop for x
  • to a new array: sorted+=(…)
  • inside quotes to preserve any existing newline.
  • restoring the workaround to a newline "${x//$wa/$nl}".
  • done
1

This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):

mergesort() {
  local -n -r input_reference="$1"
  local -n output_reference="$2"
  local -r -i size="${#input_reference[@]}"
  local merge previous
  local -a -i runs indices
  local -i index previous_idx merged_idx \
           run_a_idx run_a_stop \
           run_b_idx run_b_stop

  output_reference=("${input_reference[@]}")
  if ((size == 0)); then return; fi

  previous="${output_reference[0]}"
  runs=(0)
  for ((index = 0;;)) do
    for ((++index;; ++index)); do
      if ((index >= size)); then break 2; fi
      if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
      previous="${output_reference[index]}"
    done
    previous="${output_reference[index]}"
    runs+=(index)
  done
  runs+=(size)

  while (("${#runs[@]}" > 2)); do
    indices=("${!runs[@]}")
    merge=("${output_reference[@]}")
    for ((index = 0; index < "${#indices[@]}" - 2; index += 2)); do
      merged_idx=runs[indices[index]]
      run_a_idx=merged_idx
      previous_idx=indices[$((index + 1))]
      run_a_stop=runs[previous_idx]
      run_b_idx=runs[previous_idx]
      run_b_stop=runs[indices[$((index + 2))]]
      unset runs[previous_idx]
      while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
        if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
          output_reference[merged_idx++]="${merge[run_a_idx++]}"
        else
          output_reference[merged_idx++]="${merge[run_b_idx++]}"
        fi
      done
      while ((run_a_idx < run_a_stop)); do
        output_reference[merged_idx++]="${merge[run_a_idx++]}"
      done
      while ((run_b_idx < run_b_stop)); do
        output_reference[merged_idx++]="${merge[run_b_idx++]}"
      done
    done
  done
}

declare -ar input=({z..a}{z..a})
declare -a output

mergesort input output

echo "${input[@]}"
echo "${output[@]}"
Andrej Podzimek
  • 2,409
  • 9
  • 12
1

Many thanks to the people that answered before me. Using their excellent input, bash documentation and ideas from other treads, this is what works perfectly for me without IFS change

array=("a \n c" b f "3 5")

Using process substitution and read array in bash > v4.4 WITH EOL character

readarray -t sorted < <(sort < <(printf '%s\n' "${array[@]}"))

Using process substitution and read array in bash > v4.4 WITH NULL character

readarray -td '' sorted < <(sort -z < <(printf '%s\0' "${array[@]}"))

Finally we verify with

printf "[%s]\n" "${sorted[@]}"

output is

[3 5]
[a \n c]
[b]
[f]

Please, let me know if that is a correct test for embedded /n as both solutions produce the same result, but the first one is not supposed to work properly with embedded /n

Boris Hamanov
  • 3,085
  • 9
  • 35
  • 58
0
a=(e b 'c d')
shuf -e "${a[@]}" | sort >/tmp/f
mapfile -t g </tmp/f
ChrisF
  • 134,786
  • 31
  • 255
  • 325
Zombo
  • 1
  • 62
  • 391
  • 407
0

I am not convinced that you'll need an external sorting program in Bash.

Here is my implementation for the simple bubble-sort algorithm.

function bubble_sort()
{   #
    # Sorts all positional arguments and echoes them back.
    #
    # Bubble sorting lets the heaviest (longest) element sink to the bottom.
    #
    local array=($@) max=$(($# - 1))
    while ((max > 0))
    do
        local i=0
        while ((i < max))
        do
            if [ ${array[$i]} \> ${array[$((i + 1))]} ]
            then
                local t=${array[$i]}
                array[$i]=${array[$((i + 1))]}
                array[$((i + 1))]=$t
            fi
            ((i += 1))
        done
        ((max -= 1))
    done
    echo ${array[@]}
}

array=(a c b f 3 5)
echo " input: ${array[@]}"
echo "output: $(bubble_sort ${array[@]})"

This shall print:

 input: a c b f 3 5
output: 3 5 a b c f
Andreas Spindler
  • 7,568
  • 4
  • 43
  • 34
  • Bubble sort is ***`O(n^2)`***. I seem to recall most sorting algorithms use an ***`O(n lg(n))`*** until the final dozen elements or so. For the final elements, selection sort is used. – jww May 31 '16 at 00:41
0

Great answers here. Learned a lot. After reading them all, I figure I'd throw my hat into the ring. I think this is the shortest method (and probably faster as it doesn't do much shell script parsing, though there is the matter of the spawning of printf and sort, but they're only called once each) and handles whitespace in the data:

a=(3 "2 a" 1)                                                # Setup!
IFS=$'\n' b=( $(printf "%s\n" "${a[@]}" | sort) ); unset IFS # Sort!
printf "'%s' " "${b[@]}";                                    # Success!

Outputs:

'1' '2 a' '3'

Note that the IFS change is limited in scope to the line it is on. if you know that the array has no whitespace in it, you don't need the IFS modification.

Inspiration was from @yas's answer and @Alcamtar comments.

EDIT

Oh, I somehow missed the actually accepted answer which is even shorter than mine. Doh!

IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS

Turns out that the unset is required because this is a variable assignment that has no command.

I'd recommend going to that answer because it has some interesting stuff on globbing which could be relevant if the array has wildcards in it. It also has a detailed description as to what is happening.

EDIT 2

GNU has an extension in which sort delimits records using \0 which is good if you have LFs in your data. However, when it gets returned to the shell to be assign to an array, I don't see a good way convert it so that the shell will delimit on \0, because even setting IFS=$'\0', the shell doesn't like it and doesn't properly break it up.

Adrian
  • 10,246
  • 4
  • 44
  • 110
-1
array=(z 'b c'); { set "${array[@]}"; printf '%s\n' "$@"; } \
    | sort \
    | mapfile -t array; declare -p array
declare -a array=([0]="b c" [1]="z")
  • Open an inline function {...} to get a fresh set of positional arguments (e.g. $1, $2, etc).
  • Copy the array to the positional arguments. (e.g. set "${array[@]}" will copy the nth array argument to the nth positional argument. Note the quotes preserve whitespace that may be contained in an array element).
  • Print each positional argument (e.g. printf '%s\n' "$@" will print each positional argument on its own line. Again, note the quotes preserve whitespace that may be contained in each positional argument).
  • Then sort does its thing.
  • Read the stream into an array with mapfile (e.g. mapfile -t array reads each line into the variable array and the -t ignores the \n in each line).
  • Dump the array to show its been sorted.

As a function:

set +m
shopt -s lastpipe

sort_array() { 
    declare -n ref=$1
    set "${ref[@]}"
    printf '%s\n' "$@"
    | sort \
    | mapfile -t $ref
}

then

array=(z y x); sort_array array; declare -p array
declare -a array=([0]="x" [1]="y" [2]="z")

I look forward to being ripped apart by all the UNIX gurus! :)

Christopher King
  • 1,034
  • 1
  • 8
  • 21
  • 1
    Have you tested the code? It does not work. `$ref` is local to the subshell that runs `mapfile`, it can't affect parent shell. Could be, that you tested the code under zsh or fish - the question is about bash. – KamilCuk Sep 23 '21 at 13:04
-2

sorted=($(echo ${array[@]} | tr " " "\n" | sort))

In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:

Echo array content --> replace space by newline --> sort

$() is to echo the result

($()) is to put the "echoed result" in an array

Note: as @sorontar mentioned in a comment to a different question:

The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

Community
  • 1
  • 1
michael
  • 371
  • 3
  • 12
  • _In the spirit of bash / linux_: I guess you didn't understand the spirit at all. Your code is completely broken (pathname expansion and word splitting). This would be better (Bash≥4): `mapfile -t sorted < <(printf '%s\n' "${array[@]}" | sort)`, otherwise `sorted=(); while IFS= read -r line; do sorted+=( "$line" ); done < <(printf '%s\n' | sort)`. – gniourf_gniourf Mar 07 '16 at 19:29
  • 1
    The antipatterns you're using are: `echo ${array[@]} | tr " " "\n"`: this will break if the fields of array contain whitespaces and glob characters. Besides, it spawns a subshell and uses a useless external command. And due to `echo` being dumb, it will break if your array starts with `-e`, `-E` or `-n`. Instead use: `printf '%s\n' "${array[@]}"`. The other antipattern is : _`($())` is to put the "echoed result" in an array_. Certainly not! this is a horrible antipattern that breaks because of pathname expansion (globbing) and word splitting. Never use this horror. – gniourf_gniourf Mar 07 '16 at 19:32
  • The top answer has the "horrible antipattern". And way to go to downvote someone else's answer to the question you answered yourself. – michael Nov 04 '16 at 14:43
  • Great discussion of the pitfalls! – Christopher King Jan 04 '21 at 05:54