I can't seem to find any way of sorting a word based on its characters in awk. For example if the word is "hello" then its sorted equivalent is "ehllo". how to achieve this in awk ?
-
5This sounds like an XY problem See: [What is the XY problem?](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Awk is not strong on sorting and even less so within a string. Why have you chosen awk? If that isn't a hard requirement - you are better served using another tool. – David C. Rankin Sep 14 '21 at 05:21
-
I have started learning shell scripting a few days ago, can you recommend the tool to sort a string ? I have to check if a particular string is a permutation of another string. I thought of sorting both and then comparing letter by letter, but could not find anything to sort the string. – Abhi Sep 14 '21 at 05:23
-
2For non-awk based solutions: https://stackoverflow.com/questions/2373874/how-to-sort-characters-in-a-string – Sundeep Sep 14 '21 at 06:23
-
This site exists to help people with their code. [edit] your question to include a [mcve] with concise, testable sample input and expected output plus your attempt to solve the problem yourself and then we can help you. – Ed Morton Sep 14 '21 at 16:30
5 Answers
With GNU awk for PROCINFO[]
, "sorted_in" (see https://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning) and splitting with a null separator resulting in an array of chars:
$ echo 'hello' |
awk '
BEGIN { PROCINFO["sorted_in"]="@val_str_asc" }
{
split($1,chars,"")
word = ""
for (i in chars) {
word = word chars[i]
}
print word
}
'
ehllo
$ echo 'hello' | awk -v ordr='@val_str_asc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ehllo
$ echo 'hello' | awk -v ordr='@val_str_desc' 'BEGIN{PROCINFO["sorted_in"]=ordr} {split($1,chars,""); word=""; for (i in chars) word=word chars[i]; print word}'
ollhe

- 188,023
- 17
- 78
- 185
Another option is a Decorate-Sort-Undecorate with sed
. Essentially, you use sed
to break "hello"
into one character per-line (decorating each character with a newline '\n'
) and pipe the result to sort
. You then use sed
to do the reverse (undecorate each line by removing the '\n'
) to join the lines back together.
printf "hello" | sed 's/\(.\)/\1\n/g' | sort | sed '{:a N;s/\n//;ta}'
ehllo
There are several approaches you can use, but this one is shell friendly, but the behavior requires GNU sed
.

- 81,885
- 6
- 58
- 85
This would be more doable with gawk, which includes the asort
function to sort an array:
awk 'BEGIN{FS=OFS=ORS=""}{split($0,a);asort(a);for(i in a)print a[i]}'<<<hello
This outputs:
ehllo

- 91,368
- 6
- 71
- 106
-
This is wrong, idk why it's the accepted answer. I guess maybe it produced the expected output for some sample input and the OP thought that it'd always do that but it won't - if that happened it was just luck. – Ed Morton Sep 14 '21 at 16:28
-
@EdMorton Hi, thanks for your insight. Could you please tell why it would fail ? – Abhi Sep 15 '21 at 04:12
-
@Abhi `for(i in a)` by default visits the array indices in implementation-dependent (often hash) order (see https://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array) so you can have input like `bca` then run `asort()` and it becomes `abc` then run `for (i in a)` and get output of `cab` or `bca` or `acb` or anything else. Even if you get the output you want now, you can get output you don't want for other input or for a different version of the tool or when run on a different platform or..... – Ed Morton Sep 15 '21 at 12:03
-
@EdMorton I get your point now. Thanks for the clarification. I read the link that you had provided and for `gawk`, I found that it can forced to iterate in numeric order by setting `PROCINFO["sorted_in"]`. Though I don't exactly know how to do it here. – Abhi Sep 15 '21 at 14:19
-
@Abhi I already provided an answer that does exactly that, see https://stackoverflow.com/a/69192721/1745001. – Ed Morton Sep 15 '21 at 14:20
You need to write a function to sort letters in a word (see : https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html):
function siw(word, result, arr, arrlen, arridx) {
split(word, arr, "")
arrlen = asort(arr)
for (arridx = 1; arridx <= arrlen; arridx++) {
result = result arr[arridx]
}
return result
}
And define a sort sub-function to compare two words (see : https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html):
function compare_by_letters(i1, v1, i2, v2, left, right) {
left = siw(v1)
right = siw(v2)
if (left < right)
return -1
else if (left == right)
return 0
else
return 1
}
And use this function with awk sort function:
asort(array_test, array_test_result, "compare_by_letters")
Then, the sample program is:
function siw(word, result, arr, arrlen, arridx) {
result = hash_word[word]
if (result != "") {
return result
}
split(word, arr, "")
arrlen = asort(arr)
for (arridx = 1; arridx <= arrlen; arridx++) {
result = result arr[arridx]
}
hash_word[word] = result
return result
}
function compare_by_letters(i1, v1, i2, v2, left, right) {
left = siw(v1)
right = siw(v2)
if (left < right)
return -1
else if (left == right)
return 0
else
return 1
}
{
array_test[i++] = $0
}
END {
alen = asort(array_test, array_test_result, "compare_by_letters")
for (aind = 1; aind <= alen; aind++) {
print array_test_result[aind]
}
}
Executed like this:
echo -e "fail\nhello\nborn" | awk -f sort_letter.awk
Output:
fail
born
hello
Of course, if you have a big input, you could adapt siw
function to memorize result for fastest compute:
function siw(word, result, arr, arrlen, arridx) {
result = hash_word[word]
if (result != "") {
return result
}
split(word, arr, "")
arrlen = asort(arr)
for (arridx = 1; arridx <= arrlen; arridx++) {
result = result arr[arridx]
}
hash_word[word] = result
return result
}

- 2,039
- 9
- 14
here's a very unorthodox method for a quick-n-dirty approach, if you really want to sort "hello" into "ehllo" :
mawk/mawk2/gawk 'BEGIN { FS="^$"
# to make it AaBbCc… etc; chr(65) = ascii "A"
for (x = 65; x < 91; x++) {
ref = sprintf("%s%c%c",ref, x, x+32)
} } /^[[:alpha:]]$/ { print } /[[:alpha:]][[:alpha:]]+/ {
# for gawk/nawk, feel free to change
# that to /[[:alpha:]]{2,}/
# the >= 2+ condition is to prevent wasting time
# sorting single letter words "A" and "I"
s=""; x=1; len=length(inp=$0);
while ( len && (x<53) ) {
if (inp~(ch = substr(ref,x++,1))) {
while ( sub(ch,"",inp) ) {
s = s ch;
len -= 1 ;
} } }
print s }'
I'm aware it's an extremely inefficient way of doing selection sort. The potential time-savings stem from instant loop ending the moment all letters are completed, instead of iterating all 52 letters everytime. The downside is that it doesn't pre-profile the input
(e.g. if u detect that this row is only lower-case, then u can speed it up with a lowercase only loop instead)
The upside is that it eliminates the need for custom-functions, eliminate any gawk dependencies, and also eliminate the need to split every row into an array (or every character into its own field)
i mean yes technically one can set FS to null string thus automatically becomes having NF as the string length. But at times it could be slow if input is a bit large. If you need unicode support, then a match()
-based approach is more desirable.
- added (x<53) condition to prevent run-away infinite loops in case input isn't pure ASCII letters

- 2,453
- 3
- 11