bash: sorting based on numerical distance

Question

I have a file containing a list of people with their gender and age like this:

name1    M    73.2
name2    M    31.5
name3    F    20.3
name4    F    55.0
...

Is there a bash one-liner to sort this list based on numerical distances to a given age, say 30.0, so that the result becomes:

name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2

The output is simply ordered by age, which is indistinguishable from sorting by difference from any number below all the numbers in that column. In any case, what you need is to add another column to the data with the offset from 20 and sort by that column. — l0b0, Apr 15 '20 at 03:58
Thank you for this suggestion! Can I edit my question to avoid the coincidence and see if other people have a better solution? — zhihao_li, Apr 15 '20 at 04:05
Sure thing. Although both of these problems have been solved before on this site: [1](https://stackoverflow.com/a/44530875/96588), [2](https://stackoverflow.com/a/6438940/96588). — l0b0, Apr 15 '20 at 04:09
Thank you so much! I have edited my question for you to re-open it. Your solution takes two steps of subtraction and sorting, but I am curious if there is even a simpler one. Thank you again! — zhihao_li, Apr 15 '20 at 04:16
Neither of the suggested questions provides an answer to this question, IMO. — Jonathan Leffler, Apr 15 '20 at 05:04
I agree that neither suggestion works directly, but they are still helpful. — zhihao_li, Apr 15 '20 at 05:08

score 2 · Accepted Answer · answered Apr 15 '20 at 04:54

In a similar manner, if you need to preserve the original line format, then instead of printing the first three fields you can save each line in a variable and truncate it after the third field of the output from sort, e.g.

awk 'function abs(v) { return v < 0 ? -v : v }
    { print $0"\t"abs($NF-30) }' file | 
sort -k4n |
awk '{ out=$0; print substr(out, 0, match (out,$3)+length($3)) }'

Example Use/Output

With your example file in the file named file, you would get:

$ awk 'function abs(v) { return v < 0 ? -v : v }
>     { print $0"\t"abs($NF-30) }' file |
> sort -k4n |
> awk '{ out=$0; print substr(out, 0, match (out,$3)+length($3)) }'
name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2

(Note: you can select-copy the awk command above and middle-mouse-paste it into an xterm whose current working directory contains file to test it.)
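A sketch of another way to keep the original line format, using any awk plus sort and cut. It avoids match(), which treats $3 as a regular expression (the `.` in `31.5` matches any character) and returns the first match anywhere in the line. The idea is to prepend the distance and a tab instead of appending it, then strip it again with `cut -f2-`. The printf line only recreates the question's sample data in a file named file, and ref=30 is the age from the question.

```shell
# Recreate the sample data from the question in a file named "file".
printf '%s\n' 'name1    M    73.2' 'name2    M    31.5' \
              'name3    F    20.3' 'name4    F    55.0' > file

# Prepend |age - 30| and a tab to each untouched line, sort numerically on
# that first field (LC_ALL=C so "." is always the decimal point), then let
# cut -f2- drop everything up to and including the first tab.
awk -v ref=30 '{ d = $3 - ref; if (d < 0) d = -d; print d "\t" $0 }' file |
LC_ALL=C sort -k1,1n |
cut -f2-
```

For the sample data this should print the same four lines as the accepted answer, with their original spacing intact.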

Jonathan Leffler · Answer 2 · 2020-04-15T06:05:29.690

Any version of Awk

awk -v ref=30.0 '{ print $1, $2, $3, ($3 < ref) ? ref - $3 : $3 - ref }' |
sort -k4,4n |
awk '{ print $1, $2, $3 }'

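Add the distance from the reference age as an extra column, sort on it, then remove it. You could use cut for the removal operation if you prefer. If you use GNU Awk, you can do it all in awk. Printing $1, $2, $3 rebuilds each line with single spaces (awk's default output field separator), so the original spacing is lost; there are ways to preserve it if that's important to you.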

You can write it all on one line if you insist; that's your choice.
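For comparison, a sketch of the same pipeline with cut doing the removal. The printf line recreates the question's sample data in a file named file (an assumption; the pipeline above reads standard input). Because awk's print joins the fields with a single space, `cut -d' ' -f1-3` keeps exactly the name, gender and age.

```shell
# Recreate the sample data from the question in a file named "file".
printf '%s\n' 'name1    M    73.2' 'name2    M    31.5' \
              'name3    F    20.3' 'name4    F    55.0' > file

# Append |$3 - ref| as a fourth space-separated column, sort numerically on it
# (LC_ALL=C so "." is the decimal point), then keep only the first three columns.
awk -v ref=30.0 '{ print $1, $2, $3, (($3 < ref) ? ref - $3 : $3 - ref) }' file |
LC_ALL=C sort -k4,4n |
cut -d' ' -f1-3
```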

All-in-one using GNU Awk

Checking the GNU Awk manual shows that there isn't an abs() built-in function, which is a little surprising. GNU Awk does have the asort() and asorti() functions which can be used to sort the data internally, thereby allowing the code to use a single call to awk and no calls to the sort command. This also preserves the spacing in the original data.

This variation uses the 'square of the distance' idea suggested by zhihao_li in their answer.

gawk -v ref=48.0 '
function comp_idx(i1, v1, i2, v2) {
    if (i1+0 < i2+0) return -1; else if (i1+0 > i2+0) return +1; else return 0;
}
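    # Key on the squared distance; append (rather than overwrite) when two
    # lines are at the same distance, e.g. ages 45.0 and 51.0 with ref=48.0.
    { d = ($3-ref)^2; data[d] = (d in data) ? data[d] "\n" $0 : $0 }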
END { 
      n = asorti(data, results, "comp_idx")
      for (i = 1; i <= n; i++) print data[results[i]]
    }' "$@"

The +0 operations in the comp_idx function are necessary to force awk to treat the index values as numbers rather than strings. Without them, the sort order is based on the lexicographical (not numeric) order of the squared distances; with the sample data and ref=48.0, for example, "272.25" would sort before "49". If a single line is important, you could write it all on one line, but you'd need to add a sprinkling of semicolons too. I don't recommend it.
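A quick way to see the difference with any awk: string values (which is what array subscripts are) compare lexicographically, while adding 0 forces a numeric comparison. The two values here are squared distances from the sample data with ref=48.0.

```shell
awk 'BEGIN {
    a = "272.25"; b = "49"    # string values, like array subscripts
    print (a < b)             # string comparison: prints 1
    print (a + 0 < b + 0)     # numeric comparison: prints 0
}'
```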

You could revise the code into a more comprehensive shell script that takes the age as an argument that's passed to Awk (the -v ref=30.0 mechanism). That's more fiddly than difficult. As it stands, it just processes the files it is given — or standard input if no files are given.

With the sample data, the output for the reference age of 48.0 is:

name4    F    55.0
name2    M    31.5
name1    M    73.2
name3    F    20.3

Change the reference age from 48.0 to 30.0 as in the question and the result is:

name2    M    31.5
name3    F    20.3
name4    F    55.0
name1    M    73.2
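A minimal sketch of the shell-script revision suggested above, written as a bash function so the age becomes a positional argument that is passed through to awk with -v. The name sort_by_distance is hypothetical, and it wraps the portable awk | sort | awk pipeline from the start of this answer rather than the gawk program, so it does not require GNU Awk.

```shell
# sort_by_distance is a hypothetical name chosen for this sketch.
# Usage: sort_by_distance AGE [FILE...]  (reads standard input when no FILE is given)
sort_by_distance() {
    if [ "$#" -lt 1 ]; then
        echo "usage: sort_by_distance age [file ...]" >&2
        return 2
    fi
    local ref=$1
    shift
    # Add |$3 - ref| as a fourth column, sort numerically on it, then drop it.
    awk -v ref="$ref" '{ print $1, $2, $3, (($3 < ref) ? ref - $3 : $3 - ref) }' "$@" |
    LC_ALL=C sort -k4,4n |
    awk '{ print $1, $2, $3 }'
}
```

For example, `sort_by_distance 30.0 file` or `sort_by_distance 48.0 < file`; called without an age, it prints a usage message and returns 2.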

Something about *minds thinking alike*. I was wrapping up when I saw your answer. I do like the simplicity you managed. — David C. Rankin, Apr 15 '20 at 04:55
Thanks. I've just been checking the [GNU Awk manual](https://www.gnu.org/software/gawk/manual/gawk.html) and there doesn't appear to be an `abs` built-in function, which is a little surprising. GNU Awk does have the [`asort()` and `asorti()`](https://www.gnu.org/software/gawk/manual/gawk.html#Array-Sorting-Functions) functions which could probably be used to sort the data internally, thereby allowing the code to use a single call to `awk` and no calls to the `sort` command. — Jonathan Leffler, Apr 15 '20 at 04:58
It doesn't. I actually searched for it and found there was none, and then found the SO question related to it. There must be a lot of smart folks helping with that site... — David C. Rankin, Apr 15 '20 at 05:06

zhihao_li · Answer 3 · score 0 · 2020-04-15T05:01:55.723

The discussion above about adding another column was helpful. I came up with this solution, with the shell variable ${ag} holding the given age. Squaring the difference is simpler than taking its absolute value, and it produces the same ordering.

awk -v a="${ag}" '{print $1,$2,$3,($3-a)^2}' | sort -n -k 4
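Why the square is a safe sort key: with a the age on a line and r the given age, squaring is strictly increasing for non-negative numbers, so ordering by (a - r)^2 gives the same result as ordering by |a - r|:

```latex
|a - r| < |b - r| \iff |a - r|^2 < |b - r|^2 \iff (a - r)^2 < (b - r)^2
```

For r = 30 the sample data gives 2.25 (name2), 94.09 (name3), 625 (name4) and 1866.24 (name1), which is the order requested in the question.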

answered Apr 15 '20 at 04:55 by zhihao_li, edited Apr 15 '20 at 05:01

Using the square of the difference reduces the code — it's a good idea. – Jonathan Leffler Apr 15 '20 at 05:08

That's a good trick. You also need to pipe it to something like `cut -d' ' -f-3` to trim the extra field. – David C. Rankin Apr 15 '20 at 05:12

score 0 · Answer 4 · answered Apr 15 '20 at 06:40

Another approach, using perl instead of awk (the postfix dereference `$_->@[0..2]` requires Perl 5.24 or later):

$ age=30 perl -anE 'push @lines, [@F, abs($ENV{age} - $F[2])];
   END { say join("\t", $_->@[0..2]) for sort { $a->[3] <=> $b->[3] } @lines }' input.txt 
name2   M   31.5
name3   F   20.3
name4   F   55.0
name1   M   73.2
