Illogical number priority in file names in BASH

Question

I wrote a Bash script, part of which is supposed to process the files in a given directory in numerical order. The files in that directory are named 1, 2, 3, 4, 5, and so on. When I ran the script with 10 files in the directory, it took them in an order that looks illogical to me: 10, 1, 2, 3, etc.

How do I make it go from the lowest-numbered file name to the highest, treating the names as decimal numbers?

I am using the following line to define the loop and the path:

for file in /dir/*

Don't know if it matters, but I'm using Fedora 33 as OS.

score 2 · Accepted Answer · answered Jun 02 '21 at 12:18

Directory listings (glob results and the default ls output) are sorted as strings, character by character, not as numbers. So "10" comes before "2". If I list 20 files whose names are the first 20 integers, I get:

1  10  11  12  13  14  15  16  17  18  19  2  20  3  4  5  6  7  8  9
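The ordering above can be reproduced in a throwaway directory. This is a minimal sketch, assuming Bash 4 or later (for mapfile); the scratch directory and the array names glob_names and numeric_names are made up for the illustration.

```shell
#!/bin/bash
# Sketch: compare glob (string) order with numeric order.
demo_dir=$(mktemp -d)            # hypothetical scratch directory
touch "$demo_dir"/{1..12}        # files named 1, 2, ..., 12

# Glob expansion returns the names in string order.
glob_names=( "$demo_dir"/* )
glob_names=( "${glob_names[@]##*/}" )    # strip the directory prefix

# The same names, sorted numerically.
mapfile -t numeric_names < <(printf '%s\n' "${glob_names[@]}" | sort -n)

echo "glob:    ${glob_names[*]}"
echo "numeric: ${numeric_names[*]}"
rm -r "$demo_dir"
```

The first line printed starts with 1 10 11 12 2, the second counts straight up from 1 to 12.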

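Piping the list through 'sort -n' sorts it numerically rather than alphabetically. The following command:

for i in $(ls | sort -n) ; do echo $i ; done

prints the names in numeric order, 1 through 20, one per line.

i.e. your command:

for file in /dir/*

should be rewritten as:

for file in $(ls /dir | sort -n)

The loop variable then holds only the file name, so refer to the file as "/dir/$file" inside the loop. Note that this relies on word splitting, so it breaks on names that contain whitespace.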

`array=($(ls dir/ | sort -h)) for file in "${array[@]}";` That's what I ended up sticking to, despite it being less elegant. Either way, thanks for your contribution! — el_bulm, Jun 02 '21 at 13:25

score 1 · Answer 2 · answered Jun 02 '21 at 13:13


If you have GNU sort then use the -V flag.

for file in /dir/* ; do echo "$file" ; done | sort -V

Or store the data in an array.

files=(/dir/*); printf '%s\n' "${files[@]}" | sort -V
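Both commands above print the sorted names. To iterate in that order, the sorted output has to be read back into the loop. The following is a minimal sketch, assuming GNU sort, Bash 4 or later, and no newlines in the file names. The helper list_sorted and the array sorted_files are hypothetical names, not part of the answer.

```shell
#!/bin/bash
# Hypothetical helper: fill the array "sorted_files" with the entries of a
# directory in version order (GNU sort -V). Assumes no newlines in the names.
list_sorted() {
    local dir=$1
    sorted_files=()
    local entries=( "$dir"/* )
    # If nothing matched, the pattern stays literal and does not exist.
    [[ -e ${entries[0]} ]] || return 0
    mapfile -t sorted_files < <(printf '%s\n' "${entries[@]}" | sort -V)
}

list_sorted /dir
for file in "${sorted_files[@]}"; do
    echo "$file"
done
```

The loop body receives full paths such as /dir/1, /dir/2, and so on up to /dir/10, so it can replace the original for loop directly.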


Jetchisel


score 1 · Answer 3 · answered Jun 02 '21 at 13:37

As an aside, if you have the option and work once ahead of time is preferable to sorting every time, you could also format the names of your directories with leading zeroes. This is frequently a better design when possible.

I created directories with both padded and unpadded names for comparison.

$: echo [0-9][0-9]/ # perfect list based on default string sort
00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/ 16/ 17/ 18/ 19/ 20/

That also filters out any non-numeric names, and any non-directories.

$: for d in [0-9][0-9]/; do echo "${d%/}"; done
00
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20

If I list both the single- and double-digit versions:

$: shopt -s extglob
$: echo @(?|??)
0 00 01 02 03 04 05 06 07 08 09 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9

Only the single-digit versions without leading zeroes get out of order.

dan · Answer 4 · 2021-06-02T20:09:02.180

The shell sorts the names by the locale order (not necessarily the byte value) of each individual character. Anything that starts with 1 will go before anything that starts with 2, and so on.

There are two main ways to tackle your problem:

1. sort -n (numeric sort) the file list, and iterate over that.
2. Rename or recreate the target files (if you can), so all numbers are the same length (in bytes/characters). Left pad shorter numbers with 0 (e.g. 01). Then they'll expand like you want.

Using sort (properly):

mapfile -td '' myfiles < <(printf '%s\0' * | sort -zn)

for file in "${myfiles[@]}"; do
    # what you were going to do
done

sort -z (zero/NUL-terminated records) is common but not POSIX. It makes processing paths/data that contain newlines safe. mapfile -d also needs Bash 4.4 or later. Without -z:

mapfile -t myfiles < <(printf '%s\n' * | sort -n)
# Rest is the same.
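To see what the NUL-delimited variant buys you, compare both forms on a directory where one name contains a newline. This is a sketch under stated assumptions: GNU sort, Bash 4.4 or later for mapfile -d, and a scratch directory and array names (by_line, by_nul) invented for the demonstration.

```shell
#!/bin/bash
# Sketch: newline-delimited vs NUL-delimited sorting of file names.
demo_dir=$(mktemp -d)                                # hypothetical scratch directory
touch "$demo_dir"/{1,2,10} "$demo_dir"/$'3\nextra'   # last name contains a newline

names=( "$demo_dir"/* )
names=( "${names[@]##*/}" )                          # strip the directory prefix

mapfile -t by_line < <(printf '%s\n' "${names[@]}" | sort -n)     # newline-delimited
mapfile -td '' by_nul < <(printf '%s\0' "${names[@]}" | sort -zn)  # NUL-delimited
rm -r "$demo_dir"

echo "files on disk:     ${#names[@]}"
echo "newline-delimited: ${#by_line[@]} entries"
echo "NUL-delimited:     ${#by_nul[@]} entries"
```

With newline-delimited input the odd name is split into two entries, so the array no longer matches the files on disk. The NUL-delimited form keeps one entry per file.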

Rename the target files:

#!/bin/bash

cd /path/to/the/number/files || exit 1

# Gets length of the highest number. Or you can just hardcode it.
length=$(printf '%s\n' * | sort -n | tail -n 1)
length=${#length}

for i in *; do
    mv -n "$i" "$(printf "%.${length}d" "$i")"
done

Examples for making new files with zero padded numbers for names:

touch {000..100} # Or
for i in {000..100}; do
    > "$i"
done

If it's your script that made the target files, something like $(printf %.Nd [file]) can be used to left pad the names before you write to them. But you need to know the length in characters of the highest number first (N).
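As an illustration of padding names at creation time, here is a minimal sketch. It assumes the highest number is known up front; the helper padded_name, the scratch directory, and the sample numbers are hypothetical. It uses printf's '%0*d' (zero flag with a width taken from an argument), which gives the same result as the %.Nd form for non-negative integers.

```shell
#!/bin/bash
# Hypothetical helper: print $1 left-padded with zeroes to the width of $2,
# where $2 is the highest number that will ever be used.
padded_name() {
    local number=$1 highest=$2
    printf '%0*d\n' "${#highest}" "$number"
}

out_dir=$(mktemp -d)     # scratch directory, stands in for the real target
for n in 1 25 100; do
    # Create (or truncate) a file whose name is the padded number.
    : > "$out_dir/$(padded_name "$n" 100)"
done
ls "$out_dir"
```

Because every name has the same length, a plain glob over the directory then expands in numeric order with no sorting step.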
