How can I sort filenames within multiple directories into one sequential and numerically ascending array/list?

Question

Let's say I have three directories, each containing a different number of files (though in this simplified case, each has four):

BA-2016-05:

AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg

BA-2016-V01:

AG-1712A.jpg
AG-1712B.jpg
AG-1922A.jpg
AG-1922B.jpg

BA-2017-PD02:

AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg

I want the resulting array to look something like this:

AG-1100A.jpg AG-1100B.jpg AG-1100C.jpg AG-1100D.jpg
AG-1712A.jpg AG-1712B.jpg
AG-1829A.jpg AG-1829B.jpg AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg AG-1922B.jpg

The array will be saved to a .txt document and can be space or tab delimited.

So far I've slightly adapted an answer from elsewhere online to list all the files, sorted by filename in ascending order:

find ~/BA* -iname "*.jpg" |\
awk -vFS=/ -vOFS=/ '{ print $NF,$0 }' |\
sort -n -t / |\
cut -f2- -d/
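The pipeline above is a decorate/sort/undecorate trick. awk prefixes each path with its base name, sort orders the lines, and cut strips the prefix again, so the directory names no longer affect the order. Here is a minimal sketch on three hard-coded sample paths (the /home/me prefix is hypothetical). It names the sort key explicitly with -k1,1 rather than relying on -n:

```shell
#!/bin/bash
# Hypothetical sample paths standing in for find's output
# 1. awk prepends the base name ($NF) and a "/" to each path (decorate)
# 2. sort compares only that first "/"-separated field (sort)
# 3. cut drops the prefix again, keeping field 2 onward (undecorate)
sorted=$(printf '%s\n' \
    /home/me/BA-2016-05/AG-1829A.jpg \
    /home/me/BA-2016-V01/AG-1712A.jpg \
    /home/me/BA-2017-PD02/AG-1100A.jpg |
  awk -v FS=/ -v OFS=/ '{ print $NF, $0 }' |
  sort -t / -k1,1 |
  cut -f2- -d/)
printf '%s\n' "$sorted"
```

The AG-1100A.jpg path comes out first even though its directory name sorts last.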

It should be easy enough to strip the leading directories with filename="${fullpath##*/}", but after that I'm stuck. How do I turn this list into an array formatted as shown above?
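The same family of parameter expansions can also split off the directory and the extension, which is useful for the "preserve any extension" bonus in the notes below. A small sketch; the path is a made-up example:

```shell
#!/bin/bash
fullpath=/home/me/BA-2016-05/AG-1829A.jpg   # hypothetical example path

filename=${fullpath##*/}   # drop the longest "*/" prefix  -> AG-1829A.jpg
dir=${fullpath%/*}         # drop the shortest "/*" suffix -> /home/me/BA-2016-05
stem=${filename%.*}        # drop the shortest ".*" suffix -> AG-1829A
ext=${filename##*.}        # drop the longest "*." prefix  -> jpg

printf '%s\n' "$filename" "$dir" "$stem" "$ext"
```

If a name has no dot at all, ${filename##*.} returns the whole name, so check for a dot first if that case can occur.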

A few notes:

The format of the filenames will always be AG-[numbers][A-D] or, to make it more generic, [letters][hyphen][numbers][A-D].
The extensions will always be .jpg or .JPG, but bonus points for a solution that works with all extensions and preserves them in the output array.
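One way to produce the grouped layout directly, sketched under the naming assumption in the notes above: sort on the number after the hyphen, then have awk start a new output line whenever that number changes. The function name group_by_number and the output file list.txt are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads one file name per line on stdin and prints one
# tab-delimited line per number group. Assumes names look like
# [letters]-[numbers][suffix].[ext] and contain no newlines.
group_by_number() {
  # Sort numerically on the part after the first "-" ("1100A.jpg" -> 1100);
  # ties fall back to a whole-line comparison, so ...A sorts before ...B.
  sort -t- -k2,2n |
    awk -F'-' -v OFS='\t' '
      {
        match($2, /^[0-9]+/)                   # leading digits after the hyphen
        key = substr($2, 1, RLENGTH)
        if (NR > 1 && key != prev) print line  # flush the finished group
        line = (NR > 1 && key == prev) ? line OFS $0 : $0
        prev = key
      }
      END { if (NR > 0) print line }           # flush the last group
    '
}
```

Usage, from the directory that holds the BA* folders (-printf needs GNU find): find ./BA* -type f -printf '%f\n' | group_by_number > list.txt. Because it matches every regular file instead of only *.jpg, any extension passes through unchanged. Sorting with -t- -k2,2n also compares the numbers as numbers, so AG-999A.jpg would land before AG-1000A.jpg, which a plain text sort would not do.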

EDIT: I've included the final solution I'm using below. It combines parts of both answers I got, plus a bit of awk trickery that changes spaces to tabs before the output is written. It works like a charm. I also realized I needed to include a URL, completed by incorporating the filename/path into it, but I figured that out pretty quickly. Anyway, thanks to everyone for the help; here's the final code:

#!/bin/bash

# The number of the current line
current_nb=

# Variable to store the current line before writing it
line=

# Loop through all .jpg files of the directories and subdirectories specified
# Sort all file paths in ascending order (irrespective of the directory name)
# (word splitting on $(...) assumes the paths contain no whitespace)
for file in $(find ./BA* -iname "*.jpg" -printf '%f/%p\n' | sort -n -t / | cut -f2- -d/)
do

    # Build the image URL from the last directory and the file name
    # ("|" is the sed delimiter because the replacement itself contains "/")
    file_url=$(sed 's|^.*/\(.*/.*\)|[INSERT URL HERE]/\1|' <<< "$file")

    # Extract the number from the current file in the loop
    nb=$(sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<< "$file")

    # For the first loop, when $current_nb is not initialized, we set $nb as the default value
    current_nb=${current_nb:-$nb}

    # If we stay on the same line...
    if [ "$nb" -eq "$current_nb" ]
    then
        # ...then concatenate the new URL with the line currently being created
        line="$line $file_url"
    else
        # Otherwise, append the line at the end of the output file (changing spaces to tabs)...
        echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt

        # ...and prepare a new line
        line="$file_url "
        current_nb=$nb
    fi

done

# The loop only writes a line when the number changes, so write the last one here
echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt

John1024 · Answer 1 · 2017-08-07T22:16:23.503

The sorted list

To generate the list that you want:

$ find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n
AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg
AG-1712A.jpg
AG-1712B.jpg
AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg
AG-1922B.jpg

GNU find's -printf action allows customized output. Since you only want the file names without their directories, we use the %f directive, which prints just the base name.
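To see how the directives differ, this sketch lists a throwaway tree in a temporary directory (names copied from the question) with %p (the path as found), %f (base name only) and %h (the leading directories). -printf is a GNU find extension, so the sketch assumes GNU findutils; the find on BSD/macOS does not have it.

```shell
#!/bin/bash
# Throwaway tree under a temp dir (hypothetical names mirroring the question)
tmp=$(mktemp -d)
mkdir -p "$tmp/BA-2016-05" "$tmp/BA-2017-PD02"
touch "$tmp/BA-2016-05/AG-1829A.jpg" "$tmp/BA-2017-PD02/AG-1100A.JPG"

# %p = path as found, %f = base name only, %h = leading directories of %p
listing=$(cd "$tmp" && find ./BA* -iname '*.jpg' -printf '%p | %f | %h\n' | sort)
printf '%s\n' "$listing"

rm -rf "$tmp"   # clean up the demo tree
```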

Create array (naive version)

If the file names are guaranteed not to contain whitespace or any other shell-active characters, then the following works:

arr=($(find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n))

We can verify that array arr contains what you want via:

$ declare -p arr
declare -a arr=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")

Create array (robust version)

To handle the most general file names:

array=()
while IFS= read -r -d $'\0'; do
   array+=("$REPLY")
done < <(find ./BA* -iname "*.jpg" -printf '%f\0' | sort -zn)

To verify the result:

$ declare -p array
declare -a array=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")

The robust version separates the file names with NUL characters, which cannot occur in file names, so names containing spaces or even newlines come through intact. A full explanation of how this works can be found here.
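On bash 4.4 or newer, the mapfile builtin (also available as readarray) can read the NUL-separated list straight into an array, replacing the while/read loop. This sketch builds a temporary demo tree that includes a file name with a space, which the naive version would split into two elements:

```shell
#!/bin/bash
# Needs bash >= 4.4 for "mapfile -d", plus GNU find (-printf) and sort (-z).
tmp=$(mktemp -d)                       # throwaway demo tree (hypothetical)
mkdir -p "$tmp/BA-2016-05"
touch "$tmp/BA-2016-05/AG-1829A.jpg" "$tmp/BA-2016-05/AG-1100 copy.jpg"

# Read NUL-terminated names straight into an array; -t strips the trailing NUL
mapfile -d '' -t array < <(cd "$tmp" && find ./BA* -iname '*.jpg' -printf '%f\0' | sort -zn)
declare -p array

rm -rf "$tmp"
```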

I really like the simplicity of the array creation, as well as the capabilities of `-printf` (I'm rather new to this)! I used that sorting style in a pipe in the final code because I ended up needing to sort by filename while still keeping the directory. Thanks for the help! I've posted the final version of my script in the body of the original post if you want to check it out or suggest corrections/changes. — rarivero, Aug 09 '17 at 19:03

score 0 · Accepted Answer · answered Aug 07 '17 at 22:20

This is more generic and works for all extensions. In addition, it doesn't create an array; it writes the result directly into the output file.

#!/bin/bash
# The number of the current line
current_nb=
# Variable to store the current line before writing it
line=
# Loop through all regular files of this directory and its subdirs, sorted
# We extract the basename (e.g. AG-1829A.jpg) and skip the output file itself
for file in $(find . -type f ! -name out.txt -exec basename {} \; | sort -n); do
    # Extract its number
    nb=$(sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<<"$file")
    # For the first loop, when current_nb is not initialized, we set $nb as default value
    current_nb=${current_nb:-$nb}
    # If we stay on the same line
    if [ "$nb" -eq "$current_nb" ]; then
        # Concatenate the new filename
        line="$line $file"
    else
        # Else append the line at the end of file
        echo $line >> out.txt
        # And prepare the new one
        line="$file "
        current_nb=$nb
    fi
done
# The last line is never written inside the loop, so append it here
echo $line >> out.txt
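If starting one sed process per file gets slow on large directories, bash's own [[ =~ ]] operator can extract the number instead, leaving the captured digits in BASH_REMATCH. The sketch below is a variant of the loop above, not the answerer's code. The function name write_groups is hypothetical, and the function writes the final group after its loop ends.

```shell
#!/bin/bash
# Hypothetical helper: reads sorted file names (one per line) on stdin and
# appends one space-delimited line per number group to the file named in $1.
write_groups() {
  local out=$1 file nb current_nb= line=
  local re='-([0-9]+)[^-]*$'       # digits after the last "-": AG-1829A.jpg -> 1829
  while IFS= read -r file; do
    [[ $file =~ $re ]] || continue # skip names that don't fit the pattern
    nb=${BASH_REMATCH[1]}
    if [[ -n $line && $nb != "$current_nb" ]]; then
      printf '%s\n' "$line" >> "$out"   # group finished: write it out
      line=
    fi
    line+="${line:+ }$file"             # start or extend the current group
    current_nb=$nb
  done
  # Flush the final group, which the loop never writes on its own
  if [[ -n $line ]]; then
    printf '%s\n' "$line" >> "$out"
  fi
}
```

For example (GNU find assumed): find . -type f ! -name out.txt -printf '%f\n' | sort -t- -k2,2n | write_groups out.txt. Names that don't match the pattern are skipped rather than causing an "integer expression expected" error.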

This works great! It produces exactly the file structure I need. I made a few tweaks to it (changed the `find` section, added a few pipes, etc) and posted the final code in my original post. Great solution! Thanks! — rarivero, Aug 09 '17 at 19:00
