
I'd like to write a bash script that processes the files in a directory in sorted name order. Specifically, the order should follow the collation rules of the en_US.utf8 locale.

For example, suppose the directory /mydir contains the following files:

a1.txt
a2.txt
b1.txt
c1.txt

I have a function like this:

show_content() {
  for f in "$@"; do
    echo $f
    cat $f
  done
}

and run it by

show_content /mydir/*

How can I improve the code so that show_content processes the files in /mydir in name order? And how do I specify that the order should follow en_US.utf8?

Is there a way to make en_US.utf8 effective only for this bash script? Per the comments from @CharlesDuffy and @oguzismail, we can set LC_COLLATE to en_US.utf8, but can that be done more "locally", without touching the global settings?

IsaIkari
  • FYI, you want to `echo "$f"` and `cat "$f"` -- if you leave out the quotes, filenames with spaces will misbehave for the `cat`, and you'll get similar bugs with the `echo`, just (usually, but not always) more-subtle ones. See also [I just assigned a variable; why does `echo $variable` show something else?](https://stackoverflow.com/questions/29378566). And to be _really_ reliable you want `printf '%s\n' "$f"` -- see [Why is printf better than echo?](https://unix.stackexchange.com/questions/65803) on Unix & Linux Stack Exchange. – Charles Duffy Oct 08 '21 at 16:48
  • Anyhow -- what's the value of `LC_COLLATE`? And what's the output of the `locale` command as a whole? – Charles Duffy Oct 08 '21 at 16:49
  • If `en_US.utf8` is your current locale, you don't need to do anything. Otherwise set `LC_ALL` to `en_US.utf8`. – oguz ismail Oct 08 '21 at 16:51
  • Note, btw, that when you run `show_content /mydir/*`, the `/mydir/*` is replaced with a list of filenames **before** `show_content` starts running, so it's too late to change how the glob is expanded -- you can potentially do an additional sorting pass after the expansion, but you can't change how the expansion will occur, because it already finished. – Charles Duffy Oct 08 '21 at 16:51
  • @IsaIkari you only need `LC_COLLATE=en_US.utf8` – Léa Gris Oct 08 '21 at 17:15
  • @CharlesDuffy Thanks! I saw a `LC_COLLATE="C"` in my OS. But is there a way to make the en_US.utf8 sorting only effective in this bash script, so I don't need to change any global setting? – IsaIkari Oct 08 '21 at 17:15
  • Environment variables are not global. If you set it in the script, it will only apply to the script (and any programs it invokes). You can confirm this with `echo "$LC_COLLATE"` after the script has ended. It will show the original value. – that other guy Oct 08 '21 at 17:26
  • @IsaIkari, is the `/mydir/*` part of the script, or is it part of what the person starting the script does? If it's the caller who expands the glob, then either that caller needs to have `LC_COLLATE` set, or you need to make your script redo the sorting after it's started. But if the glob is inside the script you can just `export LC_COLLATE=en_US.utf8` inside your script and you're good. – Charles Duffy Oct 08 '21 at 18:43
  • @CharlesDuffy It's part of the script. Thanks for the explanation! – IsaIkari Oct 08 '21 at 18:53
  • Great! Since it sounds like that comment is what you needed, I'll add an answer w/ the same info. – Charles Duffy Oct 08 '21 at 18:56

1 Answer


The environment variable LC_COLLATE determines which locale's collation rules should be used for sorting.

To make glob expressions within your script use en_US.utf8, you need only add:

export LC_COLLATE=en_US.utf8

before the glob expression is reached in your script. This change is local: it affects only the script itself and the subprocesses it runs.
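Put together, a minimal sketch of such a script might look like the following. It assumes the en_US.utf8 locale is actually installed (`locale -a` lists the available ones) and that LC_ALL is not set in the environment, since LC_ALL takes precedence over LC_COLLATE. The wrapper name `show_sorted_dir` and its `dir` parameter are hypothetical, added only so the glob is clearly expanded after the export; the function also uses the quoting and `printf` suggested in the comments.

```shell
#!/usr/bin/env bash
# Sketch: set the collation locale inside the script, before any glob runs.
# Assumption: en_US.utf8 is installed and LC_ALL is unset (LC_ALL would
# override LC_COLLATE).
export LC_COLLATE=en_US.utf8

show_content() {
  local f
  for f in "$@"; do
    printf '%s\n' "$f"   # print the file name
    cat -- "$f"          # then the file's content
  done
}

# Hypothetical wrapper: the glob is expanded here, inside the script,
# so its result is sorted under the LC_COLLATE exported above.
show_sorted_dir() {
  local dir=${1:-/mydir}
  show_content "$dir"/*
}
```

Calling `show_sorted_dir /mydir` at the end of the script would then print each file in collation order. Because the export happens in the script's own process, running `echo "$LC_COLLATE"` in the calling shell afterwards should still show the caller's original value.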


By contrast, if you needed to re-sort arguments that were passed in from the outside, doing that reliably (using GNU sort for its -z option and bash 4.4 or later for readarray -d) might look like:

readarray -d '' -t args < <(
  printf '%s\0' "$@" | LC_COLLATE=en_US.utf8 sort -z
)
set -- "${args[@]}"
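As a variation, the re-sorting step could live inside the function itself, so callers do not need to reset their positional parameters. This is only a sketch under the same assumptions (bash 4.4 or later, GNU sort, en_US.utf8 installed); the names `sort_args` and `show_content_sorted` are hypothetical. It also guards the empty case, because `printf '%s\0'` with no arguments still emits one NUL, which would otherwise turn into a single empty file name.

```shell
#!/usr/bin/env bash
# Sketch: re-sort file names that were globbed by the caller.
# Assumptions: bash 4.4+ (readarray -d ''), GNU sort (-z), en_US.utf8 installed.

# Emit the arguments NUL-delimited, sorted by en_US.utf8 collation.
sort_args() {
  (( $# )) || return 0   # no arguments: emit nothing at all
  printf '%s\0' "$@" | LC_COLLATE=en_US.utf8 sort -z
}

show_content_sorted() {
  local -a sorted
  local f
  # Read the NUL-delimited names back into an array, dropping the delimiters.
  readarray -d '' -t sorted < <(sort_args "$@")
  for f in "${sorted[@]}"; do
    printf '%s\n' "$f"
    cat -- "$f"
  done
}
```

With this, `show_content_sorted /mydir/*` would sort the names itself, whatever locale the caller's shell used when expanding the glob. The LC_COLLATE assignment is scoped to the single `sort` invocation, so nothing else in the script changes.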
Charles Duffy