16

Parsing the output of ls to iterate through a list of files is bad. So how should I go about iterating through a list of files in the order in which they were first created? I browsed several questions here on SO and they all seem to parse ls.

The embedded link suggests:

Things get more difficult if you wanted some specific sorting that only ls can do, such as ordering by mtime. If you want the oldest or newest file in a directory, don't use ls -t | head -1 -- read Bash FAQ 99 instead. If you truly need a list of all the files in a directory in order by mtime so that you can process them in sequence, switch to perl, and have your perl program do its own directory opening and sorting. Then do the processing in the perl program, or -- worst case scenario -- have the perl program spit out the filenames with NUL delimiters.

Even better, put the modification time in the filename, in YYYYMMDD format, so that glob order is also mtime order. Then you don't need ls or perl or anything. (The vast majority of cases where people want the oldest or newest file in a directory can be solved just by doing this.)

Does that mean there is no native way of doing it in bash? I am not at liberty to modify the filenames to include the time in them. I need to schedule a script in cron that runs every 5 minutes, generates an array containing all the files in a particular directory ordered by their creation time, performs some actions on the filenames, and moves them to another location.

The following worked, but only because I don't have funny filenames. The files are created by a server, so they will never contain special characters, spaces, newlines, etc.

files=( $(ls -1tr) ) 

I can write a perl script that would do what I need, but I would appreciate it if someone could suggest the right way to do it in bash. A portable option would be great, but a solution using the latest GNU utilities would not be a problem either.

jaypal singh
  • 3
    +1 great question, I had this scenario many times and ended up using `ls -l` only :( – anubhava Aug 29 '14 at 22:48
  • I don't have the command written out at the moment but I think `find` with `printf` like BurhanKhalid's answer only ending in `\0` and then piped to a modern (possibly GNU) awk which can control sorting might work. Possibly also GNU awk to use `FIELDWIDTHS` to control exactly how/where the fields are split. – Etan Reisner Aug 29 '14 at 23:11
  • 1
    You know that creation time isn't recorded on all filesystems, yes? – Ignacio Vazquez-Abrams Aug 29 '14 at 23:25
  • So... what exactly is wrong with using perl? – user123444555621 Aug 30 '14 at 00:01
  • @Pumbaa80 Nothing wrong with using `perl`. I am just curious to see if there is a safer way of doing the same in `bash`. `bash` has come a long way, and I was hoping the maintainers had defined a suggested approach for these kinds of use cases. – jaypal singh Aug 30 '14 at 00:03
  • So basically you're asking whether the Wiki article is wrong. Also, keep in mind that `ls`, `find` etc. aren't part of Bash. So, the GNU utils are just as "native Bash" as Perl is. – user123444555621 Aug 30 '14 at 00:10
  • 1
    If you read the BASH FAQ, the problems you protect against by not using `ls` to parse a directory are rather rare (i.e. a 'newline' within a filename, etc.). Not to encourage its use, but if you have sane filenames, then using `ls -opts` to populate a loop has no ill side-effects. If you like your `newlines` in your filenames -- then don't use `ls` to populate a loop. – David C. Rankin Aug 30 '14 at 01:35

8 Answers

6
sorthelper=();
for file in *; do
    # We need something that can easily be sorted.
    # Here, we use "<date><filename>".
    # Note that this works with any special characters in filenames

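    # Enable exactly ONE of the two stat lines below, depending on your platform.
    # Both produce a 14-character prefix: a 14-digit date on Mac OS X,
    # or a 10-digit epoch plus 4 spaces on Linux.
    # sorthelper+=("$(stat -n -f "%Sm%N" -t "%Y%m%d%H%M%S" -- "$file")"); # Mac OS X only
    # or
    sorthelper+=("$(stat --printf "%Y    %n" -- "$file")"); # Linux only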
done;

sorted=();
while IFS= read -r -d $'\0' elem; do  # IFS= and -r keep leading whitespace and backslashes intact
    # this strips away the first 14 characters (<date>) 
    sorted+=("${elem:14}");
done < <(printf '%s\0' "${sorthelper[@]}" | sort -z)

for file in "${sorted[@]}"; do
    # do your stuff...
    echo "$file";
done;

Other than sort and stat, all commands are actual native Bash commands (builtins)*. If you really want, you can implement your own sort using Bash builtins only, but I see no way of getting rid of stat.

The important parts are read -d $'\0', printf '%s\0' and sort -z. All these commands are used with their null-delimiter options, which means that any filename can be processed safely. Also, the use of double quotes in "$file" and "${anarray[@]}" is essential.

*Many people feel that the GNU tools are somehow part of Bash, but technically they're not. So, stat and sort are just as non-native as perl.
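
The remark above that sort could be replaced by Bash builtins can be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name insertion_sort and the array name items are made up for the example, and the comparison relies on [[ a > b ]], which orders strings according to the current locale. It is quadratic, so it probably only makes sense for small directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sorts the global array "items" in place, builtins only.
insertion_sort() {
    local i j key
    for ((i = 1; i < ${#items[@]}; i++)); do
        key=${items[i]}
        j=$((i - 1))
        # Shift every element that sorts after "key" one slot to the right.
        while ((j >= 0)) && [[ ${items[j]} > $key ]]; do
            items[j+1]=${items[j]}
            j=$((j - 1))
        done
        items[j+1]=$key
    done
}

# Usage with "<date><filename>" strings like the ones built above
# (the sample names are invented for the demonstration):
items=("20140830101500b file" "20140829224800a"$'\n'"file" "20140830093000c")
insertion_sort
printf '%q\n' "${items[@]}"
```

Because the date prefix has a fixed width, plain string comparison is enough to order the entries chronologically.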

user123444555621
  • There is no `-n` option in my stat. – Burhan Khalid Aug 30 '14 at 07:44
  • @BurhanKhalid Dammit, why does it have to be so complicated? ;) I'll fix that. – user123444555621 Aug 30 '14 at 09:21
  • It should work now. I had only tested on Mac, and obviously its `stat` is very different from the one on Linux. – user123444555621 Aug 30 '14 at 10:30
  • Thank you @Pumbaa80 for taking out time to write this up with explanation. – jaypal singh Aug 30 '14 at 18:15
  • To your point "stat and sort are just as non-native as perl": GNU software like `sort` from `coreutils` and `bash` are both part of the core of any GNU OS by default. Scripting languages like `perl` or `python` are not. – John B Aug 30 '14 at 22:21
  • @JohnB The only *NIX-like OS without Perl that I know of is FreeBSD. Which doesn't have Bash either. On the other side, Bash comes with non-GNU systems like Mac OS. – user123444555621 Aug 31 '14 at 10:21
  • Perl is a language that may be installed by default on many *NIX systems, but that doesn't make it a system builtin utility. Whether GNU or FreeBSD, a *NIX-like system will have `sort` and `stat` as core system utilities along with `bash`. – John B Aug 31 '14 at 14:26
5

All the cautions and warnings against using ls to parse a directory notwithstanding, we have all found ourselves in this situation. If you do find yourself needing sorted directory input, then about the cleanest use of ls to feed your loop is ls -opts | while read -r name; do ... This will handle spaces in filenames, etc., without requiring a reset of IFS, due to the nature of read itself. Example:

ls -1rt | while read -r fname; do  # where '1' is ONE not little 'L'

So do look for cleaner solutions avoiding ls, but if push comes to shove, ls -opts can be used sparingly without the sky falling or dragons plucking your eyes out.

Let me add the disclaimer to keep everyone happy: if you like newlines inside your filenames -- then do not use ls to populate a loop. If you do not have newlines inside your filenames, there are no other adverse side-effects.
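
To make the newline caveat concrete, here is a small demonstration under stated assumptions: the temporary directory and the two file names are invented for the example, and ls is assumed to write names verbatim when its output is not a terminal (GNU ls does so by default).

```shell
#!/usr/bin/env bash
# Demonstration only: two files, one of which has a newline in its name.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with"$'\n'"newline"

count=0
# Process substitution keeps the loop in the current shell, so "count" survives.
while IFS= read -r fname; do
    count=$((count + 1))
done < <(ls -1rt "$tmp")

echo "files on disk: 2, names read from ls: $count"
rm -rf "$tmp"
```

Under those assumptions the loop reads more names than there are files, because the embedded newline splits one name into two lines. With sane filenames the counts agree.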

Contra: TLDP Bash Howto Intro:

    #!/bin/bash
    for i in $( ls ); do
        echo item: $i
    done

It appears that SO users do not know what the use of contra means -- please look it up before downvoting.

David C. Rankin
  • 2
    See my comment on the original question. If you like `newlines` in your filenames -- then **do not** use `ls` to populate a loop. On the other hand, regarding `newlines` within filenames (I have seen zero, to be exact, in 20 years of normal Linux use): the only way I can imagine getting a `newline` inside a filename is by some horrible mistake. – David C. Rankin Aug 30 '14 at 01:38
  • 1
    Ok, but that is exactly what the question is about. BTW, have you heard of the [`Icon` file in OS X](http://superuser.com/questions/298785/icon-file-on-os-x-desktop)? – user123444555621 Aug 30 '14 at 15:46
  • Excellent example, and No I had not. I haven't used a Mac since 1988 at A&M (cricket graph if I recall - to prevent having to cut (with scissors) and tape (paste) graphs into an aero-engineering design project) [that dates me]. That is a great example where, if you were interested in the hidden icon files, you could **not** use `ls` to populate a loop. Are there any other examples? Literally, that is the only one I've now run across. I had pondered whether a question would be appropriate asking just which files have control characters in them. I'm curious. I'll search and drop a comment. – David C. Rankin Aug 30 '14 at 22:04
2

You can try using the stat command piped to sort:

stat -c '%Y %n' * | sort -t ' ' -nk1 | cut -d ' ' -f2-

Update: To deal with filenames containing newlines, we can use the %N format in stat, and instead of cut we can use awk, like this:

LANG=C stat -c '%Y^A%N' *| sort -t '^A' -nk1| awk -F '^A' '{print substr($2,2,length($2)-2)}'
  1. LANG=C is needed to make sure stat uses only single quotes when quoting file names.
  2. ^A is the control-A character, typed by pressing Ctrl-V followed by Ctrl-A.
anubhava
  • 1
    I thought of stat too and also got to `%N` but it didn't seem to quote sanely enough. It translated newlines to `\n` which was something but not great. It also used the annoying GNU quoting style of `\`...'` and then failed to quote/escape `\`` in filenames. – Etan Reisner Aug 29 '14 at 23:09
  • @jaypal: please check updated answer, not the best looking one but a work around. – anubhava Aug 29 '14 at 23:22
  • Just a small note that instead of awk we can also use: `while IFS="'" read -r _ f; do echo -e "$f"; done` – anubhava Aug 29 '14 at 23:40
  • 1
    Parsing the output of `stat` is no better than parsing that of `ls` – user123444555621 Aug 29 '14 at 23:53
  • I don't think `ls` has any option like `stat` has with `%N`. – anubhava Aug 30 '14 at 04:15
1

How about a solution with GNU find + sed + sort?

As long as there are no newlines in the file name, this should work:

find . -type f -printf '%T@ %p\n' | sort -k 1nr | sed 's/^[^ ]* //'
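
If newlines in file names are a concern, the same idea can be sketched with NUL delimiters instead of newlines. This is a variation, not the pipeline above: it assumes GNU find (-printf) and GNU sort (-z), it sorts oldest first rather than newest first, and the function name list_by_mtime and the array name files are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "files" with the regular files
# directly inside directory $1, oldest modification time first.
list_by_mtime() {
    local dir=$1 entry
    files=()
    while IFS= read -r -d '' entry; do
        files+=("${entry#* }")   # drop the leading "<epoch> " prefix
    done < <(find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
             LC_ALL=C sort -z -n)
}

# Usage:
list_by_mtime .
for f in "${files[@]}"; do
    printf '%q\n' "$f"
done
```

The timestamp printed by %T@ never contains a space, so stripping everything up to the first space leaves the path untouched, even when the path itself contains spaces or newlines.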
Burhan Khalid
  • Thank you for the answer and your time. I was _kinda_ hoping for a little cleaner solution that would also handle filename caveats. `:)` – jaypal singh Aug 29 '14 at 22:47
  • 1
    `stat -c '%Y %N'` instead of `find`? Though the quoting leaves something to be desired from a "working with it" perspective (and doesn't appear to cover all odd filename possibilities). – Etan Reisner Aug 29 '14 at 22:55
  • 6
    Parsing the output of `find` is no better than parsing that of `ls` – user123444555621 Aug 29 '14 at 23:52
1

Each file has three timestamps:

  1. Access time: the file was opened and read. Also known as atime.
  2. Modification time: the file was written to. Also known as mtime.
  3. Inode modification time: the file's status was changed, such as the file had a new hard link created, or an existing one removed; or if the file's permissions were chmod-ed, or a few other things. Also known as ctime.

None of these represents the time the file was created; in the traditional Unix model, that information is not saved anywhere. At file creation time, all three timestamps are initialized, and then each one gets updated appropriately when the file is read or written to, when its permissions are chmod-ed, or when a hard link is created or destroyed.

So, you can't really list the files according to their file creation time, because the file creation time isn't saved anywhere. The closest match would be the inode modification time.

See the descriptions of the -t, -u, -c, and -r options in the ls(1) man page for more information on how to list files in atime, mtime, or ctime order.
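
The difference between mtime and ctime can be observed with a short experiment. This is a sketch assuming GNU stat (-c with %Y for mtime and %Z for ctime, both in epoch seconds) and GNU touch (-d @EPOCH); the temporary file and variable names exist only for the demonstration.

```shell
#!/usr/bin/env bash
# Assumes GNU stat and GNU touch.
f=$(mktemp)
touch -d @1577836800 "$f"        # set atime and mtime to 2020-01-01 00:00:00 UTC
mtime_before=$(stat -c %Y "$f")
chmod 644 "$f"                   # a status change: updates ctime, not mtime
mtime_after=$(stat -c %Y "$f")
ctime=$(stat -c %Z "$f")
echo "mtime before chmod: $mtime_before"
echo "mtime after chmod:  $mtime_after"
echo "ctime:              $ctime"
rm -f "$f"
```

The mtime stays at the value set by touch, while the ctime reflects when the inode was last changed, which is why ctime cannot be set back with touch.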

whoan
Sam Varshavchik
1

It may be a little more work to ensure it is installed (it may already be, though), but using zsh instead of bash for this script makes a lot of sense. The filename globbing capabilities are much richer, while still using a sh-like language.

files=( *(oc) )

will create an array whose entries are all the file names in the current directory, but sorted by change time. (Use a capital O instead to reverse the sort order). This will include directories, but you can limit the match to regular files (similar to the -type f predicate to find):

files=( *(.oc) )

find is needed far less often in zsh scripts, because most of its uses are covered by the various glob flags and qualifiers available.

chepner
  • Thanks @chepner. I have never really worked on `zsh` before so this is helpful. – jaypal singh Aug 30 '14 at 19:19
  • 1
    It's similar enough to `bash` that you can (almost) treat it as `bash` until you want to use its more advanced features. A few defaults differ from `bash` (array indexing is 1-based, and parameter expansions do not undergo word-splitting), but it can be configured to behave like `bash`. – chepner Aug 30 '14 at 19:49
1

I've just found a way to do it with bash and ls (GNU).
Suppose you want to iterate through the filenames sorted by modification time (-t):

while read -r fname; do
    fname=${fname:1:((${#fname}-2))} # remove the leading and trailing "
    fname=${fname//\\\"/\"}          # remove the \ before any embedded "
    fname=$(echo -e "$fname")        # interpret the escaped characters
    file "$fname"                    # replace `file` with any command you like
done < <(ls -At --quoting-style=c)

Explanation

Given some filenames with special characters, this is the ls output:

$ ls -A
 filename with spaces   .hidden_filename  filename?with_a_tab  filename?with_a_newline  filename_"with_double_quotes"

$ ls -At --quoting-style=c
".hidden_filename"  " filename with spaces "  "filename_\"with_double_quotes\""  "filename\nwith_a_newline"  "filename\twith_a_tab"

So each filename needs a little processing to recover the actual name. To recap:

${fname:1:((${#fname}-2))} # remove the leading and trailing "
# ".hidden_filename" -> .hidden_filename
${fname//\\\"/\"}          # remove the \ before any embedded "
# filename_\"with_double_quotes\" -> filename_"with_double_quotes"
$(echo -e "$fname")        # interpret the escaped characters
# filename\twith_a_tab -> filename     with_a_tab

Example

$ ./script.sh
.hidden_filename: empty
 filename with spaces : empty
filename_"with_double_quotes": empty
filename
with_a_newline: empty
filename    with_a_tab: empty

As seen, file (or whatever command you choose) handles each filename correctly.

whoan
  • 1
    Thank you for your answer. I went with the perl script, but it's good to have a repo of answers so others can pick the one that suits them best. – jaypal singh Jan 20 '15 at 03:55
0

Here's a way using stat with an associative array.

n=0
declare -A arr
for file in *; do
    # modified=$(stat -f "%m" "$file") # For use with BSD/OS X
    modified=$(stat -c "%Y" "$file") # For use with GNU/Linux
    # Ensure stat timestamp is unique
    if [[ -n ${arr[$modified]+set} ]]; then # this timestamp is already used as a key
        modified=${modified}.$n
        ((n++))
    fi
    arr[$modified]="$file"
done
files=()
for index in $(IFS=$'\n'; echo "${!arr[*]}" | sort -n); do
    files+=("${arr[$index]}")
done

Since sort sorts lines, $(IFS=$'\n'; echo "${!arr[*]}" | sort -n) ensures the indices of the associative array get sorted by setting the field separator in the subshell to a newline.

The quoting at arr[$modified]="${file}" and files+=("${arr[$index]}") ensures that file names with caveats like a newline are preserved.

John B
  • Think about it: what happens when two files have the same timestamp? The array entry of one of them will be overwritten in the first `for` loop. – user123444555621 Aug 30 '14 at 11:59
  • Does not work on OS X or BSD. BSD (OS X) `stat` lacks the `-c` option and uses a different timestamp format. – dawg Aug 31 '14 at 00:05