
I was writing a shell script, and was flabbergasted that I couldn't find a portable cross-platform method to get basic file metadata, like: type, modification time, permissions, paths for links, etc. Basically, the same thing that ls outputs, but in a friendly parsable manner.

Reference this post for information about why you shouldn't parse ls: Why not parse ls (and what to do instead)?

Everybody seems to say to use stat or find instead, but I was just as flabbergasted to find that on my two computers (one Ubuntu 18.04, one macOS Catalina), I couldn't come up with any common syntax that worked on both systems. As far as I can tell, stat isn't specified by POSIX at all, and the find feature that would help here (-printf) is a GNU extension. Reference this post: How can I get the size of a file in a bash script?

For stat, Ubuntu (GNU coreutils) uses --printf=FORMAT to specify fields. On macOS (BSD-based), the syntax is -f format. The names and ordering of the fields also differ, which makes parsing the output with a regex impractical.
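One way to paper over the two stat dialects is to probe once and define a wrapper around whichever one is present. This is only a minimal sketch, assuming the system has either a GNU-style stat (accepting -c, the newline-terminated sibling of --printf) or a BSD-style stat (accepting -f). The function name file_meta is hypothetical, and the format letters should be checked against each system's man page:

```shell
# Hypothetical wrapper: prints "SIZE MTIME_EPOCH OCTAL_PERMS" for one path.
# Assumption: stat is either GNU-style (-c FORMAT) or BSD-style (-f FORMAT).
if stat -c '%s' / >/dev/null 2>&1; then
    # GNU coreutils: %s = size in bytes, %Y = mtime as epoch seconds, %a = octal permissions
    file_meta() { stat -c '%s %Y %a' -- "$1"; }
else
    # BSD/macOS: %z = size in bytes, %m = mtime as epoch seconds, %Lp = octal permissions
    file_meta() { stat -f '%z %m %Lp' -- "$1"; }
fi
```

The branch still has to exist somewhere, so this hides the incompatibility rather than removing it, and it does nothing for systems that ship neither dialect.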

For find, the GNU version on Ubuntu has a -printf action that takes a format string. The macOS find simply doesn't have this action, or anything comparable that I'm aware of.

So my question is: If ls, stat, and find don't provide a portable solution to get parsable file metadata, how do I do it? Just suck it up and parse ls??? This seems so basic that I can't believe there isn't something that works cross-platform... It doesn't have to be POSIX per-se, just something that's basically ubiquitous for 'Nix OS'es.

I've found two somewhat junky solutions so far, but I'll note them for reference... :

  1. Use rsync as a helper tool, and use its --out-format flag. Use the --dry-run option so it doesn't actually do any file transferring. I tried this on both my Mac and Linux boxes, and it seemed to work, but it's pretty slow/grimy, and I don't know if rsync is considered ubiquitous or not. I found this from some StackOverflow post that I forgot to bookmark. ;-)

  2. Use tar as a helper tool (I know that pax is the new POSIX standard, but it wasn't on my Ubuntu box), pipe the output, and parse its headers only. Discard the rest (e.g. pipe it to /dev/null). I found this idea on this post: Getting file modification time in POSIX shell. So... I definitely consider the tar utility ubiquitous enough (it used to be part of POSIX). And the headers are great for parsing. I'm a little concerned that it might be reading more file content than I want it to, though.

I tested both methods with a directory containing some huge files (e.g. 70GB) and the tar method definitely isn't reading the whole files, though you can slightly notice that it's slower than ls / stat / find. And the code takes a bit of fancy footwork...

To reiterate my question: is there a less-grimy, portable way to get file metadata that at least works on macOS and Ubuntu and most 'Nix'es?


REFERENCE MATERIAL - INTERESTED READERS ONLY

Here are code snippets for Method 1 and Method 2 above. I started with the reference posts, and took it to the next step. For demonstration purposes, the code I'm posting below uses the output of each method to print a directory listing that looks like ls. Just for a demo...

Both methods: Grimy, grimy

Method 1 (rsync):

ALT_STAT() {
    ALT_STAT_NAME="${1}"
    set -- $(rsync --dry-run --dirs --ignore-times --links --specials --out-format='%i %B %l %U %G' "${1}" "${1}")
    ALT_STAT_TYPE="${1:1:1}"
    ALT_STAT_PERMS="${2}"
    ALT_STAT_EXEC="$(echo "${2}" | sed -n $'/[xt]/i\\\nexe')"
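    # Double quotes so that ${ALT_STAT_TYPE} is expanded inside the sed pattern
    ALT_STAT_LABEL="$(echo $'freg\nddir\nLlnk\nDdev\nSspe' | sed -n "/^${ALT_STAT_TYPE}/s/^.//p")"
    # The spaces around = are required; without them the test is always true
    [ "${ALT_STAT_LABEL}" = "reg" ] && [ "${ALT_STAT_EXEC}" = "exe" ] && ALT_STAT_LABEL="exe"
    ALT_STAT_SIZE="${3}"
    ALT_STAT_USER="$(id -un "${4}")"
    ALT_STAT_GROUP="${5}"
    ALT_STAT_LINK=$(readlink "${ALT_STAT_NAME}")
    # Resolve relative link targets only (first character is not "/")
    [ -n "${ALT_STAT_LINK}" ] && [ "${ALT_STAT_LINK:0:1}" != "/" ] && ALT_STAT_LINK="$(PATH="$(pwd):${PATH}" which "${ALT_STAT_LINK}")"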
    ALT_STAT_MTIME=$(date -r "${ALT_STAT_LINK:-${ALT_STAT_NAME}}" +%s)
    [ "${ALT_STAT_LINK}" ] && ALT_STAT_LINK="--> ${ALT_STAT_LINK}"
}

ALT_LS() {
    for f in *; do
        ALT_STAT "${f}"
        printf "%3.3s | %9.9s | %12.12s | %10.10s | %9.9s | %50.50s | %s\n" \
        "${ALT_STAT_LABEL}" "${ALT_STAT_PERMS}" "${ALT_STAT_USER}" "${ALT_STAT_GROUP}"\
        "${ALT_STAT_SIZE}" "${ALT_STAT_NAME} ${ALT_STAT_LINK}" \
        "$(date -r "${ALT_STAT_MTIME}" +%D_%T 2>/dev/null || date -d "@${ALT_STAT_MTIME}" +%D_%T)"
    done
}

Method 2 (tar):

cat > fileMetadata.sh <<\ENDSCRIPT
#!/bin/bash

readTarHeader() {
    read -n 100; name="${REPLY}"
    read -n 8; mode="${REPLY}"
    read -n 8; uid=$((8#${REPLY}))
    read -n 8; gid=$((8#${REPLY}))
    read -n 12; size=$((8#${REPLY}))
    read -n 12; mtime=$((8#${REPLY}))
    read -n 8; checksum=$((8#${REPLY}))
    read -n 1; typeflag="${REPLY}"
    read -n 100; linkname="${REPLY}"
    read -n 6; magic="${REPLY}"
    read -n 2; version="${REPLY}"
    read -n 32; uname="${REPLY}"
    read -n 32; gname="${REPLY}"
    read -n 8; devmajor="${REPLY}"
    read -n 8; devminor="${REPLY}"
    read -n 155; prefix="${REPLY}"
    read -n 12; # Padding

    # Flush buffers (tested on a 78GB file; it was so fast that it can't be reading it)
    cat > /dev/null
}

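writeBackHeaders() {
    # BSD date takes epoch seconds via -r; GNU date needs -d @SECONDS instead
    fmt='+%Y/%m/%d_%H:%M:%S'
    stamp="$(date -r "${mtime}" "${fmt}" 2>/dev/null || date -d "@${mtime}" "${fmt}")"
    printf "%6.6s | %1.1s | %16.16s | %8.8s | %9.9s | %19.19s | %10.10s | %35.35s\n" \
        "${mode}" "${typeflag}" "${uname}" "${gname}" "${size}" "${stamp}" "${prefix}" "${name} ${linkname}"
}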

printMetaData() {
    for file in "${@}"; do
        tar --use-compress-program="${0} -t" -c "${file}" | cat 
    done
}

[ "${1}" = "-t" ] && { readTarHeader; writeBackHeaders; exit 0; }

[ "${1}" = "-p" ] || exit 1

if [ -d "${2}" ]; then printMetaData "${2}"*; else printMetaData "${2}"; fi

ENDSCRIPT

Usage:

chmod +x fileMetadata.sh
./fileMetadata.sh -p testDir2/
Sean
  • Related: [POSIX analog of coreutils “stat” command?](https://stackoverflow.com/q/27828585/3266847) – Benjamin W. Apr 21 '21 at 16:33
  • Yeah, the Perl solution in that one doesn't look too bad, though I haven't even checked yet to see if my machines both have it installed. I might be a little less picky about "pure" POSIX than the OP in that thread (sounds like he didn't get the answer he was looking for) – Sean Apr 21 '21 at 17:39
  • "Ubuntu" is a weirdly specific platform to single out. Mainline Linux distributions, including Debian, on which Ubuntu is based, use GNU `coreutils`. There are some specialized distros with different userland utilities (which in the case of Busybox is Busybox itself). – tripleee Apr 18 '22 at 17:57
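The Perl route mentioned in the comments can be wrapped in a shell function so the rest of a script stays in shell. This is a rough sketch only, assuming perl is installed on every target machine (not verified here for either system). The function name perl_stat and the output layout are hypothetical, chosen for illustration:

```shell
# Hypothetical helper: prints "TYPE MODE UID GID SIZE MTIME NAME" for each path.
# Assumption: perl is available; lstat is used so symlinks are reported, not followed.
perl_stat() {
    perl -e '
        use strict; use warnings;
        for my $path (@ARGV) {
            my @st = lstat($path);
            unless (@st) { warn "$path: $!\n"; next; }
            # "_" reuses the result of the lstat call above
            my $type = -l _ ? "lnk" : -d _ ? "dir" : -f _ ? "reg" : "oth";
            # Fields: 2 = mode, 4 = uid, 5 = gid, 7 = size, 9 = mtime (epoch seconds)
            printf "%s %04o %d %d %d %d %s\n",
                $type, $st[2] & 07777, @st[4,5,7,9], $path;
        }
    ' -- "$@"
}
```

Usage would look like perl_stat testDir2/*. Because the file name is the last field, names containing spaces stay recoverable, but names containing newlines would still break line-based parsing.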
