I was writing a shell script and was flabbergasted that I couldn't find a portable, cross-platform method to get basic file metadata: type, modification time, permissions, link targets, etc. Basically the same thing that ls outputs, but in a friendly, parsable form.
See this post for why you shouldn't parse ls: Why not parse ls (and what to do instead)?
Everybody seems to say to use stat or find to accomplish a similar end result, but again, I was flabbergasted to find that on my two computers (one Ubuntu 18.04, one MacOS X Catalina) I couldn't come up with any common syntax that worked on both systems. I believe the options I need from both utilities are non-standard extensions. See this post: How can I get the size of a file in a bash script?
For stat, Ubuntu uses --printf=FORMAT to specify fields. On MacOS X (BSD-based), the syntax is -f format. The names and ordering of the format fields also differ, which makes parsing the output with a regex impractical.
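To make the difference concrete, here is a minimal sketch of the usual workaround: detect which stat is installed and pick the matching format string. The function name portable_stat is made up for illustration, and the detection assumes that only GNU stat accepts --version; other implementations (busybox, for example) may need their own branch.

```shell
# Hypothetical helper: prints "size mtime-epoch octal-perms" for one file.
portable_stat() {
    if stat --version >/dev/null 2>&1; then
        # GNU coreutils: %s = size, %Y = mtime (epoch), %a = octal permissions
        stat -c '%s %Y %a' -- "$1"
    else
        # BSD / macOS: %z = size, %m = mtime (epoch), %Lp = octal permissions
        stat -f '%z %m %Lp' -- "$1"
    fi
}
```

This still means maintaining two format strings by hand, so it works around the problem rather than solving it.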
For find, Ubuntu has a -printf format action available. MacOS X simply doesn't have this option, or anything comparable that I'm aware of.
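For the file-type field alone, the POSIX test operators do seem to be enough, so that part at least doesn't need find -printf. A minimal sketch follows; the function name file_type and the three-or-four-letter labels are made up for illustration, and it does nothing for size, owner, or modification time.

```shell
# Hypothetical helper: prints a short label for the type of one path.
file_type() {
    # Check -L first, because the other tests follow symlinks.
    if   [ -L "$1" ]; then echo lnk
    elif [ -d "$1" ]; then echo dir
    elif [ -f "$1" ]; then echo reg
    elif [ -p "$1" ]; then echo fifo
    elif [ -S "$1" ]; then echo sock
    elif [ -b "$1" ]; then echo blk
    elif [ -c "$1" ]; then echo chr
    else echo unknown
    fi
}
```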
So my question is:
If ls, stat, and find don't provide a portable solution to get parsable file metadata, how do I do it? Just suck it up and parse ls? This seems so basic that I can't believe there isn't something that works cross-platform... It doesn't have to be POSIX per se, just something that's basically ubiquitous on 'Nix OSes.
I've found two somewhat junky solutions so far, but I'll note them for reference:
1. Use rsync as a helper tool with its --out-format flag. Add the --dry-run option so it doesn't actually transfer any files. I tried this on both my Mac and Linux boxes and it seemed to work, but it's pretty slow/grimy, and I don't know if rsync is considered ubiquitous or not. I found this in some StackOverflow post that I forgot to bookmark. ;-)
2. Use tar as a helper tool (I know that pax is the new POSIX standard, but it wasn't on my Ubuntu box), pipe the output, and parse its headers only. Discard the rest (e.g. pipe it to /dev/null). I found this idea in this post: Getting file modification time in POSIX shell. So... I definitely consider the tar utility ubiquitous enough (it used to be part of POSIX), and the headers are great for parsing. I'm a little concerned that it might be reading more file content than I want it to, though.
I tested both methods with a directory containing some huge files (e.g. 70GB), and the tar method definitely isn't reading the whole files, though it is slightly but noticeably slower than ls / stat / find. And the code takes a bit of fancy footwork...
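Stripped down to a single field, the tar idea can be sketched roughly as below. The function name tar_mtime is made up for illustration. It assumes the first 512-byte block on the pipe is a plain ustar-style header for the file itself, with the modification time stored as octal digits at byte offset 136; long file names or extended (pax/GNU) headers would put a different block first and break this.

```shell
# Hypothetical helper: prints the mtime of one file as seconds since the epoch.
tar_mtime() {
    # Read only the 11 octal digits of the mtime field (offset 136).
    # dd exits after those bytes, so tar stops early instead of
    # streaming the whole file through the pipe.
    tar_mtime_oct=$(tar -cf - "$1" 2>/dev/null |
        dd bs=1 skip=136 count=11 2>/dev/null)
    # A leading 0 makes shell arithmetic treat the digits as octal.
    echo "$((0${tar_mtime_oct}))"
}
```

The same dd offset trick should apply to the other header fields (size at offset 124, mode at offset 100), under the same assumptions about the header format.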
To reiterate my question: is there a less grimy, portable way to get file metadata that works at least on OS X, Ubuntu, and most 'Nixes?
REFERENCE MATERIAL - INTERESTED READERS ONLY
Here are code snippets for Method 1 and Method 2 above. I started with the reference posts and took them a step further. For demonstration purposes, the code below uses the output of each method to print a directory listing that looks like ls. Just for a demo...
Both methods: Grimy, grimy
Method 1 (rsync):
ALT_STAT() {
    ALT_STAT_NAME="${1}"
    # Fields: %i itemized changes, %B permission bits, %l size, %U uid, %G gid
    set -- $(rsync --dry-run --dirs --ignore-times --links --specials --out-format='%i %B %l %U %G' "${1}" "${1}")
    ALT_STAT_TYPE="${1:1:1}"    # second character of %i is the file type
    ALT_STAT_PERMS="${2}"
    ALT_STAT_EXEC="$(echo "${2}" | sed -n $'/[xt]/i\\\nexe')"
    # Double quotes here, so that ${ALT_STAT_TYPE} expands inside the sed script
    ALT_STAT_LABEL="$(echo $'freg\nddir\nLlnk\nDdev\nSspe' | sed -n "/^${ALT_STAT_TYPE}/s/^.//p")"
    # Spaces around "=" are required; without them each test is always true
    [ "${ALT_STAT_LABEL}" = "reg" ] && [ "${ALT_STAT_EXEC}" = "exe" ] && ALT_STAT_LABEL="exe"
    ALT_STAT_SIZE="${3}"
    ALT_STAT_USER="$(id -un "${4}")"
    ALT_STAT_GROUP="${5}"
    ALT_STAT_LINK=$(readlink "${ALT_STAT_NAME}")
    # Make a relative link target absolute (first character is not "/")
    [ "${ALT_STAT_LINK}" ] && [ "${ALT_STAT_LINK:0:1}" != "/" ] && ALT_STAT_LINK="$(pwd)/${ALT_STAT_LINK}"
    # NOTE: GNU date reads "-r FILE" as a reference file; older BSD date expects seconds here
    ALT_STAT_MTIME=$(date -r "${ALT_STAT_LINK:-${ALT_STAT_NAME}}" +%s)
    [ "${ALT_STAT_LINK}" ] && ALT_STAT_LINK="--> ${ALT_STAT_LINK}"
}
ALT_LS() {
    for f in *; do
        ALT_STAT "${f}"
        # NOTE: "date -r SECONDS" is the BSD form; GNU date needs -d "@${ALT_STAT_MTIME}"
        printf "%3.3s | %9.9s | %12.12s | %10.10s | %9.9s | %50.50s | %s\n" \
            "${ALT_STAT_LABEL}" "${ALT_STAT_PERMS}" "${ALT_STAT_USER}" "${ALT_STAT_GROUP}" \
            "${ALT_STAT_SIZE}" "${ALT_STAT_NAME} ${ALT_STAT_LINK}" "$(date -r "${ALT_STAT_MTIME}" +%D_%T)"
    done
}
Method 2 (tar):
cat > fileMetadata.sh <<\ENDSCRIPT
#!/bin/bash
# Read the fixed-width fields of one 512-byte tar header from stdin.
readTarHeader() {
    read -r -n 100; name="${REPLY}"
    read -r -n 8; mode="${REPLY}"
    read -r -n 8; uid=$((8#${REPLY}))
    read -r -n 8; gid=$((8#${REPLY}))
    read -r -n 12; size=$((8#${REPLY}))
    read -r -n 12; mtime=$((8#${REPLY}))
    read -r -n 8; checksum=$((8#${REPLY}))
    read -r -n 1; typeflag="${REPLY}"
    read -r -n 100; linkname="${REPLY}"
    read -r -n 6; magic="${REPLY}"
    read -r -n 2; version="${REPLY}"
    read -r -n 32; uname="${REPLY}"
    read -r -n 32; gname="${REPLY}"
    read -r -n 8; devmajor="${REPLY}"
    read -r -n 8; devminor="${REPLY}"
    read -r -n 155; prefix="${REPLY}"
    read -r -n 12; # Padding
    # Flush buffers (tested on a 78GB file; it was so fast that it can't be reading it)
    cat > /dev/null
}
# Print the parsed fields as one ls-like line.
writeBackHeaders() {
    # NOTE: "date -r SECONDS" is the BSD form; GNU date needs -d "@${mtime}"
    printf "%6.6s | %1.1s | %16.16s | %8.8s | %9.9s | %19.19s | %10.10s | %35.35s" \
        "${mode}" "${typeflag}" "${uname}" "${gname}" "${size}" "$(date -r "${mtime}" +%Y/%m/%d_%H:%M:%S)" "${prefix}" "${name} ${linkname}"
    echo
}
# Run tar once per file, with this script (in -t mode) as the "compressor".
printMetaData() {
    for file in "${@}"; do
        tar --use-compress-program="${0} -t" -c "${file}" | cat
    done
}
[ "${1}" = "-t" ] && { readTarHeader; writeBackHeaders; exit 0; }
[ "${1}" = "-p" ] || exit 1
# A directory argument (with trailing slash) lists its entries; anything else is listed itself.
if [ -d "${2}" ]; then
    printMetaData "${2}"*
else
    printMetaData "${2}"
fi
ENDSCRIPT
Usage:
chmod +x fileMetadata.sh
./fileMetadata.sh -p testDir2/