
I'm simplifying a messy Bash script that uses rsync to synchronize directories and files among several machines on a network.

The script gets file names, directory names, and glob patterns from a sources.txt file with contents like (abridged):

go/
Pictures/
Videos/
Android/
.AndroidStudio*/
texmf/
.customization/
.local/share/rhythmbox/
.m2/

My simplified version of the script needs to expand the globs in sources.txt and pass the result to rsync's --files-from=FILE option. To do this, I'm using process substitution with a helper script (expand-globs-in-file.sh), reproduced below:

#!/bin/bash
if [ -n "$1" ]; then
  cat $1 | tr '\n' '\0' | xargs -i -0 sh -c 'compgen -o bashdefault -G "{}"'
else
  echo "Usage: $0 <filename>"
fi

The problem is that the script fails to pass through some filenames (most of which begin with a period), whereas running cat sources.txt | tr '\n' '\0' | xargs -i -0 sh -c 'compgen -o bashdefault -G "{}"' directly on the command line works as expected.

Directory names, file names, and glob patterns that fail to pass through:

.emacs
.gitconfig
.AndroidStudio*/
GPG-KEY-apacifico
network.txt
.jq

The following file and directory names pass through correctly:

Documents/
Finances/
Downloads/
Development/
.android/
go/
Pictures/
Videos/
.cellphone/
Android/
texmf/
.customization/
.local/share/rhythmbox/
.m2/
useful-scripts/
rpmbuild/
texmf/

What am I doing wrong? I suspect this is a quoting problem, but I don't see it.
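For comparison, here is a minimal sketch of the same expansion done entirely inside one Bash process, which removes the xargs and sh -c layers where a pattern could be re-quoted or handed to a shell other than Bash. It is not the script from the question: the function name expand_globs is hypothetical, and it assumes one pattern per line. The patterns are relative, so the result depends on the directory the function is called from.

```shell
#!/bin/bash
# Hypothetical pure-Bash variant of expand-globs-in-file.sh (a sketch, not the original).
# Reads one pattern per line from the file named by $1 and prints every matching path.
# Patterns are relative, so run this from the directory they refer to (e.g. $HOME).
expand_globs() {
  local pattern
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the extra test handles a final line that lacks a trailing newline.
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    [ -n "$pattern" ] || continue   # skip blank lines
    # compgen -G prints the paths matching a glob and fails when nothing matches,
    # so unmatched patterns are reported on stderr instead of vanishing silently.
    compgen -G "$pattern" || printf 'no match: %s\n' "$pattern" >&2
  done < "$1"
}
```

Called as expand_globs sources.txt, it prints one path per line, which is the format --files-from reads, so it could be used as --files-from=<(expand_globs sources.txt). The messages on stderr make it easier to see which entries were dropped and from which working directory.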

Al Pacifico
  • Are you _sure_ you need to expand those globs before passing them to `rsync`? I'm only saying this because it offers very nice globbing itself, including `**/*.txt` recursive globs. Also, why are we using `compgen`? Does it maybe have some nice properties for this use case? I would have thought `find` would be the natural fit. – J_H Dec 15 '22 at 06:02
  • @J_H: The old script used rsync's globbing and there were problems; for instance, it didn't expand any dotfiles within directories (see [https://tldp.org/LDP/abs/html/globbingref.html](https://tldp.org/LDP/abs/html/globbingref.html)). The old script had a for loop that iterated over a list of source and destination directories generated from `sources.txt`, and a large directory or file could cause a hang (this behavior first appeared a year or two ago, was resolved with an rsync update, and has now returned). I believe a solution using find will have the same behavior. – Al Pacifico Dec 15 '22 at 09:07
  • Cool. I was essentially asking if you've thought about the tradeoffs and the answer is you have, compgen is your preference. Not mine, due to globbing irregularities, but hey, to each his own. Find lacks regex so I tend to start with `find . -type f`, maybe tack on `-name '*.xxx'`, then pipe it through `grep` for the real filtering before `xargs` does the work. Getting a glob to recognize symlinks seems hard, and the "quick" rules impressed me as nightmarish. I _do_ routinely use `set -euo pipefail` though, for sanity. As I read through those docs, it seems that maybe `shopt -s dotglob` would help? – J_H Dec 15 '22 at 15:39
  • Also, from an architectural design perspective, a backup solution that breaks the job into two tasks, (1.) create manifest, (2.) transfer files listed in manifest, can be attractive. Manifest has full pathnames (compresses nicely when sorted), plus maybe the stat() file length, type, mtime date, for extra credit a content hash. It is "compact", conveniently archived daily like syslog rotation, and best of all lets the receiving end check your work, both immediately and hours later, verifying it's a good backup. There are races, but they're easily detected via "recent" timestamp. – J_H Dec 15 '22 at 15:45
  • @J_H: Embarrassingly, it seems to be working fine today and I have no idea why it was not last night. Thanks for your input. I had forgotten about dotglob, but it was unneeded. – Al Pacifico Dec 15 '22 at 20:19
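The dotglob option raised in the comments can be illustrated with a small sketch. In Bash it is a shopt option, enabled with shopt -s dotglob, and it makes a bare * match names that begin with a period. The helper name list_entries and its "dot" argument are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Illustrative only: list_entries is a hypothetical helper, not part of the original script.
# Prints the entries of directory $1 that match "*", one per line.
# Pass "dot" as the second argument to enable dotglob for the listing.
list_entries() {
  local dir=$1 mode=${2:-} entry
  (
    # Subshell: the cd and the shopt changes do not leak into the caller.
    cd "$dir" || exit 1
    shopt -s nullglob               # an empty directory produces no output
    if [ "$mode" = dot ]; then
      shopt -s dotglob              # let "*" match names starting with "."
    fi
    for entry in *; do
      printf '%s\n' "$entry"
    done
  )
}
```

Patterns that already start with an explicit period, such as .AndroidStudio*/, match dotfiles without this option, which is consistent with dotglob turning out to be unnecessary here.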
