
Using Bash.

I have an exported shell function which I want to apply to many files.

Normally I would use xargs, but the syntax like this (see here) is too ugly for use.

...... | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}

In that discussion, parallel had an easier syntax:

..... | parallel -P 10 echo_var {}

Now I have run into the following problem: the files I want to apply my function to arrive as a single line, with each name quoted and the names separated by spaces, like this: "file 1" "file 2" "file 3".

How can I feed this space-separated, quoted list into parallel?

I can replicate the list using echo for testing.

e.g.

echo '"file 1" "file 2" "file 3"'|parallel -d " " my_function {}

but I can't get this to work.

How can I fix it?

Charles Duffy
Tim
  • No matter what tool you're using, a NUL-delimited list is the best choice for storing a list of arbitrary arguments or filenames, as the NUL character is the only one that can't be used in a filename or UNIX argument (as those consist of C strings). Using it, no escape or quote characters are needed, so you don't need to worry about how your code handles files that contain those characters in their names. – Charles Duffy Feb 20 '20 at 23:31
  • I would *hope*, then, that `parallel` would support `-0`, just as `xargs` and other competing tools do. Assuming it does, you can run `printf '%s\0' "file 1" "file 2" "file 3" | parallel -0 ...`. – Charles Duffy Feb 20 '20 at 23:31
  • BTW, you might see the mailing list thread starting at https://lists.gnu.org/archive/html/bug-parallel/2015-05/msg00005.html for background on why some folks might consider the "simple" behavior you're referring to here to be deeply undesirable -- even moreso than the syntax required to safely use xargs. The xargs syntax is a mouthful, sure, but it's honest and obvious about how it's executed; parallel has a lot of heuristics and magic, which can lead to surprising results when those heuristics don't do the right thing. – Charles Duffy Feb 20 '20 at 23:34
  • I tried the nul-delimited option but this breaks another part of the process with `command substitution: ignored null byte in input` – Tim Feb 20 '20 at 23:42
  • That's only a problem *if you try to capture the text with the NULs* into a string variable. Don't do that -- the whole thing that makes NUL-delimited strings useful (for storing collections of filenames, arguments, environment variables, or other arbitrary C strings) is that they *can't* be stored in C strings themselves. When you want to store such a list, store the items it would contain in an array instead, and then expand it with `printf '%s\0' "${array[@]}"` to recreate the stream immediately when you need the list ready-for-use. – Charles Duffy Feb 20 '20 at 23:46
  • That is to say, *don't* run something like `content=$(printf '%s\0' *.txt); echo "$content" | xargs -0 ...`; *do* run something like `files=( *.txt ); printf '%s\0' "${files[@]}" | xargs -0 ...` – Charles Duffy Feb 20 '20 at 23:49
  • I'll give it a go. – Tim Feb 20 '20 at 23:53
  • BTW, `-I {}` in xargs implies `-n 1`, so you don't need both. Personally, I'd take out the `-I {}`, and use `xargs -n 1 -P 10 bash -c 'echo_var "$@"' _`. Or you can opt to process multiple inputs per shell invocation; as in `xargs -n 5 -P 10 bash -c 'for arg; do echo_var "$arg"; done' _`, where each shell runs `echo_var` up to five times, and up to 10 shells are running at once; that way you amortize the individual shell's startup cost, at the expense of potentially having uneven load between the instances. – Charles Duffy Feb 21 '20 at 00:03
  • Still having problems with nuls and creating commands. I've added a question with this focus here: https://stackoverflow.com/questions/60368951/allow-user-to-complete-parallel-xargs-command-function-after-selecting-files – Tim Feb 24 '20 at 02:52

2 Answers


How can I fix it?

You have to choose a unique separator.

echo 'file 1|file 2|file 3' | xargs -d "|" -n1 bash -c 'my_function "$@"' --
echo 'file 1^file 2^file 3' | parallel -d "^" my_function

The safest is to use zero byte as the separator:

printf 'file 1\0file 2\0file 3\0' | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" 'file 1' 'file 2' 'file 3' | parallel -0 my_function

The best is to store your elements inside a bash array and use a zero separated stream to process them:

files=("file 1" "file 2" "file 3")
printf "%s\0" "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" "${files[@]}" | parallel -0 my_function
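For anyone who wants to try the array approach end to end, here is a minimal sketch. The body of my_function is a hypothetical stand-in that only prints its argument, and -P 4 is an arbitrary choice. It assumes an xargs that supports -0 and -P (GNU and BSD versions do; POSIX does not require them). The function has to be exported with export -f, otherwise the bash started by xargs cannot see it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real function: prints the argument it received.
my_function() {
    printf 'got: [%s]\n' "$1"
}
# Export the function so the child bash started by xargs can see it.
export -f my_function

files=("file 1" "file 2" "file 3")

# One NUL-terminated item per file; xargs -0 splits on NUL only,
# -n1 passes one item per bash invocation, -P 4 runs up to four at once.
printf '%s\0' "${files[@]}" |
    xargs -0 -n1 -P 4 bash -c 'my_function "$@"' --
```

Because of -P the output lines can appear in any order.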

Note that with empty input the function may still be run once, without any arguments. To avoid that, use the -r (--no-run-if-empty) option. parallel supports it, and in xargs it is a GNU extension (xargs on BSD and macOS does not have --no-run-if-empty).
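A small sketch of the empty-input case, assuming GNU xargs (the -r flag is the GNU extension mentioned above, and "echo ran" is just a placeholder command). One detail worth knowing: printf '%s\0' called with an empty array still prints a single NUL byte, so the stream it produces is not strictly empty. Checking the array length before starting the pipeline avoids relying on -r for that case.

```shell
#!/usr/bin/env bash
# Assumes GNU xargs: -r (--no-run-if-empty) is a GNU extension.

# Truly empty input: with -r the placeholder command is not run at all.
with_r=$(printf '' | xargs -r echo ran)
printf 'with -r on empty input: [%s]\n' "$with_r"

# An empty array still makes printf emit one NUL byte.
files=()
bytes=$(printf '%s\0' "${files[@]}" | wc -c)
printf 'bytes produced for an empty array: %d\n' "$((bytes))"

# So guard on the array length instead of depending on -r.
if (( ${#files[@]} > 0 )); then
    printf '%s\0' "${files[@]}" | xargs -0 -r -n1 echo
else
    echo "nothing to do"
fi
```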

Note: xargs by default parses ', " and \. This is why the following is possible and will work:

echo '"file 1" "file 2" "file 3"' | xargs -n1 bash -c 'my_function "$@"' --
echo "'file 1' 'file 2' 'file 3'" | xargs -n1 bash -c 'my_function "$@"' --
echo 'file\ 1 file\ 2 file\ 3' | xargs -n1 bash -c 'my_function "$@"' --

This parsing can produce surprising results, so remember to almost always pass the -d (or -0) option to xargs:

$ # note: each \x is replaced by a plain x
$ echo '\\a\b\c' | xargs
\abc
$ # quotes are parsed and need to match
$ echo 'abc"def' | xargs
xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
$ echo "abc'def" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
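For comparison, a short sketch of the same awkward strings going through xargs -0, which is the option the error messages above point to. With -0 only the NUL byte separates items, so quotes and backslashes pass through untouched. The printf '[%s]\n' at the end is only there to make the item boundaries visible.

```shell
#!/usr/bin/env bash
# With -0, quotes and backslashes are ordinary characters;
# only the NUL byte separates one item from the next.
printf '%s\0' 'abc"def' "abc'def" '\\a\b\c' |
    xargs -0 printf '[%s]\n'
```

Each of the three strings comes out exactly as it went in, one per line, wrapped in brackets.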

xargs is a portable tool available almost everywhere, while parallel is a GNU program that has to be installed separately.

KamilCuk
  • and @charles-duffy I think I'm almost there: I have the file names in a null-separated array. I want to allow the user to 'construct' the command after `parallel`. Thus my code ends in `cmd="printf '%s\0' ${y[@]} | parallel -0 "` `read -e -i "$cmd"; eval "$REPLY"` but this is not working - I seem to get a zero printed between each file – Tim Feb 21 '20 at 06:11
  • Do a function. `cmd() { printf "%s\0" "$@" | parallel -0 something; }` `read something` `cmd "$something1" "$something2" "$something3"` – KamilCuk Feb 21 '20 at 11:24

The problem boils down to this: the values can contain spaces, and space is also the value separator. So we need something that can parse the input into separate values that may contain spaces. Since the values are bash-quoted, the obvious choice is to use bash to unquote them.

You have several options:

(echo "file 1";
 echo "file  2";
 echo "file \"name\" \$(3)") | parallel my_function

printf "%s\n" "file 1" "file  2" "file \"name\" \$(3)" |
  parallel my_function

If the input is in a variable:

var='"file 1" "file  2" "file \"name\" \$(3)"'
eval 'printf "%s\n" '"$var" |
  parallel my_function

Or you can convert the variable to an array:

var='"file 1" "file  2" "file \"name\" \$(3)"'
eval "arr=($var)"

And if the input is in an array:

parallel my_function ::: "${arr[@]}"
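If eval is a concern because the list comes from somewhere untrusted, one possible alternative (not part of the approach above, just a sketch) is to let xargs do the unquoting, since by default it understands simple single and double quotes and never executes anything in the input. The limitation is that xargs does not implement the full bash quoting syntax, so this sketch assumes plainly quoted names like the ones in the question. The variable names var, arr and item are illustrative.

```shell
#!/usr/bin/env bash
# Assumption: the list only uses plain "..." or '...' quoting.
var='"file 1" "file  2" "file 3"'

# xargs splits on blanks and strips the quotes without evaluating anything;
# printf then re-emits every item terminated by a NUL byte.
arr=()
while IFS= read -r -d '' item; do
    arr+=("$item")
done < <(printf '%s\n' "$var" | xargs printf '%s\0')

printf '%d items\n' "${#arr[@]}"
printf '[%s]\n' "${arr[@]}"
```

The resulting array can then be passed on in the same way, for example with parallel my_function ::: "${arr[@]}".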
Ole Tange
  • The input is in an array, but I would like the `my_function` to be entered by the user after the files have been selected; I didn't express that clearly in my question, so I have posed it more clearly here: https://stackoverflow.com/questions/60368951/allow-user-to-complete-parallel-xargs-command-function-after-selecting-files – Tim Feb 24 '20 at 02:51