1

The following executes do_x.sh on all files of type a_???.json recursively. However, I would like to do the same on a subset of these files that do not have a corresponding file with the same name and different extension.

 find $PWD -type f -name "a_???.json"  | xargs -I{} do_x.sh {}

How do I say, in the same one-liner, "do the same, but only on the files a_???.json that do not have a corresponding a_???.json.done"? The following is certainly not a solution:

find $PWD -type f -name "a_???.json" -exclude "a_???.json.done" | xargs -I{} do_x.sh {}

Example:

a_111.json
a_112.json
a_111.json.done

So, execute do_x.sh only on a_112.json

Tims
  • 2
    Can you put that logic into `do_x.sh`? – Barmar Aug 28 '23 at 21:58
  • The intent is to skip sending them to compute nodes on cluster: find $PWD -type f -name "a_???.json" | xargs -I{} -n1 -P10 srun -N1 -A goc -p slurm do_x.sh {} – Tims Aug 28 '23 at 22:13
  • 3
    Asking for a one liner will discourage many people from trying to help you since it tells us you favor brevity over robustness, clarity, efficiency, portability, and everything else that actually matters in software and so are likely to accept the briefest answer rather than a good one. You might want to [edit] your subject line and description to remove "one liner" if you actually just want a good solution. – Ed Morton Aug 28 '23 at 22:36

3 Answers

2

To keep the structure of your command, try this:

find $PWD -type f -name "a_???.json" -execdir test '!' -f {}.done \; -print | xargs -I{} do_x.sh {}
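
As a minimal sketch of what this filter does, the following recreates the three example files from the question in a scratch directory and prints what `find` would hand to `xargs`. The scratch directory and the `remaining` variable are illustrative only. It uses `-exec` rather than `-execdir`, and it assumes a `find` that expands `{}` inside a larger argument such as `{}.done` (GNU and BSD `find` do; POSIX leaves this implementation-defined).

```shell
# Scratch directory holding the three example files from the question.
workdir=$(mktemp -d)
touch "$workdir/a_111.json" "$workdir/a_112.json" "$workdir/a_111.json.done"

# "-exec test ... \;" acts as a filter: -print is only reached when test
# succeeds, i.e. when no "<file>.done" exists next to the file.
remaining=$(find "$workdir" -type f -name "a_???.json" \
    -exec test '!' -f {}.done \; -print)

# Show only the basename of each surviving path.
printf '%s\n' "$remaining" | sed 's|.*/||'   # prints: a_112.json

rm -rf "$workdir"
```

Only `a_112.json` survives, because `a_111.json` has a matching `.done` file.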
Philippe
  • How would this change if srun is added: find $PWD -type f -name "a_???.json" | xargs -I{} -n1 -P10 srun -N1 -A goc -p slurm do_x.sh {} – Tims Aug 28 '23 at 22:15
  • 1
    You don't actually need `bash -c` here, since `test` is available as an external command: `find "$PWD" -type f -name "a_???.json" -exec test '!' -f {}.done \; -print | xargs ...` (note: I single-quoted the exclamation mark to prevent bash from mistaking it for a history expansion). And @Tims none of these changes to the `find` command affect how you process the resulting file list with `xargs` and/or `do_x.sh` -- they *only* change the output of `find` to exclude files that have a corresponding .done file. – Gordon Davisson Aug 28 '23 at 22:41
  • @GordonDavisson I noticed that yesterday, but didn't have time to fix it. – Philippe Aug 29 '23 at 05:53
1

Execute a shell if statement from find's -exec.

find "$PWD" -type f -name "a_???.json" -exec bash -c 'if ! [ -f "$1.done" ]; then do_x.sh "$1"; fi' {} {} \;

There's no need to use xargs, you can use the -exec keyword to find to execute commands.

Since -exec doesn't use the shell to execute the command, you have to execute bash -c explicitly.

For your more complex command, it's similar. Use xargs to get the parallel operation, then put your full command in the then clause of if.

find "$PWD" -type f -name "a_???.json" |
    xargs -I{} -P10 bash -c 'if ! [ -f "$1.done" ]; then srun -N1 -A goc -p slurm do_x.sh "$1"; fi' {} {}
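
A hedged, self-contained variant of the parallel pipeline, for trying the idea without a cluster. The stub `do_x.sh`, the `DO_X` variable, and `jobs.log` are hypothetical stand-ins for the real `srun -N1 -A goc -p slurm do_x.sh` call. The sketch assumes an `xargs` that supports `-0` and `-P` (GNU and BSD do). It uses `-n 1` on its own, since GNU `xargs` documents `-I` and `-n` as mutually exclusive, and `sh -c` because the test is plain POSIX shell.

```shell
workdir=$(mktemp -d)
touch "$workdir/a_111.json" "$workdir/a_112.json" "$workdir/a_113.json" \
      "$workdir/a_111.json.done"

# Hypothetical stub standing in for the real job: it only records the
# basename of the file it was asked to process.
cat > "$workdir/do_x.sh" <<'EOF'
basename "$1" >> "$(dirname "$0")/jobs.log"
EOF
export DO_X="$workdir/do_x.sh"

# One shell per file, up to 10 at a time. The trailing "sh" fills $0,
# so the filename appended by xargs arrives as $1.
find "$workdir" -type f -name "a_???.json" -print0 |
    xargs -0 -n 1 -P 10 sh -c '
        if ! [ -f "$1.done" ]; then sh "$DO_X" "$1"; fi
    ' sh

# Jobs finish in arbitrary order, so sort before displaying.
launched=$(sort "$workdir/jobs.log")
printf '%s\n' "$launched"

rm -rf "$workdir"
```

In this sketch only `a_112.json` and `a_113.json` are launched; `a_111.json` is skipped because its `.done` file exists.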
Barmar
  • What are the two {} {} for ? – Tims Aug 28 '23 at 22:14
  • 2
    I was hoping you wouldn't ask. In some versions of `bash`, the first argument after `-c ` becomes `$0`, so you need two of them to fill in `$1`. I was trying to find another question that answers this, but failed. – Barmar Aug 28 '23 at 22:15
  • How would this change if srun is added: find $PWD -type f -name "a_???.json" | xargs -I{} -n1 -P10 srun -N1 -A goc -p slurm do_x.sh {} – Tims Aug 28 '23 at 22:20
1

You can use xargs to run shell code via the command bash -c. This allows you to process multiple find hits with the same command, which may provide a noticeable performance improvement if you have a lot of files:

find "$PWD" -type f -name "a_???.json" -print0 |
   xargs -0 -r bash -c 'for f; do [[ -e "${f}.done" ]] || do_x.sh "$f"; done' bash

I have split that across two lines for legibility, but it is a single pipeline.

Note that a for loop without an explicit list of items iterates over the positional parameters, which is how the shell receives the filenames from find.

Note also that the trailing bash is intentional and necessary, or at least something is necessary at that position; otherwise the first filename emitted by find would be consumed as the $0 of the shell in which the command runs.
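
The effect of the placeholder can be checked in isolation. This small sketch uses literal arguments (`one two three`) in place of filenames, and `sh` instead of `bash`, since the behaviour is the same in any POSIX shell; the variable names are illustrative.

```shell
# Without a placeholder, the first argument becomes $0 and the loop never sees it.
without=$(sh -c 'for f; do printf "[%s]" "$f"; done' one two three)

# With a placeholder ("sh" here), every real argument is a positional parameter.
with=$(sh -c 'for f; do printf "[%s]" "$f"; done' sh one two three)

printf 'without placeholder: %s\n' "$without"   # [two][three]
printf 'with placeholder:    %s\n' "$with"      # [one][two][three]
```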

John Bollinger
  • How would this change if srun is added: find $PWD -type f -name "a_???.json" | xargs -I{} -n1 -P10 srun -N1 -A goc -p slurm do_x.sh {} – Tims Aug 28 '23 at 22:20
  • @Tims, I don't see why the `srun` would make a particular difference itself. It and all its arguments could be put into the `bash` command without issue. But the `-P10` option to `xargs` is a different matter. The command presented here will run all the `do_x.sh` commands in sequence, just as the example presented in the question does. If you want to parallelize with `-P` then the alternatives presented in other answers are better suited to your needs. – John Bollinger Aug 28 '23 at 22:43