
I would like to recursively go through all subdirectories and remove the oldest two PDFs in each subfolder named "bak":

Works:

find . -type d -name "bak" \
  -exec bash -c "cd '{}' && pwd" \;

Does not work, as the double quotes are already in use:

find . -type d -name "bak" \
  -exec bash -c "cd '{}' && rm "$(ls -t *.pdf | tail -2)"" \;

Any solution to the double quote conundrum?

Michael Gruenstaeudl

3 Answers


In a double quoted string you can use backslashes to escape other double quotes, e.g.

find ... "rm \"\$(...)\""

If that is too convoluted use variables:

cmd='$(...)'
find ... "rm $cmd"
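To see that both forms hand bash the same script text, the strings can be printed instead of executed. This is only a sketch: the variable names are made up for illustration, and the variable form adds the inner double quotes that the short example above leaves out.

```shell
# Build the same script text three ways and print it instead of running it.
# All variable names here are illustrative only.

# 1) Backslash-escaped inside double quotes.
escaped="rm \"\$(ls -t *.pdf | tail -2)\""

# 2) Command substitution kept in a single-quoted variable, so it stays literal.
cmd='$(ls -t *.pdf | tail -2)'
via_variable="rm \"$cmd\""

# 3) The whole script in single quotes; nothing is expanded by the calling shell.
single_quoted='rm "$(ls -t *.pdf | tail -2)"'

# Each line printed is: rm "$(ls -t *.pdf | tail -2)"
printf '%s\n' "$escaped" "$via_variable" "$single_quoted"
```

Printing the string first is a cheap way to check what the inner bash will actually receive.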

However, I think your find -exec has more problems than that.

  • Using {} inside the command string "cd '{}' ..." is risky. If the file name contains a ', things will break and unexpected commands might be executed.
  • $() will be expanded by bash before find even runs. So ls -t *.pdf | tail -2 will only be executed once in the top directory . instead of once for each found directory. rm will (try to) delete the same file for each found directory.
  • rm "$(ls -t *.pdf | tail -2)" will not work if ls lists more than one file. Because of the quotes both files would be listed in one argument. Therefore, rm would try to delete one file with the name first.pdf\nsecond.pdf.
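The last two points can be reproduced without touching any files. The following is a minimal sketch; the variable where and the helper count_args are hypothetical names used only for this demonstration.

```shell
# Early vs. late expansion of $(...).
where=outer

# Double quotes: the calling shell expands $(...) before bash -c starts,
# so the child only ever sees the static text "echo outer".
early=$(bash -c "where=inner; echo $(echo "$where")")

# Single quotes: the child bash receives $(...) verbatim and expands it itself.
late=$(bash -c 'where=inner; echo $(echo "$where")')

echo "early=$early late=$late"            # early=outer late=inner

# Quoted multi-line output becomes ONE argument.
count_args() { echo "$#"; }               # prints how many arguments it received
list=$(printf 'first.pdf\nsecond.pdf\n')
quoted=$(count_args "$list")              # one argument containing a newline
unquoted=$(count_args $list)              # word splitting yields two arguments
echo "quoted=$quoted unquoted=$unquoted"  # quoted=1 unquoted=2
```

Leaving the substitution unquoted is not a real fix either, because word splitting would then also break file names that contain spaces.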

I'd suggest

cmd='cd "$1" && ls -t *.pdf | tail -n2 | sed "s/./\\\\&/g" | xargs rm'
find . -type d -name bak -exec bash -c "$cmd" -- {} \;
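Before running this on real data, it may be worth trying it in a throwaway tree. The sketch below assumes mktemp -d and touch -t are available; the directory and file names are invented for the test.

```shell
# Build a scratch tree with one bak directory holding three PDFs of different ages.
demo=$(mktemp -d)
mkdir -p "$demo/project a/bak"
touch -t 202001010000 "$demo/project a/bak/old.pdf"
touch -t 202101010000 "$demo/project a/bak/mid.pdf"
touch -t 202201010000 "$demo/project a/bak/new.pdf"

# The suggested command, run from the top of the scratch tree.
# The directory is passed as $1 instead of being pasted into the script text.
cmd='cd "$1" && ls -t *.pdf | tail -n2 | sed "s/./\\\\&/g" | xargs rm'
(cd "$demo" && find . -type d -name bak -exec bash -c "$cmd" -- {} \;)

# Only the newest file should be left.
remaining=$(ls "$demo/project a/bak")
echo "$remaining"

rm -rf "$demo"
```

The space in "project a" is deliberate: it checks that the directory name survives as a single positional parameter.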
Socowi
  • The `tr` so you can then use `xargs -0` isn't really adding any value here. If you don't know which newlines were part of a file name, you will get *file not found* both ways. – tripleee Jan 16 '21 at 09:09
  • @tripleee Yes and no. I did *not* use `xargs -0` for handling files with linebreaks; that wouldn't work with `ls` anyway. I used `-0` to disable xargs' special treatment of quotes, backslashes and whitespace. With GNU xargs I could have used `-d \\n` instead. But macos' xargs doesn't offer this option. So `-0` is more portable although it might look strange here. The next best alternative would be `... | sed 's/./\\&/g' | xargs rm` I think. – Socowi Jan 16 '21 at 11:17
  • @tripleee How do you know macOS' xargs doesn't have the `-0` option? I couldn't test it, but the [macOS man page for xargs](https://ss64.com/osx/xargs.html) lists `-0` as the very first option. I took the `tr | xargs` from [this answer](https://stackoverflow.com/a/32589977/6770384). There is a comment saying *"`this option is not available`"*; however, *this* has to refer to `-d` (compare the answer's edit history with the date of the comment). Anyway, I edited my answer to use the POSIX-conformant `sed | xargs`. – Socowi Jan 16 '21 at 19:48
  • Sorry, I was confused - macOS `xargs` does have the `-0` option. (I was mixing it up with `find` which doesn't have `-print0` so using `xargs -0` for one of its primary use cases is harder on macOS.) – tripleee Jan 18 '21 at 08:42

You are explicitly asking for find -exec. Usually I would just chain one find inside another (find -exec find -delete), but in your case only two files per directory should be deleted, so the command has to run in a subshell. Socowi already gave a nice solution; however, if your file names do not contain tabs or newlines, another workaround is a find | while read loop.

This sorts the PDFs in each bak directory by mtime and removes the two oldest (-printf requires GNU find):

find . -type d -iname 'bak' |
while IFS= read -r dir; do
  # Prefix each PDF with its mtime, sort oldest first, keep two, strip the timestamp.
  find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf '%T+\t%p\n' |
    sort | head -n2 | cut -f2- |
    while IFS= read -r file; do
      rm -- "$file"
    done
done

The same find | while read loop as a one-liner:

find . -type d -iname 'bak' | while read -r dir; do find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf "%T+\t%p\n" | sort | head -n2 | cut -f2- | while read -r file; do rm "$file"; done; done;
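To check which files the pipeline would select before deleting anything, the inner part can be wrapped in a dry-run helper. This is a sketch under the same assumptions as the loop (GNU find, no tabs or newlines in file names); list_oldest_pdfs is a hypothetical name.

```shell
# list_oldest_pdfs DIR [COUNT]
# Hypothetical dry-run helper: prints the COUNT (default 2) oldest PDFs in DIR.
# Requires GNU find for -printf; nothing is deleted.
list_oldest_pdfs() {
  find "$1" -maxdepth 1 -type f -iname '*.pdf' -printf '%T+\t%p\n' |
    sort |                  # %T+ timestamps sort chronologically as plain text
    head -n "${2:-2}" |     # keep the oldest COUNT entries
    cut -f2-                # drop the timestamp column, keep the path
}

# Possible use with the outer loop:
#   find . -type d -iname bak | while IFS= read -r dir; do
#     list_oldest_pdfs "$dir"
#   done
```

Once the printed list looks right, the output can be fed to the same while read loop that calls rm.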

A find | while read loop can also handle NUL-terminated file names. However, head cannot handle them, so I adapted the other answers to work with nontrivial file names (GNU tools and bash only).


The scripts below only print the selected files with realpath as a dry run; replace realpath with rm to actually delete them.

#!/bin/bash

rm_old () {
  find "$1" -maxdepth 1 -type f -iname \*.$2 -printf "%T+\t%p\0" | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}

export -f rm_old

find -type d -iname bak -execdir bash -c 'rm_old "{}" pdf 2' \;

However, bash -c might still be exploitable. To make it more secure, let stat %N do the quoting:

#!/bin/bash

rm_old () {
  local dir="$1"

# we don't like eval
#  eval "dir=$dir"

  # this works like eval
  dir="${dir#?}"
  dir="${dir%?}"
  dir="${dir//"'$'\t''"/$'\011'}"
  dir="${dir//"'$'\n''"/$'\012'}"
  dir="${dir//$'\047'\\$'\047'$'\047'/$'\047'}"

  find "$dir" -maxdepth 1 -type f -iname \*.$2 -printf '%T+\t%p\0' | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}

find -type d -iname bak -exec stat -c'%N' {} + | while read -r dir; do rm_old "$dir" pdf 2; done
alecxs
  • The `while read` could easily and more idiomatically be replaced with `xargs` – tripleee Jan 16 '21 at 08:43
  • @triplee thx have added `-execdir` `xargs` with `find` instead of `ls` for handling nontrivial file names. got the idea from here, there it looks less ugly https://stackoverflow.com/a/26349346 – alecxs Jan 16 '21 at 19:38

You have a more fundamental problem; because you are using the weaker double quotes around the entire script, the $(...) command substitution will be interpreted by the shell which parses the find command, not by the bash shell you are starting, which will only receive a static string containing the result from the command substitution.

If you switch to single quotes around the script, you get most of it right; but that would still fail if the file name you find contains a double quote (just like your attempt would fail for file names with single quotes). The proper fix is to pass the matching files as command-line arguments to the bash subprocess.

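But a better fix still is to use -execdir, so that you don't have to pass a path to the subshell at all. Note that -execdir runs the command from the directory that contains the match, that is, the parent of bak; since every match is named bak, the script can simply cd into it first:

find . -type d -name "bak" \
  -execdir bash -c 'cd bak && ls -t *.pdf | tail -n 2 | xargs -r rm' \;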

This could still fail in funny ways because you are parsing the output of ls, which is inherently fragile.
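One way to avoid parsing ls altogether is to let the shell glob the files and compare their ages with the -ot test. The following is a sketch, not part of the original answer: remove_oldest_pdfs is a hypothetical helper, and it assumes a shell whose test supports -ot and local (bash and dash both do).

```shell
# remove_oldest_pdfs DIR [COUNT]
# Hypothetical helper: deletes the COUNT (default 2) oldest *.pdf files in DIR
# without parsing ls output. File names never pass through a pipe, so spaces,
# quotes and newlines in names are not a problem.
remove_oldest_pdfs() {
  local dir count f oldest
  dir=$1
  count=${2:-2}
  while [ "$count" -gt 0 ]; do
    oldest=
    for f in "$dir"/*.pdf; do
      [ -f "$f" ] || continue        # also skips the unexpanded pattern
      if [ -z "$oldest" ] || [ "$f" -ot "$oldest" ]; then
        oldest=$f                    # f has an older mtime than the current pick
      fi
    done
    [ -n "$oldest" ] || return 0     # no PDFs left
    rm -- "$oldest"
    count=$((count - 1))
  done
}

# Possible use with find (bash only, because of export -f):
#   export -f remove_oldest_pdfs
#   find . -type d -name bak -exec bash -c 'remove_oldest_pdfs "$1" 2' _ {} \;
```

The directory travels as a positional parameter, so nothing from the file system is ever interpreted as shell code. The trade-off is one scan of the directory per deleted file, which should be negligible for two files.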

tripleee