
With:

Find="$(find / -name "*.txt" )"
du -hc "$Find" | tail -n1
echo "$Find" | xargs rm -r

If a file named `foo bar.txt` is found, `du` won't count it and `rm` won't remove it. What would be the best way to escape the spaces?

  • Set `IFS` to newline, so that spaces in the variable expansion won't be used as word delimiters. – Barmar Feb 04 '16 at 21:23
  • `find -delete` is the fastest; [see this](http://stackoverflow.com/questions/33488712/delete-nested-hidden-files-with-egrep-and-xargs/33488763#33488763). – Kenney Feb 04 '16 at 21:25
  • @Cyrus: Thanks for cleaning up the other comment, but if you want to retain the shellcheck.net comment, please _recreate it_ to state that _it no longer directly applies_, and that you're simply recommending shellcheck.net as a generally helpful resource. – mklement0 Feb 04 '16 at 22:20
  • ...in general, "escaping" spaces or other content inside data (as opposed to code) is the Wrong Thing: If you're escaping them to be safe against parsing, that means you're running your data through a parser, when the best-practice approaches don't do that at all. – Charles Duffy Feb 04 '16 at 22:48
  • ...it's much the same thing as how trying to escape values that are going into SQL is The Wrong Thing, as opposed to the correct practice of passing them out-of-band (ie. via bind parameters). – Charles Duffy Feb 04 '16 at 22:49
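
A minimal sketch of Barmar's `IFS` suggestion above (not from the thread): restrict word splitting to newlines so that spaces survive the unquoted expansion. `/foo` stands in for the question's `/` to avoid accidental deletions, and `set -f` is added here because an unquoted expansion is also subject to globbing:

IFS=$'\n'                    # split unquoted expansions on newlines only
set -f                       # disable globbing for the unquoted expansions
Find=$(find /foo -name '*.txt')
du -hc $Find | tail -n 1     # deliberately unquoted: one word per line
rm -r $Find                  # still fragile if a filename contains a newline
set +f; unset IFS            # restore default splitting and globbing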

2 Answers


If none of your filenames can have embedded newlines (which would be very unusual), you can use the following:

Note: To prevent accidental deletion of files while experimenting with the commands, I've replaced `/` (the input dir. used in the question) with `/foo`.

# Read all filenames into a Bash array; embedded spaces in
# filenames are handled correctly.
IFS=$'\n' read -d '' -ra files < <(find /foo -name "*.txt")

# Run the `du` command:
du -hc "${files[@]}" | tail -n 1

# Delete the files.
rm -r "${files[@]}"
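
On Bash 4+, the same collection step can be written more simply with `mapfile` (a.k.a. `readarray`); a sketch, with the same embedded-newline caveat:

# Read one filename per line into the array, stripping each trailing newline.
mapfile -t files < <(find /foo -name "*.txt")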

Note that if you don't need to collect all filenames ahead of time and don't mind running `find` twice, you can use a single `find` command for each task (except for piping to `tail`). This is also the most robust option. The only caveat: if there are so many files that they don't fit on a single command line, `du` is invoked multiple times, and `tail -n1` then reports only the last batch's total.

# The `du` command
find /foo -name "*.txt" -exec du -hc {} + | tail -n1

# Deletion.
# Note that both GNU and BSD `find` support the `-delete` primary,
# which supports deleting both files and directories.
# However, `-delete` is not POSIX-compliant (a POSIX-compliant alternative is to
# use `-exec rm -r {} +`).
find /foo -name "*.txt" -delete

Using + to terminate the command passed to -exec is crucial: it instructs find to pass as many matches as fit on a single command line to the target command. Typically, but not necessarily, this results in a single invocation. In effect, -exec ... + is like a built-in xargs, except that embedded whitespace in arguments is not a concern.

In other words: -exec ... + is not only more robust than piping to xargs; because it needs no pipeline and no extra utility, it is also more efficient.
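
For comparison, a xargs pipeline can be made equally robust, but only with NUL-separated output; a sketch (`-print0` and `xargs -0` are GNU/BSD extensions, not POSIX):

# NUL separators survive any filename character, including newlines.
# The same caveat applies: with very many files, `xargs` may invoke `du`
# more than once, and `tail -n 1` would then show only the last total.
find /foo -name "*.txt" -print0 | xargs -0 du -hc | tail -n 1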

mklement0
  • I agree that embedded newlines are rare, but what's the point of an approach that's fragile in the face of them? Is the goal compatibility with POSIX find? – Charles Duffy Feb 04 '16 at 22:46
  • @CharlesDuffy: The stand-alone `find` commands avoid the embedded newline issue, in a manner suggested in your helpful comments on the question (although a _single_ invocation of `du` is not guaranteed). Otherwise, it's a tradeoff between convenience / efficiency and covering _all_ edge cases. Can you think of a fully robust way to collect all filenames without a loop and without involving a temporary file? – mklement0 Feb 04 '16 at 23:03

Perhaps find / -name '*.txt' -exec du -hc {} \; is more like what you're looking for?

But, doing it the way you did, you're missing quotes in your call to du, and needlessly using xargs where it won't work… You seem enamoured of echo, which is not your friend here.

Since \0 isn't allowed in filenames, you can safely collect results from find using its -print0 option:

# Example setup: create a file under a path that contains a space.
date > /private/var/mobile/Documents/Local\ Cookies/Clean

find . -name '*.txt' -print0 | while IFS='' read -r -d '' file
do
    # Each NUL-terminated filename arrives intact, embedded spaces and all.
    du -hc "$file" | tail -n 1
    rm "$file"
done

Corrected; this should work on macOS and Linux now.
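
One caveat not raised in the thread: because the loop reads from a pipe, it runs in a subshell, so variables set inside it don't survive the loop. A sketch of a variant (assuming Bash) that avoids this with process substitution, e.g. to keep a running total:

total_kb=0
while IFS='' read -r -d '' file
do
    # Accumulate each file's size (in KiB) before deleting it.
    kb=$(du -k "$file" | cut -f1)
    total_kb=$(( total_kb + kb ))
    rm "$file"
done < <(find . -name '*.txt' -print0)
echo "${total_kb} KiB total"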

BRPocock
  • To get `du -hc` to work as intended (report the _total_ size), you must pass _all_ filenames to it; hence: `-exec du -hc {} +`. – mklement0 Feb 04 '16 at 21:36
  • Your loop solution, aside from being inefficient, has the same problem: invoking `du -hc` on _individual_ files is not what the OP intended. Also, you probably forgot to remove `xargs` before the `rm` command. – mklement0 Feb 04 '16 at 21:54
  • right, cut-n-paste error. And … yeah, I could interpret the intent either way, as far as `du`. The `rm -r` has me puzzled, as well … unless someone has directories named *.txt? – BRPocock Feb 04 '16 at 22:38
  • Re `rm -r`: strange, indeed; as for `du`: from the OP's own solution attempt it's obvious to me that the intent is to _sum up all file sizes_ and report the _total_ size (byte count). – mklement0 Feb 04 '16 at 22:41