
I have a large collection of files contained in directories for testing. I need to keep the directory structure for my application but want to thin out the files for faster testing. I want to limit the number of files a directory can have to 3. How can I do that on Linux?

To clarify what I would like to accomplish, a solution in Python:

import os
import sys

# Walk the tree and delete every file beyond the first N in each directory.
for root, dirs, files in os.walk(sys.argv[1]):
    for index, file in enumerate(files):
        if index >= int(sys.argv[2]):
            os.remove(os.path.join(root, file))

Usage:

python thinout.py /path/to/thin\ out/ <maximum_number_of_files_per_directory>

Example:

python thinout.py testing\ data 3
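One caveat about the script above: os.walk yields file names in arbitrary order, so which three files survive is not deterministic. A sketch of a sorted variant in function form (thin_out is a made-up name), which keeps the first N files alphabetically:

```python
import os

def thin_out(top, keep=3):
    """Keep only the first `keep` files (sorted by name) in each directory under top."""
    for root, dirs, files in os.walk(top):
        for name in sorted(files)[keep:]:
            os.remove(os.path.join(root, name))
```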

I found a similar question about doing this for one directory, but not recursively.

Bengt

2 Answers


I would do something like this in bash:

for dir in $(find . -type d); do pushd "$dir"; rm $(ls | awk 'NR>3'); popd; done

Or this version might be better:

for dir in $(find . -type d); do pushd "$dir"; rm $(find . -maxdepth 1 -type f | tail -n +4); popd; done

Of course, blindly deleting all but the first 3 files in each directory is always a little risky. Buyer beware...

By the way, I did not test this myself; I just typed in what came to mind. You'll likely have to tweak it a little to get it working. Again, buyer beware.
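Since those one-liners are untested, a safer pattern is to print the candidate deletions first and only delete once the list looks right. The helper below (thin_list is a hypothetical name) lists everything past the first three alphabetically sorted files in each directory; it still assumes file names without embedded newlines:

```shell
# Dry run: print all but the first three (sorted) files per directory under $1.
# Pipe the output to `xargs -d '\n' rm` only after checking it.
thin_list() {
    find "${1:-.}" -type d | while IFS= read -r dir; do
        find "$dir" -mindepth 1 -maxdepth 1 -type f | sort | tail -n +4
    done
}
```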

Matt

This rather lengthy pipeline works with file names containing spaces etc., and leaves only the first three alphabetically sorted files in each subdirectory.

EDIT: applied mklement0's improvement to cope with directory names that need escaping.

find /var/testfiles/ -type d -print0 | while IFS= read -r -d '' subdir; do
    cd "$subdir"
    find . -mindepth 1 -maxdepth 1 -type f -print0 |
        sort --zero-terminated | tr '\0' '\n' | tail -n +4 | tr '\n' '\0' |
        xargs --null --no-run-if-empty rm
    cd "$OLDPWD"
done

Since my version of tail doesn't support a --zero or --null flag for line terminators, I had to work around that with tr. Suggestions for improvements are welcome.
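For what it's worth, newer GNU coreutils (tail gained --zero-terminated in 8.25) let you drop the tr round-trip entirely, which also removes the only part that breaks on newlines in file names. A sketch under that assumption, with the hard-coded path replaced by a parameter and the cd/$OLDPWD dance replaced by running the inner find on the subdirectory directly (thin_nul is a made-up name):

```shell
# Assumes GNU coreutils >= 8.25 (tail --zero-terminated).
# Keeps the first three sorted files in every directory under $1.
thin_nul() {
    find "${1:-.}" -type d -print0 | while IFS= read -r -d '' subdir; do
        find "$subdir" -mindepth 1 -maxdepth 1 -type f -print0 |
            sort --zero-terminated | tail --zero-terminated -n +4 |
            xargs --null --no-run-if-empty rm
    done
}
```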

Perleone
  • Your directory loop won't work with directory names that need escaping. Use something like `find /var/testfiles/ -type d -print0 | while IFS= read -r -d '' subdir; do cd "$subdir"; ...`; see http://mywiki.wooledge.org/BashFAQ/001 – mklement0 Jan 24 '13 at 04:53
  • @mklement I can not get the command to run with your modification. Could you perhaps make an edit or post your version as an individual answer? – Bengt Jan 24 '13 at 23:58
  • @Perlene Your answer does not work for me. The cause seems to be whitespaces in the path. – Bengt Jan 25 '13 at 00:03
  • @bngtlrs I've amended my answer, does the new version work for you? – Perleone Jan 25 '13 at 00:20
  • @bngtlrs @Perleone's revised version looks OK. If you don't have to deal with filenames with embedded `\n` chars. (which is rare), here's a more concise version: `find "/var/testfiles" -type d | while read -r subdir; do (find "$subdir" -maxdepth 1 -type f | sort | tail -n +4 | while read -r f; do rm "$f"; done); done`. An added advantage of this version is that it can deal with a large number of files to delete, whereas the `xargs`-based version could exceed the max. length of a single command line. – mklement0 Jan 25 '13 at 04:12