4

To get the first X files in a directory, I can do:

$ ls -U | head -40000

How would I then delete these 40,000 files? For example, something like:

$ "rm -rf" (ls -U | head -40000)
David542
  • Note for others that the `-U` option to `ls` in this example will return directory contents unsorted (as they are "natively" stored). It will also not show dot files. You may want another option depending on your needs. – Adam B Feb 25 '16 at 00:12
  • Better hope you don't have newlines in your filenames, using that first command. – yellowantphil Feb 25 '16 at 01:43

4 Answers

5

The tool you need for this is xargs. It converts its standard input into arguments to a command that you specify. Note that, by default, it splits the input on blanks as well as newlines (and gives quotes special treatment), so it is not strictly one argument per line; filenames containing whitespace need extra care.

Thus, something like this would work (see the comment below, though: `ls` output shouldn't normally be parsed this way):

ls -U | head -40000 | xargs rm -rf

Before trying this, I would recommend starting with a small head size and using `xargs echo` to print out the filenames being passed, so you understand what you'll be deleting.
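
For instance, a minimal dry-run sketch along those lines:

# prints the rm command line(s) that would run, instead of running them
ls -U | head -10 | xargs echo rm -rf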

Be aware that filenames containing unusual characters can sometimes be a problem here. If you are on a modern GNU system, you may also wish to use the options of these commands that separate each item with a null character; since a filename cannot contain a null character, that safely handles all possible names. I am not aware of a simple way to take the top X items when they are zero-separated, though.

So, for example, you can use this to delete all files in a directory:

find . -maxdepth 1 -print0 | xargs -0 rm -rf
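
As an aside, if your GNU coreutils is new enough that head understands -z/--zero-terminated (roughly 8.25 onwards), one null-safe way to take only the first X entries might be the following sketch:

# -mindepth 1 keeps '.' itself out of the list; head -z counts NUL-terminated records
find . -mindepth 1 -maxdepth 1 -print0 | head -z -n 40000 | xargs -0 rm -rf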
Adam B
    [Parsing `ls`](http://mywiki.wooledge.org/ParsingLs) is an antipattern and should be avoided in any automation. (I realize this was in the question, but readers are likely to skip directly to the answer.) – kojiro Feb 25 '16 at 01:46
  • True. Updated the answer to have that warning before I use `ls` – Adam B Feb 25 '16 at 01:53
2

What about using awk as the filter?

find "$FOLDER" -maxdepth 1 -mindepth 1 -print0 \
    | awk -v limit=40000 'NR<=limit;NR>limit{exit}' RS="\0" ORS="\0" \
    | xargs -0 rm -rf

It will reliably remove at most 40,000 files (or folders); reliably, meaning regardless of which characters the filenames may contain.
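
In case the awk program looks terse: `NR<=limit` prints (the default action) each of the first limit records, and `NR>limit{exit}` stops reading as soon as the limit has been passed. A tiny illustration of the same idea on ordinary newline-separated input:

# outputs the first three records: a, b, c
printf '%s\n' a b c d e | awk -v limit=3 'NR<=limit;NR>limit{exit}'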


Btw, to get the number of files in a directory reliably, you can print one character per entry and count characters (filenames may contain newlines, so counting lines would not be safe):

find FOLDER -mindepth 1 -maxdepth 1 -printf '.' | wc -c 
hek2mgl
2

Use a bash array and slice it. If the number and size of the arguments are likely to get close to the system's limits, you can still use xargs to split up the remainder.

files=( * )
printf '%s\0' "${files[@]:0:40000}" | xargs -0 rm
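
In case the slice syntax is unfamiliar: `${files[@]:offset:length}` expands to at most length elements starting at offset. A tiny illustration:

# expands to the first three elements: a b c
demo=( a b c d e )
printf '%s\n' "${demo[@]:0:3}"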
kojiro
  • Have you tested that with 40,000 filenames of arbitrary length? Do you think `rm` can consume 40,000 arguments? – hek2mgl Feb 25 '16 at 01:51
  • @hek2mgl no. How shall I produce names with arbitrary length? More seriously, I could test it, and it could work here because I have a particular implementation of `exec`, and it could still fail for OP, who may have a different one. On OS X, ARG_MAX is ~264k. – kojiro Feb 25 '16 at 01:54
  • No, this (a) won't work and (b) even if it did work on a future system, it would never scale. I don't want to be a smartass, but that's simply the truth. Btw, I should have simply used "maximum" instead of "arbitrary" length above. – hek2mgl Feb 25 '16 at 01:56
  • @hek2mgl in what sense would it not scale? – kojiro Feb 25 '16 at 02:00
  • I mean shell variables have a limited capacity. Also, the maximum length of a command line is limited. Even if upcoming operating systems increase this limit, it will still be limited. Conclusion: you can't store that in a bash variable, and you can't call rm just once (I mean reliably). You need to call it as many times as needed to process all the arguments, each time passing as many arguments as allowed without hitting the character limit. Actually, this is what `xargs` is doing ;) – hek2mgl Feb 25 '16 at 02:04
  • @hek2mgl thanks, I've rolled xargs into my answer, but I still think shell variable slicing is a good approach. – kojiro Feb 25 '16 at 02:07
  • OK. Actually I'm not sure any more about a limit in the size of bash variables. Will need to investigate that. – hek2mgl Feb 25 '16 at 02:23
  • @hek2mgl https://stackoverflow.com/questions/1078031/what-is-the-maximum-size-of-an-environment-variable-value/1078125#1078125 – kojiro Feb 25 '16 at 02:24
  • I'll give you an upvote on this since it is pretty simple, and as far as my test shows it can work with 40,000[0000...] filenames without a problem, as it seems at first sight. I'm still unsure whether it is the right way to do it, but at least for 40,000 files it should be ok. (I definitely underestimated that) – hek2mgl Feb 25 '16 at 02:29
  • @hek2mgl fwiw I did a cursory test with 80001 files named `{0..80000}`. (It took much longer to create them than to delete them.) – kojiro Feb 25 '16 at 02:30
0

I ended up doing this since my folders were named with sequential numbers. This should also work for alphabetical folders:

ls -r releases/ | sed '1,3d' | xargs -I {} rm -rf releases/{}

Details:

  • list all the items in the releases/ folder in reverse order
  • slice off the first 3 items (which would be the newest if numeric/alpha naming)
  • for each item, rm it

In your case, you could replace ls -r with ls -U. Note, however, that sed '1,40000d' deletes the first 40,000 names from the listing and passes the rest on to rm, which is the opposite of head -40000; to remove only the first 40,000 entries, keep those lines instead (e.g. sed -n '1,40000p').
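
As with the other answers, you can preview what would be removed by substituting echo for rm; a sketch, assuming the same releases/ layout:

# prints one "rm -rf releases/<name>" line per item instead of deleting anything
ls -r releases/ | sed '1,3d' | xargs -I {} echo rm -rf releases/{}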

james2doyle