complicated find on bash

Question

I have the following task: delete "builds" older than 30 days. This solution works perfectly:

find $jenkins_jobs -type d -name builds -exec find {} -type d -mtime +30 \; >> $filesToBeDelete
cat $filesToBeDelete | xargs rm -rf

But later a condition was added: delete only when there are more than 30 builds, and remove the oldest ones. So in the end we should keep the 30 newest builds and delete the rest.

I have also found that I can wrap find in an if statement like this:

if [ $(find bla-bla | wc -l) -gt 30 ]; then
...
fi

but I am wondering how I can delete those files.

Is that clear? For example, say the "build" folder contains 100 builds and all of them are older than 30 days. I want to keep the 30 newest builds and delete the other 70.

Yes it is clear, but I think I have bad news for you. I don't think `find` is up to the task. `find` operates on a per-file basis (other than possibly filling a command line with multiple filenames) and can't compare different hits, unless there are more advanced features of `find` that I've never heard of. I think you'll need to manually sort the timestamps (unless you resort to some hacky, not so safe solutions) then carry out your logic, which is much easier in Python or Perl. — 4ae1e1, Dec 03 '15 at 14:37
Also, your original solution that "works perfectly" isn't safe; you should use `-exec rm -rf '{}' +` or `find blah blah -print0 | xargs -0` (if you have a more primitive `find`), because `find` output by default should not be parsed (try a filename with a newline, for example). — 4ae1e1, Dec 03 '15 at 14:40
@4ae1e1 I'd assume it works perfectly because they have sane filenames so there is no reason to use them. — 123, Dec 03 '15 at 14:42
Actually I have to use bash because this is just a small piece of a big script. Is it possible to do it in more than one step? I mean sort first, then pick the older files if there are more than 30 builds, etc. — Volodymyr, Dec 03 '15 at 14:42
@123 I've heard of people having sane filenames and happily working shell scripts, until one day they have some broken program dumping random crap (with random names). Then all is sad. — 4ae1e1, Dec 03 '15 at 14:48
By the way, the answer by gilhad below is but one of the "hacky, not so safe" type of solutions I was pointing to. I can't think of a reliable way to use `sort` in this case, because filenames just don't have to fit in a line, and `find` itself even replaces suspicious characters with `?`. — 4ae1e1, Dec 03 '15 at 14:52
I don't have time to write an attempt at an answer up but using [this answer](http://stackoverflow.com/a/25578277/258523) and adding a check on the timestamp being older than your target stamp and then expanding all but the last 30 entries in the sorted array *should* do what you want. — Etan Reisner, Dec 03 '15 at 15:08
@4ae1e1 Why would you want your script to run with random crap? I'd rather be alerted that my program isn't working. — 123, Dec 03 '15 at 15:30
@123 Unless you are "alerted" that your program isn't working by big chunks (or all) of your filesystem getting `rm`ed or something, right? — Jeff Y, Dec 03 '15 at 16:02
@JeffY Nah, just put a check in the script at the start to make sure no dodgy filenames, instead of making the rest of the script more complicated than it needs to be. — 123, Dec 03 '15 at 16:04

score 2 · Accepted Answer · answered Dec 03 '15 at 15:22

Pretty hacky, but it should be fairly robust against weird filenames:

find -type d -name "builds" -mtime +30 -printf "%T@ %p\0" |\
awk -vRS="\0" -vORS="\0" '{match($0,/([^ ]* )(.*)/,a);b[a[2]]=a[1];c[a[1]]=a[2]}END{x=asort(b);for(i=x-30;i>0;i--)print c[b[i]]}' |\
xargs -0 -I{} rm -r {}

I tested it with echo and it seems to work, but I'd make sure it shows the right files before using rm -r.

It passes null-terminated strings through the whole pipeline, so filenames are preserved intact.

The main limitation is that if two entries have exactly the same modification timestamp, one of them will be missed, because the timestamp is used as a key in an associative array.

Thanks. I've checked this in my environment and it works like a charm. But by the way, I am interested in how awk works here. Could you please explain it in a bit more depth? — Volodymyr, Dec 04 '15 at 16:48
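
On how the awk stage works, as far as can be read from the one-liner: `match` splits each NUL-terminated record into the timestamp and the path, `b` maps path to timestamp, `c` maps timestamp back to path, `asort(b)` sorts the timestamps in ascending order, and the loop prints the paths for every index except the 30 highest (the 30 newest). The three-argument `match` and `asort` are gawk extensions. Below is a minimal sketch of the same selection without gawk, assuming GNU `sort`, `tail` and `cut` with their `-z` options; the function name `drop_newest` and the placeholder path are made up for illustration. Nothing is keyed on the timestamp here, so records with identical timestamps both stay in the stream.

```bash
# drop_newest KEEP
# Reads NUL-terminated "<epoch> <path>" records on stdin and writes the
# paths of everything except the KEEP newest records, NUL-terminated.
# Assumes GNU coreutils (sort -z, tail -z, cut -z).
drop_newest() {
    local keep=$1
    sort -z -t ' ' -k1,1nr |            # newest first, numeric on field 1
        tail -z -n "+$((keep + 1))" |   # skip the first KEEP records
        cut -z -d ' ' -f2-              # drop the timestamp, keep the path
}

# Dry run first: print what would be removed (the path is a placeholder).
# find /path/to/jobs -type d -name builds -mtime +30 -printf '%T@ %p\0' |
#     drop_newest 30 | xargs -0 -r -n1 echo rm -r --
```

Keeping the `echo` in place until the listed paths look right follows the same caution as the answer itself.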

score 0 · Answer 2 · answered Dec 03 '15 at 15:55

Here is a relatively safe answer to list the dirs, if your stat is close enough to mine (cygwin/bash):

now=$(date +%s)
find "$jenkins_jobs" -type d -name builds -exec find {} -type d \; |
  while read f; do stat -c'%Y %n' "$f"; done |
  sort -nr |
  tail -n +31 |
  awk $now'-$1>2592000'|
  sed 's/^[0-9]* //'

This is working with epoch time (seconds since 1970) as provided by the %s of date and the %Y of stat. The sort and tail are removing the newest 30, and the awk is removing any 30 days old or newer. (2592000 is the number of seconds in 30 days.) The final sed is just removing what stat added, leaving only the dirname.

edited Dec 03 '15 at 17:05

answered Dec 03 '15 at 15:55

Jeff Y


Literally as bad as the other answer, all your commands delimit records on newlines. – 123 Dec 03 '15 at 16:06
Updated to handle spaces and globs in filenames (by quoting). – Jeff Y Dec 03 '15 at 17:06
Needs `IFS=` as well to handle leading and trailing spaces in filenames and `read -r` to handle backslash sequences in filenames. And that still leaves newlines as a problem (which can't be solved with this approach). Well... newlines *might* be possible with `sort -z` and `awk`/etc. instead of `tail` and `sed`. – Etan Reisner Dec 03 '15 at 18:29
OP didn't indicate any such dirname oddness. And perfection is the enemy of the good. – Jeff Y Dec 03 '15 at 18:53
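
A minimal sketch of what the comments above suggest, assuming bash and GNU coreutils: NUL-terminated records, `IFS= read -r`, the newest entries dropped first, and the age test applied only to what is left. The function name `prune_candidates` and its parameters are hypothetical, and the input is assumed to be "<epoch> <path>" records such as those produced by GNU find's `-printf '%T@ %p\0'`.

```bash
# prune_candidates KEEP MAX_AGE_SECONDS [NOW]
# Reads NUL-terminated "<epoch> <path>" records on stdin. Skips the KEEP
# newest records, then prints (NUL-terminated) the paths of the remaining
# records that are more than MAX_AGE_SECONDS older than NOW.
# NOW defaults to the current epoch time.
prune_candidates() {
    local keep=$1 max_age=$2 now=${3:-$(date +%s)}
    local rec stamp
    sort -z -t ' ' -k1,1nr |
        tail -z -n "+$((keep + 1))" |
        while IFS= read -r -d '' rec; do
            stamp=${rec%% *}      # everything before the first space
            stamp=${stamp%%.*}    # drop a fractional part such as ".1234"
            if (( now - stamp > max_age )); then
                printf '%s\0' "${rec#* }"   # everything after the first space
            fi
        done
}
```

Called as `prune_candidates 30 2592000`, the output can be piped to `xargs -0 -r -n1 echo` for a dry run before anything is handed to `rm -rf`.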

score -2 · Answer 3 · answered Dec 03 '15 at 14:49

This will list all but the 30 newest directories.

find -type d -name builds -exec ls -d -l --time-style="+%s" {} \;|sed "s#[^ ]\+ \w\+ \w\+ \w\+ \w\+ ##"|sort -r |sed "s#[^ ]\+ ##"|tail -n +31

Once you are sure you want to remove them, you can append | xargs rm -rf to the pipeline.

It reads this way:

find all builds dirs
list them with time since the epoch
drop (with sed) permissions, user, group, etc., leaving only time and name
sort by time, newest first
drop the times
tail shows everything from the 31st entry on (so it skips the 30 newest)
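
To make the two sed stages concrete, here is a small sketch run on a made-up `ls -d -l --time-style=+%s` line. The sample line and the function names `strip_ls_prefix` and `strip_time` are assumptions for illustration, and `\+` and `\w` rely on GNU sed.

```bash
# Stage 1: remove permissions, link count, user, group and size,
# leaving "<epoch> <name>".
strip_ls_prefix() { sed 's#[^ ]\+ \w\+ \w\+ \w\+ \w\+ ##'; }

# Stage 2: remove the leading epoch field, leaving only the name.
strip_time() { sed 's#[^ ]\+ ##'; }

# Hypothetical ls output line for one directory.
sample='drwxr-xr-x 2 jenkins jenkins 4096 1449150000 ./job1/builds'

printf '%s\n' "$sample" | strip_ls_prefix                # 1449150000 ./job1/builds
printf '%s\n' "$sample" | strip_ls_prefix | strip_time   # ./job1/builds
```

The first pattern expects single spaces between columns and user and group names made only of word characters, so a name such as `www-data` would not be stripped as intended.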

answered Dec 03 '15 at 14:49

gilhad


Sorry, but I think I need a deeper explanation: what is sed doing there? – Volodymyr Dec 03 '15 at 14:53
This is not good. It tries to parse `ls -l` output, and it doesn't address the additional "only older than 30 days" criterion. – Jeff Y Dec 03 '15 at 14:58
@VolodymyrRykhva the first sed strips everything from the ls output except time and name. The second strips the time. [^ ]\+ means anything but a space (as many as possible, at least one), \w\+ means word characters (as many as there are, at least one) – gilhad Dec 03 '15 at 15:20
@JeffY if the 30-day criterion is still in play, then find can be given more parameters (like -mtime +30) – gilhad Dec 03 '15 at 15:22
If the `find` returns older than 30 days, and the `tail` happens after, you don't preserve "the latest 30", you preserve "the latest 30 older than 30 days". The `tail` functionality has to come first, *then* the "older" filter. – Jeff Y Dec 03 '15 at 16:17
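
A small sketch of why the order matters, using made-up data in which each line is "<age in days> <name>" and the limits are shrunk to "keep 3" and "30 days". All function names and the sample builds are hypothetical.

```bash
newest_first() { sort -n -k1,1; }               # smallest age first
skip_first()   { tail -n "+$(( $1 + 1 ))"; }    # drop the first $1 lines
older_than()   { awk -v d="$1" '$1 > d'; }      # keep ages above d days

# Age filter first, then skip the newest KEEP of what survived.
filter_then_tail() { older_than "$2" | newest_first | skip_first "$1"; }

# Skip the newest KEEP overall, then apply the age filter.
tail_then_filter() { newest_first | skip_first "$1" | older_than "$2"; }

builds='10 b5
20 b4
40 b3
50 b2
60 b1'

printf '%s\n' "$builds" | filter_then_tail 3 30   # prints nothing
printf '%s\n' "$builds" | tail_then_filter 3 30   # prints "50 b2" and "60 b1"
```

With the filter first, the three builds older than 30 days all survive as the newest 3 of the old ones, so five builds remain. With the tail first, only the three newest overall remain.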
