How can I use perl to delete files matching a regex

Question

Due to a Makefile mistake, I have some fake files in my git repo...

$ ls
=0.1.1                  =4.8.0                  LICENSE
=0.5.3                  =5.2.0                  Makefile
=0.6.1                  =7.1.0                  pyproject.toml
=0.6.1,                 all_commands.txt        README_git_workflow.md
=0.8.1                  CHANGES.md              README.md
=1.2.0                  ciscoconfparse/         requirements.txt
=1.7.0                  configs/                sphinx-doc/
=2.0                    CONTRIBUTING.md         tests/
=2.2.0                  deploy_docs.py          tutorial/
=22.2.0                 dev_tools/              utils/
=22.8.0                 do.py
=2.7.0                  examples/
$

I tried this, but it seems that there may be some more efficient means to accomplish this task...

# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
    my $cmd1 = "rm $filename";
    `$cmd1`;
}

Question:

I want a remove command that matches against a pcre.
What is a more efficient perl solution to delete the files matching this perl regex: /\W\d+\.\d+/ (example filename: '=0.1.1')?

Is that `=` by any chance an indicator of the file-type in your `ls`, which may have been aliased with `-F`? (Or is it really a literal character?) I added this to my answer — zdim, Oct 06 '22 at 16:51
You could try any of `rm =*`, `rm =[0,1,2,22,33,4,5,7]*`, `find . -type f -name '=*' -exec rm {} \;` in shell. — Polar Bear, Oct 06 '22 at 17:45
Please see [Linux / Unix: Find And Remove Files With One Command On Fly](https://www.cyberciti.biz/faq/linux-unix-how-to-find-and-remove-files/) — Polar Bear, Oct 06 '22 at 17:47
@PolarBear if we’re going to suggest non-perl alternative solutions, `fd` is the best I’ve found…. https://github.com/sharkdp/fd — Mike Pennington, Oct 07 '22 at 00:04
Is `fd` a part of Linux/Unix OS? Or you have to compile it from source code? — Polar Bear, Oct 07 '22 at 07:33
@PolarBear, you don't have to compile `fd`; it's available in several [linux distros via package manager](https://github.com/sharkdp/fd#installation), you can use one of the [official github release binaries](https://github.com/sharkdp/fd/releases), or use [`rustc / cargo`](https://doc.rust-lang.org/rustc/what-is-rustc.html) to compile `fd` from source. — Mike Pennington, Oct 07 '22 at 09:56

zdim · Accepted Answer · 2022-10-10T21:16:42.073

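Fetch a wider set of files and then filter through whatever you want:

my @files_to_del = grep { basename($_) =~ /^\W[0-9]+\.[0-9]+/ and not -d } glob "$dir/*";

I added an anchor (^) so that the regex can only match a name that begins with that pattern; otherwise this can blow away files other than the intended ones. Since glob "$dir/*" returns each name prefixed with $dir/, the regex is applied to the basename (from File::Basename) rather than to the whole path, which would never match the anchored pattern. Reconsider what exactly you need.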
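To make the effect of the anchor concrete, here is a small sketch that runs both forms of the pattern over a few names. The first three names come from the listing in the question; `notes-1.2.txt` is a hypothetical name added only to show an unintended match. Nothing is deleted.

```perl
use strict;
use warnings;

# Names from the question's listing, plus one hypothetical name
# ("notes-1.2.txt") that is NOT in the question.
my @names = ('=0.1.1', '=22.8.0', 'README.md', 'notes-1.2.txt');

# Unanchored: matches wherever "non-word char, digits, dot, digits" occurs.
sub unanchored { return grep { /\W[0-9]+\.[0-9]+/ } @_ }

# Anchored: the pattern has to start at the beginning of the name.
sub anchored { return grep { /^\W[0-9]+\.[0-9]+/ } @_ }

print "unanchored: @{[ unanchored(@names) ]}\n";
print "anchored:   @{[ anchored(@names) ]}\n";
```

The unanchored form also picks up `notes-1.2.txt` (via `-1.2`), while the anchored form keeps only the names that start with the pattern.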

Altogether perhaps (or see a one-liner below ^†)

use warnings;
use strict;
use feature 'say';

use File::Glob ':bsd_glob';        # for better glob()
use File::Basename qw(basename);   # to test the name, not the whole path
use Cwd qw(cwd);                   # current-working-directory

my $dir = shift // cwd;      # cwd by default, or from input

my $re = qr/^\W[0-9]+\.[0-9]+/;

# glob "$dir/*" returns paths prefixed with $dir, so match on the basename
my @files_to_del = grep { basename($_) =~ $re and not -d } glob "$dir/*";

say for @files_to_del;  # please inspect first

#unlink or warn "Can't unlink $_: $!" for @files_to_del;

where the * in glob can be narrowed to pre-select files, if suitable. In particular, if the = is a literal character (and not an indicator printed by ls, see footnote)^‡ then glob "=*" will fetch only the files starting with it, and then you can pass those through a grep filter.

I exclude directories, identified by the -d filetest, since we are looking for files (and to steer clear of the scary language about directories in the unlink docs, thanks to brian d foy's comment).

If you need to scan subdirectories and do the same with them, perhaps recursively (which doesn't seem to be the case here), then we could employ this logic in File::Find::find (or File::Find::Rule, or other modules).
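As a rough sketch of the recursive variant with File::Find, assuming the same anchored pattern: the sub name `candidates_recursive` is made up for this illustration, and the code only collects paths without deleting anything.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Collect plain files under $dir (recursively) whose name matches $re.
# "candidates_recursive" is a hypothetical helper name.
sub candidates_recursive {
    my ($dir, $re) = @_;
    my @found;
    find(sub {
        # Inside the callback $_ is the bare name, and
        # $File::Find::name is the path including $dir.
        push @found, $File::Find::name if -f $_ && $_ =~ $re;
    }, $dir);
    return @found;
}

# Print candidates only when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for candidates_recursive($ARGV[0], qr/^\W[0-9]+\.[0-9]+/);
}
```

Because the callback sees the bare name in `$_`, the anchored regex can be used unchanged, and the full path is still available for a later unlink.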

Or read the directory any other way (opendir+readdir, libraries like Path::Tiny), and filter.
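The opendir+readdir route could look roughly like the sketch below. The sub name `candidates` is hypothetical; readdir returns bare names, so the anchored regex applies to them directly, and the directory is prepended only for the -d test and the returned path. Nothing is deleted.

```perl
use strict;
use warnings;

# Return paths of non-directory entries in $dir whose *name* matches $re.
# "candidates" is a hypothetical helper name.
sub candidates {
    my ($dir, $re) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @found = map  { "$dir/$_" }
                grep { $_ =~ $re && !-d "$dir/$_" }
                readdir $dh;
    closedir $dh;
    @found = sort @found;
    return @found;
}

# List candidates in the current directory for inspection.
print "$_\n" for candidates('.', qr/^\W[0-9]+\.[0-9]+/);
```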

^† Or, a quick one-liner ... print (to inspect) what's about to get blown away

perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'

and then delete 'em

perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'

(I switched to a more compact syntax just so. Not necessary)

If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do

perl -wE'$d = shift//q(.); ...'  dirpath (relative path fine. optional)

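and then use glob "$d/*" in the code. This works the same way as in the script above -- shift pulls the first element from @ARGV, if anything was passed to the script on the command line, or if @ARGV is empty it returns undef and then // (defined-or) operator picks up the string q(.). Note that glob "$d/*" returns names prefixed with $d/, so the anchored regex then has to be tested against the basename (File::Basename) and not against the whole path.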

^‡ That leading = may be an "indicator" of a file type if ls has been aliased with ls -F, which can be checked by running ls with suppressed aliases, one way being \ls (or check alias ls). Note though that ls -F appends its indicator after the name, so a = in front of the name is most likely a literal character.

If it is an indicator, the = stands for the file being a socket, which in Perl can be tested for with the -S filetest.

Then that \W in the proposed regex may need to be changed to \W? to allow for no non-word characters preceding a digit, along with a test for a socket. Like

my $re = qr/^\W? [0-9]+ \. [0-9]+/x;

# basename() is from File::Basename; glob "$dir/*" returns paths, not bare names
my @files_to_del = grep { basename($_) =~ $re and -S } glob "$dir/*";

Your regex is using `[0-9]+` to match integers... why not `\d+`? — Mike Pennington, Oct 07 '22 at 10:06
@MikePennington The `\d`, as convenient as it is, matches all of Unicode "numbers." It's I believe some 700-800 characters, "digits" of some sort or another in all kinds of writing systems. I just prefer it to be precise, `[0-9]` matches these 10 which we mean. — zdim, Oct 07 '22 at 15:57
@MikePennington Forgot to mention, one can affect that with the `/a` modifier, so that `\d` matches only ASCII -- but then so do `\s` and `\w`, and posix character classes. See [`/a` and `/aa` in perlre](https://perldoc.perl.org/perlre#/a-(and-/aa)). But then it all gets only more complicated, I think. (If that _is_ an option, and one might need to adjust a bunch of regex, default modifiers for a scope can be set using [re pragma](https://perldoc.perl.org/re), like `use re '/a';`) — zdim, Oct 07 '22 at 17:30
Note that you probably want to filter out directories before unlink. The docs have some scary language about it. The problem is likely rare and improbable, but I don't rely on that. — brian d foy, Oct 09 '22 at 13:41
@briandfoy Good point, thank you, better not to try to do a wrong thing. (Also, I specifically build a list of "files.") Fixed and added a comment on directories. — zdim, Oct 09 '22 at 17:39
@MikePennington Updated with a minor change but a clear improvement -- the default directory to look into should of course be the current-working-directory (I don't know why I had the script's directory as default originally?). So that one can put it in some sort of a `~/bin` and use as a utility from anywhere. (This need looks more like a one-off, so the one-liner may be a better fit, but still, in principle.) Thank you for all that attribution! — zdim, Oct 10 '22 at 21:02
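The difference between `\d`, `[0-9]`, and `\d` under `/a` discussed in the comments above can be checked with a short sketch. The three sub names are made up for this illustration, and the `/a` modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Each sub returns 1 if the single character matches, 0 otherwise.
# The sub names are hypothetical, chosen for this example only.
sub unicode_digit { return $_[0] =~ /\A\d\z/    ? 1 : 0 }  # \d, Unicode rules
sub ascii_range   { return $_[0] =~ /\A[0-9]\z/ ? 1 : 0 }  # explicit ASCII range
sub ascii_flag    { return $_[0] =~ /\A\d\z/a   ? 1 : 0 }  # \d restricted by /a

my %sample = (
    'DIGIT SEVEN'              => "7",
    'ARABIC-INDIC DIGIT SEVEN' => "\x{0667}",
);

for my $name (sort keys %sample) {
    my $ch = $sample{$name};
    printf "%-26s \\d=%d [0-9]=%d \\d/a=%d\n",
        $name, unicode_digit($ch), ascii_range($ch), ascii_flag($ch);
}
```

The ASCII digit matches all three forms, while the Arabic-Indic digit matches only the plain `\d`.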

Dave Cross · Answer 2 · score 3 · answered Oct 06 '22 at 13:26

Why not just:

$ rm =*

Sometimes, shell commands are the best option.

brian d foy · Answer 3 · 2022-10-07T18:08:04.507

In these cases, I use perl to merely filter the list of files:

ls | perl -ne 'print if /\A\W\d+\.\d+/a' | xargs rm

And, when I do that, I feel guilty for not doing something simpler with an extended pattern in grep:

ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm

Eventually I'll run into a problem where there's a directory so I need to be more careful about the file list:

find . -maxdepth 1 -type f | grep -E '^\./\W[0-9]+\.[0-9]+' | xargs rm

Or I need to allow rm to remove directories too should I want that:

ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm -r

Andy Lester · Answer 4 · 2022-10-09T20:36:09.537

1

Here you go.

unlink( grep { /\W\d+\.\d+/ && !-d } glob( "*" ) );

This matches the filename, and excludes directories.

answered Oct 07 '22 at 20:26 · edited Oct 09 '22 at 20:36

1

"_here you go_" -- agree, and I even posted that a day ago in my answer on this page (just fyi) – zdim Oct 08 '22 at 19:49
1

You should probably filter out directories from the list. The unlink docs have some notes about that. – brian d foy Oct 09 '22 at 13:39

Mike Pennington · Answer 5 · 2022-10-10T12:49:49.060

0

To delete filenames matching this: /\W\d+\.\d+/ pcre, use the following one-liners...

1> $fn is a filename... I'm also removing the my keywords since the one-liner doesn't have to worry about perl lexical scopes:

perl -e 'foreach $fn (grep { /\W\d+\.\d+/ } glob "*") {$cmd1="rm $fn";`$cmd1`;}'

2> Or as Andy Lester responded, perhaps his answer is as efficient as we can make it...

perl -e 'unlink(grep { /\W\d+\.\d+/ } glob "*");'
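The first one-liner interpolates each filename into a shell command string, so a name containing shell metacharacters would be interpreted by the shell. A minimal sketch that avoids the shell by handing names to unlink and collecting failures follows; `remove_files` is a hypothetical helper name, and the usage line assumes the names are passed on the command line.

```perl
use strict;
use warnings;

# Delete each given file with unlink (no shell involved) and return
# a list of "name: reason" strings for those that could not be removed.
# "remove_files" is a hypothetical helper name.
sub remove_files {
    my @failed;
    for my $file (@_) {
        next if -d $file;                        # leave directories alone
        unlink $file or push @failed, "$file: $!";
    }
    return @failed;
}

# Assumed usage: names come from the command line, so nothing is
# removed unless the caller lists files explicitly.
warn "$_\n" for remove_files(@ARGV);
```

Since unlink receives the name as data rather than as part of a command line, characters such as `;` or spaces in a filename have no special meaning.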

answered Oct 06 '22 at 12:24 · edited Oct 10 '22 at 12:49

8

See also [readdir](https://perldoc.perl.org/functions/readdir), [glob](https://perldoc.perl.org/functions/glob), and [unlink](https://perldoc.perl.org/functions/unlink) – Håkon Hægland Oct 06 '22 at 12:30
1

Don't loop it, and use `unlink`. See my answer below. – Andy Lester Oct 07 '22 at 20:25
@zdim: Well, the first bit of code has a code injection vulnerability. Not only that, since the regex is unanchored, it potentially matches more files than the original problem intended (although the question has changed significantly through this ordeal). That's why Hakon's comment is so highly upvoted. – brian d foy Oct 09 '22 at 17:49
@briandfoy OK, good points -- but people normally don't get downvoted for (the common) injection bug. Nor for the unanchored regex, which is just what the question asks (and I copied it as well, will amend -- thanks for the reminder!). Plus there were two -1, I see one got withdrawn. My point was, I think it's about other things and I always get itchy when content gets valued for talk outside of it – zdim Oct 09 '22 at 18:44
