
Is it possible to store all matches for a regular expression into an array?

I know I can use ($1,...,$n) = m/expr/g;, but it seems as though that can only be used if you know the number of matches you are looking for. I have tried my @array = m/expr/g;, but that doesn't seem to work.

Peter Mortensen
cskwrd
  • Explain "doesn't seem to work", preferably with an actual example. That *should* work. – ysth Feb 21 '10 at 03:10
  • Using `($1, ...) = ...` is a very bad example that should be edited. It's confusing, as a capture group will set `$1` already. So why would you reassign it (if that's allowed at all)? – U. Windl Aug 01 '19 at 07:12

7 Answers


If you're doing a global match (/g), the regex in list context returns all of the captured matches. Simply do:

my @matches = ( $str =~ /pa(tt)ern/g );

This command for example:

perl -le '@m = ( "foo12gfd2bgbg654" =~ /(\d+)/g ); print for @m'

Gives the output:

12
2
654
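A pattern can have more than one capturing group. In that case, list-context /g returns the captures of every match flattened into one list, in order. Here is a minimal sketch with a made-up input string:

```perl
use strict;
use warnings;

my $str = 'a=1, b=2, c=3';

# Two groups per match: the result is ($1, $2) of the first match,
# then ($1, $2) of the second match, and so on, all in one flat list
my @flat = ( $str =~ /(\w)=(\d)/g );

print "@flat\n";    # prints "a 1 b 2 c 3"
```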
Sicco
friedo
  • Be sure to use double quotes if you try this in the Windows shell, like this: perl -le "@m = ( 'foo12gfd2bgbg654' =~ /(\d+)/g ); print for @m". Otherwise you get an error, since the shell uses " as its string delimiter. – roamcel Feb 09 '12 at 16:22
  • Unfortunately, it doesn't work for substitution, as in this slightly modified example: `perl -le '@m = ( (my $s = "foo12gfd2bgbg654") =~ s/(\d+)//g ); print for @m'` just prints `3`. – U. Windl Aug 01 '19 at 07:16
  • @U.Windl It's not "slightly modified". You are getting the return value of `s/.../.../g`, which is the number of substitutions made. The only place where you can use the captured value in a substitution is the replacement part. For example: `s/(\d+)/Found number $1/g`. – Francisco Zarabozo Jun 15 '20 at 19:02

Sometimes you need to get all matches globally, as PHP's preg_match_all does. If that's your case, you can write something like:

# a dummy example
my $subject = 'Philip Fry Bender Rodriguez Turanga Leela';
my @matches;
push @matches, [$1, $2] while $subject =~ /(\w+) (\w+)/g;

use Data::Dumper;
print Dumper(\@matches);

It prints

$VAR1 = [
          [
            'Philip',
            'Fry'
          ],
          [
            'Bender',
            'Rodriguez'
          ],
          [
            'Turanga',
            'Leela'
          ]
        ];
codeholic
  • Very handy technique; is there a way to generalize this in case the number of capture groups is not known? It looks like you might need a special array variable that comprises `( $1, $2, ...)`, but I couldn't find such a thing. – mklement0 Jul 17 '15 at 15:58
  • @mklement0 Yes, in Perl 5.25.7, the variable `@{^CAPTURE}` was added. It contains `($1, $2, ...)` from the last successful match. To generalize the answer above, do `push @matches, [@{^CAPTURE}] while $subject =~ /(\w+) (\w+)/g;` – Viktor Söderqvist Nov 02 '20 at 14:08
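Here is the generalized loop from the comment above as a complete script, offered as a sketch. It assumes Perl 5.26 or later, the first stable release after 5.25.7 to include @{^CAPTURE}. The input is the same dummy string as in the answer.

```perl
use strict;
use warnings;
use v5.26;    # @{^CAPTURE} is not available in older perls

my $subject = 'Philip Fry Bender Rodriguez Turanga Leela';
my @matches;

# @{^CAPTURE} holds ($1, $2, ...) of the last successful match,
# so the number of groups in the pattern never has to be written out
push @matches, [ @{^CAPTURE} ] while $subject =~ /(\w+) (\w+)/g;

print join( ', ', map { '[' . join( ' ', @$_ ) . ']' } @matches ), "\n";
# prints "[Philip Fry], [Bender Rodriguez], [Turanga Leela]"
```

Only the pattern needs to change for a different number of groups; the push stays the same.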

See the perlop manual page (perldoc perlop), under "Matching in List Context":

If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1 , $2 , $3 ...)

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

You can simply grab all the matches by assigning to an array, or otherwise performing the evaluation in list context:

my @matches = ($string =~ m/word/g);
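The last sentence of the quoted documentation covers patterns with no capturing parentheses. In that case, each element of the list is the entire matched text. Here is a small sketch with an invented input string:

```perl
use strict;
use warnings;

my $string = 'the cat sat on the mat';

# No capturing parentheses: each element is the whole text of one match
my @words = ( $string =~ m/\w+at/g );

print "@words\n";    # prints "cat sat mat"
```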
Ether

I think this is a self-explanatory example. Note the /g modifier in the first regex:

use Data::Dumper;

my $string = "one two three four";

my @res = $string =~ m/(\w+)/g;
print Dumper(\@res); # @res = ("one", "two", "three", "four")

@res = $string =~ m/(\w+) (\w+)/;
print Dumper(\@res); # @res = ("one", "two")

Remember that the left-hand side of the assignment must be in list context, which means you have to surround scalar variables with parentheses:

($one, $two) = $string =~ m/(\w+) (\w+)/;
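The parentheses matter because a bare scalar on the left puts the match itself in scalar context, where it only reports success. The minimal sketch below contrasts the two assignments on the same string as above:

```perl
use strict;
use warnings;

my $string = 'one two three four';

# Scalar assignment: the match runs in scalar context and just returns true (1)
my $flag = $string =~ m/(\w+) (\w+)/;

# List assignment: the parentheses give list context, so the captures are returned
my ( $one, $two ) = $string =~ m/(\w+) (\w+)/;

print "$flag $one $two\n";    # prints "1 one two"
```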
Flimm

Is it possible to store all matches for a regular expression into an array?

Yes, in Perl 5.25.7, the variable @{^CAPTURE} was added, which holds "the contents of the capture buffers, if any, of the last successful pattern match". This means it contains ($1, $2, ...) even if the number of capture groups is unknown.

Before Perl 5.25.7 (since 5.6.0), you could build the same array using @- and @+, as suggested by @Jacques in his answer. You would have to do something like this:

    my @capture = ();
    for (my $i = 1; $i < @+; $i++) {
        push @capture, substr $subject, $-[$i], $+[$i] - $-[$i];
    }
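Here is a sketch of a slightly more defensive version of the loop above. It iterates over 1 .. $#+, which is the number of groups in the last successful match. A group that took no part in the match is mapped to undef. Otherwise substr would be called with undefined offsets, which warns under use warnings and yields an empty string. The subject string and pattern are made up for illustration.

```perl
use strict;
use warnings;

my $subject = 'width=80';
my @capture;

# Group 3 is optional and does not participate in this particular match
if ( $subject =~ /(\w+)=(\d+)(?:;(\w+))?/ ) {
    for my $i ( 1 .. $#+ ) {
        # Offsets are undef for a group that did not participate
        push @capture, defined $-[$i]
          ? substr( $subject, $-[$i], $+[$i] - $-[$i] )
          : undef;
    }
}

print join( ', ', map { defined $_ ? $_ : 'undef' } @capture ), "\n";
# prints "width, 80, undef"
```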

I am surprised this is not already mentioned here, but the Perl documentation (perlvar) provides the standard variables @- and @+. To quote from the documentation of @-:

This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope.

@+ is its counterpart and holds the offsets of the ends of those submatches.

So, to get the value captured by the first group, one would write:

print substr( $str, $-[1], $+[1] - $-[1] ), "\n"; # equivalent to $1

As a side note, there is also the standard variable %-, which is very nifty because it not only gives access to named captures but also stores the values of duplicate names in an array.

Using the example provided in the documentation:

/(?<A>1)(?<B>2)(?<A>3)(?<B>4)/

would yield a hash with entries such as:

$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
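The %- example above can be checked with a short script. This sketch assumes Perl 5.10 or later, the release where named captures and %- were introduced. It copies the values out of %- while the match is still the last successful one.

```perl
use strict;
use warnings;
use v5.10;    # named captures, %- and say

my %groups;
if ( '1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/ ) {
    # Each key of %- is a group name; each value is an array ref
    # with every capture made under that name, in order
    for my $name ( keys %- ) {
        $groups{$name} = [ @{ $-{$name} } ];
    }
}

for my $name ( sort keys %groups ) {
    say "$name: ", join( ' ', @{ $groups{$name} } );
}
# prints "A: 1 3" and then "B: 2 4"
```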
Jacques

Note that if you know the number of capturing groups you need per match, you can use this simple approach, which I present here with an example of two capturing groups.

Suppose you have some 'data' like this:

my $mess = <<'IS_YOURS';
Richard     Rich
April           May
Harmony             Ha\rm
Winter           Win
Faith     Hope
William         Will
Aurora     Dawn
Joy  
IS_YOURS

With the following regex

my $oven = qr'^(\w+)\h+(\w+)$'ma;  # skip the /a modifier if using perl < 5.14

I can capture all 12 values (6 pairs, not 8: "Ha\rm" contains a backslash, so Harmony's line does not match, and Joy has no second word) in @box below.

my @box = $mess =~ m[$oven]g;

If I want to "hash out" the details of the box I could just do:

my %hash = @box;

Or I could have skipped the box entirely:

my %hash = $mess =~ m[$oven]g;

Note that %hash contains the following. Order is lost, and duplicate keys (had any existed) would be squashed:

(
          'April'   => 'May',
          'Richard' => 'Rich',
          'Winter'  => 'Win',
          'William' => 'Will', 
          'Faith'   => 'Hope',
          'Aurora'  => 'Dawn'
);
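To keep the pairs in their original order (and keep any duplicate keys), one option is to walk @box two elements at a time instead of assigning it to a hash. Here is a minimal sketch that runs splice on a copy of the list, using a shortened version of the data above. The /a modifier is left out, and \h needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $mess = <<'IS_YOURS';
Richard     Rich
April           May
Faith     Hope
IS_YOURS

# Same pattern idea as $oven, applied with /m and /g in list context
my @box = $mess =~ m/^(\w+)\h+(\w+)$/mg;

# splice removes elements, so work on a copy and leave @box intact
my @queue = @box;
my @pairs;

# Take two elements per iteration; the loop ends when splice returns an empty list
while ( my ( $first, $second ) = splice @queue, 0, 2 ) {
    push @pairs, "$first => $second";
}

print join( '; ', @pairs ), "\n";
# prints "Richard => Rich; April => May; Faith => Hope"
```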
YenForYang
  • What this answer failed to consider is the *unknown* number of capture groups: You just work with two. So you could always use `($1, $2)` to get all matches. – U. Windl Aug 01 '19 at 07:23