I need to print all the matched strings from a stored line in Perl. I have seen various posts on this, such as "Print the matched string using perl" and "Perl Regex - Print the matched value", and I experimented by first trying to print just the first matching word. But I get a warning:

Use of uninitialized value $1 in concatenation (.) or string at rg.pl line 10.

I have tried with split and arrays and that works, but printing $1 throws this error.

My code is here

#!/usr/bin/perl
use warnings;
use strict;


#my $line = "At a far distance near the bar, was a parked car. Star were shining in the night. The boy in the car had scar and he was at war with his enemy. \n";

my $line = "At a far distance near the bar, was a parked car. \n";
if($line =~ /[a-z]ar/gi)
{       print "$1 \n";   }

$_ = $line;

I want my output for this code to be

far

and subsequently print all the words containing ar,

far
near
bar
parked 
car

I even tried changing my code as below, but that didn't work either; I get the same error:

if($line =~ /[a-z]ar/gi)  {
        my $match = $1;
        print "$match \n"; }

2 Answers

3

First, you didn't capture anything, and capturing is how the $n variables are populated. Put parentheses around what you want captured into $1

if ($line =~ /([a-z]ar)/i) { print "$1\n" }

I've removed the /g which is unneeded (and with potential for trouble) here.
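A minimal sketch of the difference, assuming the same $line as in the question (the names $matched_plain and $captured are made up for illustration): both patterns match, but only the one with parentheses leaves anything in $1.

```perl
use strict;
use warnings;

my $line = "At a far distance near the bar, was a parked car. \n";

# No parentheses: the match succeeds, but nothing is captured into $1.
my $matched_plain = $line =~ /[a-z]ar/i ? 1 : 0;

# With parentheses: the matched text is available in $1.
my $captured;
if ($line =~ /([a-z]ar)/i) {
    $captured = $1;
}

print "matched without capture: $matched_plain\n";
print "captured: $captured\n";    # far
```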

Next, your pattern requires and captures exactly one letter followed by the literal ar, no more, no less. That won't capture all of near (it gets ear only), nor all of parked (it gets par only). It will not even match a word that starts with ar, since it requires a letter before ar. You need quantifiers to tell it how many times to match a letter. And you also want to find all matches.

One way is to scoop them all up by matching in list context with the /g (global) modifier

my @words = $line =~ /([a-z]*ar[a-z]*)/gi;

print "$_\n" for @words;

The [a-z]* means to match a letter zero or more times, so an optional string of letters. We also added an optional string of letters after ar. The /g makes the match continue through the string after each match, to find all such patterns. In list context the list of matches is returned.

Or, you can match in scalar context like in the first example, but in a while loop

while ($line =~ /([a-z]*ar[a-z]*)/gi) { print "$1\n" }

Here /g does something different. It matches a pattern once and returns true, the while condition is true and we print. Then it comes back and looks for a match from where it matched previously ... and keeps doing this until there are no more matches.
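To see where each scalar-context match leaves off, here is a small sketch that records pos($line) after every iteration. It assumes the question's $line; the @seen array is only for illustration.

```perl
use strict;
use warnings;

my $line = "At a far distance near the bar, was a parked car. \n";

my @seen;
while ($line =~ /([a-z]*ar[a-z]*)/gi) {
    # pos($line) is the offset just past the current match,
    # which is where the next iteration resumes searching.
    push @seen, sprintf "%s ends at offset %d", $1, pos($line);
}

print "$_\n" for @seen;
```

Once the match finally fails, the loop ends and pos($line) is reset, so a later m//g on $line starts from the beginning again.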

This is complex behavior altogether. From Regexp Quote-Like Operators in perlop:

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match.   [...]

Read about this in more detail and in a tutorial manner in perlretut, under "Global matching."
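The two list-context rules quoted above can be seen side by side in a short sketch. It assumes the question's $line; @whole and @prefixes are illustrative names.

```perl
use strict;
use warnings;

my $line = "At a far distance near the bar, was a parked car. \n";

# List context, no parentheses: each whole match is returned.
my @whole = $line =~ /[a-z]*ar[a-z]*/gi;

# List context, with parentheses: only the captured part is returned,
# here the letters that precede "ar" in each word.
my @prefixes = $line =~ /([a-z]*)ar[a-z]*/gi;

print "@whole\n";       # far near bar parked car
print "@prefixes\n";    # f ne b p c
```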


Note on using /g modifier in scalar context

I've used that above, in while (/.../g), which is a very common way to step through all occurrences of the pattern in a string, each time giving us control in the while body.

While this use is intended and idiomatic, /g in scalar context can bring subtle trouble when it is not in a loop condition: the next regex with /g on this variable will continue from where the previous match ended, not from the string's beginning, which may be unexpected.

That "next regex" may also simply be that same expression -- in the next pass of some larger loop in which our expression happens to be, and this holds across function calls as well. Consider

use warnings;
use strict;
use feature 'say';

my $s = q(one two three);

sub func { say $1 if $_[0] =~ /(\w+)/g };  # /g may be of great consequence!

for (1..4) {
    # ... perhaps much, much later ...
    func($s);
}

This loop prints the lines one, then two, then three, and that's that: the fourth call finds no further match and prints nothing. This (working) example is so bare bones that it is artificial, but I hope it conveys that /g in scalar context may surprise.

For one thing, it is not uncommon to see /g on a regex in an if condition where it is plain wrong.
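A hedged sketch of how that can go wrong, using the same $s as above (the flag variables are made up for illustration): an earlier scalar-context /g match leaves a position behind, and a later /g check on the same variable silently starts from there.

```perl
use strict;
use warnings;

my $s = q(one two three);

# The first check succeeds and leaves pos($s) just after "two".
my $first = $s =~ /two/g ? 1 : 0;

# With /g the search resumes after "two", so "one" is not found.
my $with_g = $s =~ /one/g ? 1 : 0;

# Without /g the search starts at the beginning of the string.
my $without_g = $s =~ /one/ ? 1 : 0;

print "first=$first with_g=$with_g without_g=$without_g\n";
# first=1 with_g=0 without_g=1
```

Dropping /g from a one-off test like this avoids the problem, since a plain match does not depend on the stored position.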

zdim
  • Can you please explain a bit more how these two work differently? `if ($line =~ /([a-z]ar)/i) { print "$1\n" }` and `if($line =~ /[a-z]ar/gi) { print "$1 \n"; }` – Shankhadeep Mukerji Feb 24 '17 at 03:11
  • One is a global match (`g`), the other is not. – Tim Biegeleisen Feb 24 '17 at 03:12
  • No, I mean the capturing part. How are you doing it? – Shankhadeep Mukerji Feb 24 '17 at 03:13
  • @ShankhadeepMukerji Yes ... I thought I did in the answer. With `/[a-z]ar/` you match, so the `if` returns true. But the match isn't _captured_, so there is nothing in `$1`. The `$n` variables get filled with things that are _captured_ (by parentheses). Is this what you are asking? – zdim Feb 24 '17 at 03:13
  • Yes, I did not know about parenthesis doing the capture. Thanks. – Shankhadeep Mukerji Feb 24 '17 at 03:15
  • So, how to capture parked and near? – Shankhadeep Mukerji Feb 24 '17 at 03:19
  • @ShankhadeepMukerji The second example gets all of them and puts them in `@words`. Is that OK? I am adding a capture-as-you-go way as well ... – zdim Feb 24 '17 at 03:20
  • @ShankhadeepMukerji I've added a more complete description/commentary. Will edit more as I review. – zdim Feb 24 '17 at 03:32
  • This is _massive_ overkill for such a simple question. – Tim Biegeleisen Feb 24 '17 at 03:32
  • @TimBiegeleisen Well, perhaps. I keep failing to provide _minimal_ answers. I mean, they ask about how to capture globally. It is probably the most used feature of regex, and it is complex. I feel like I'd leave them shorthanded, almost cheated, if I just gave the direct answer in one line of code. – zdim Feb 24 '17 at 03:42
  • @TimBiegeleisen Thank you for the response (and for the vote)! Of course, you raised a perfectly good point, of getting the best-fitting extent of the answer, and keeping it as concise as feasible. I just don't yet know how to get that measure right. – zdim Feb 24 '17 at 18:41
  • (... and I prefer to err on the side of giving too much rather than perhaps too little. Just came upon this exchange and would like to add: I still didn't figure out the "best" measure, and I still prefer to err on the "perhaps too much" side.) – zdim Dec 15 '21 at 20:17
2

For multiple matches, use a while loop. Also, I surrounded the quantity you want to capture with parentheses to indicate that it is a capture group.

while ($line =~ /([a-z]*ar[a-z]*)/gi ) {
    print "$1 \n";
}
Tim Biegeleisen