12

I want to find sequences matching my regexp whether they are in the middle of the string surrounded by spaces, at the beginning or end of the string, or the only thing in the string.

Example: Let's assume the sequence 'qwe45rty' is what we are looking for. I want to get a positive match on all of these strings:

'qwe45rty'
'qwe45rty blabla'
'smth qwe45rty blabla'
'smth qwe45rty'
' qwe45rty '

But none of these:

'aaqwe45rty'
'qwe45rtybb'
'aaqwe45rtybb'

The best I came up with is something like this:

if ( ($a =~ /\s+$re\s+/) or
     ($a =~ /^$re\s+/)   or
     ($a =~ /\s+$re$/)   or
     ($a =~ /^$re$/)        )
{
    # do stuff
}

which can't be the best way to do that :)

Any suggestions?

bazzilic

3 Answers

27

You can put the alternation (the `or`) inside the regex:

/(^|\s+)qwe45rty(?=\s+|$)/


Note that the second group is a positive lookahead (?=) so it checks for whitespace, but doesn't consume it. That way the regex can match two consecutive occurrences of the string and give an accurate match count.
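
To illustrate the difference, here is a minimal sketch that counts matches with `/g` in Perl. The variable names and the use of non-capturing groups are my own choices for the illustration, not part of the answer above.

```perl
use strict;
use warnings;

my $text = 'qwe45rty qwe45rty qwe45rty';

# Consuming version: the trailing \s+ becomes part of the match, so the
# whitespace needed in front of the next occurrence is already used up.
my $consuming = () = $text =~ /(?:^|\s+)qwe45rty(?:\s+|$)/g;

# Lookahead version: the trailing whitespace is checked but not consumed,
# so it is still available as the leading whitespace of the next match.
my $lookahead = () = $text =~ /(?:^|\s+)qwe45rty(?=\s+|$)/g;

print "consuming: $consuming, lookahead: $lookahead\n";
```

On this input the consuming pattern should report 2 matches and the lookahead pattern 3.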

AndreKR
  • I know about `or` inside the regex, but I thought string anchors `^` and `$` can't be used that way. – bazzilic Nov 05 '12 at 04:26
  • @bazzilic I've used anchors in that way without any problems. The only problem is that if you're using parentheses for matching purposes, the parentheses will add to your count. You can use `(?:^|\s+)` to get around this issue. – David W. Nov 05 '12 at 04:40
  • I picked your solution, thanks for pointing out to me that string anchors could be used as any other control sequences in Perl regexes! – bazzilic Nov 06 '12 at 05:36
  • Important note: this will only find ONE match in `"qwe45rty qwe45rty"` – Alex from Jitbit Feb 25 '21 at 17:26
  • @Alex Well, that depends less on the regex and more on the way you call it, doesn't it? In Javascript you would use the `g` modifier, in PHP you would use `preg_match_all()`, in Go you would use `FindAll()`. For Perl I don't know, so maybe I shouldn't have answered this question in the first place. :P – AndreKR Feb 25 '21 at 17:36
  • @AndreKR nope, it will match only one even with a `/g` because the regex will match every OTHER occurrence. In a string `"qwe45rty qwe45rty qwe45rty"` it will find TWO matches. Check this: https://i.imgur.com/Dv0GQZD.png – Alex from Jitbit Feb 25 '21 at 17:41
  • @Alex Oooh, indeed, because the first match consumes the whitespace that would be required for the second match. I'm gonna change that to lookahead, just a sec. – AndreKR Feb 25 '21 at 17:54
  • @AndreKR yep, exactly, this is one of those "overlapping matches" issues. It can be solved by lookaheads/lookbehinds. No need to edit the answer if you have no time, just wanted to warn everyone else. – Alex from Jitbit Feb 25 '21 at 17:57
8

Try coming at the problem from a different direction. To say something can match whitespace or nothing is to say it can't match a non-whitespace character:

(?<!\S)qwe45rty(?!\S)

Just a little shift in perspective and the regex practically writes itself.
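
As a rough check, the sketch below runs this pattern over the strings from the question. The `has_token` helper and the extra `(?:...)` wrapper around `$re` are assumptions added for the illustration; the wrapper only keeps any alternation inside `$re` from leaking out.

```perl
use strict;
use warnings;

my $re    = qr/qwe45rty/;                  # the sequence from the question
my $token = qr/(?<!\S)(?:$re)(?!\S)/;      # no non-whitespace on either side

# Hypothetical helper: 1 if $re occurs as a whitespace-delimited token, else 0.
sub has_token {
    my ($string) = @_;
    return $string =~ $token ? 1 : 0;
}

my @strings = (
    'qwe45rty', 'qwe45rty blabla', 'smth qwe45rty blabla',
    'smth qwe45rty', ' qwe45rty ',
    'aaqwe45rty', 'qwe45rtybb', 'aaqwe45rtybb',
);

for my $s (@strings) {
    print "'$s': Match? ", ( has_token($s) ? 'Yes' : 'No' ), "\n";
}
```

The first five strings should report Yes and the last three No, without any explicit `^` or `$` alternation.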

Alan Moore
1

Try the following:

$a =~ /(?:\A|\s)$re(?:\s|\Z)/;

For example:

use strict;
use warnings;

my $re = 'qwe45rty';
while (<DATA>) {
    chomp;
    print "'$_': Match? " . ( /(?:\A|\s)$re(?:\s|\Z)/ ? 'Yes' : 'No' ) . "\n";
}

__DATA__
qwe45rty
qwe45rty blabla
smth qwe45rty blabla
smth qwe45rty
 qwe45rty 
aaqwe45rty
qwe45rtybb
aaqwe45rtybb

Output:

'qwe45rty': Match? Yes
'qwe45rty blabla': Match? Yes
'smth qwe45rty blabla': Match? Yes
'smth qwe45rty': Match? Yes
' qwe45rty ': Match? Yes
'aaqwe45rty': Match? No
'qwe45rtybb': Match? No
'aaqwe45rtybb': Match? No
Kenosis
  • Word boundaries are not always the solution. What if `$re` is `[a-zA-Z0-9!~]`? – bazzilic Nov 05 '12 at 04:29
  • @bazzilic - Excellent catch! You're correct that my original `$a =~ /\bqwe45rty\b/;` would fail with your character set. Thank you for bringing this to my attention. The revised regex is more robust. – Kenosis Nov 05 '12 at 04:50
  • This is in fact what @AndreKR suggested, but thanks for `(?:...)` — I was unfamiliar with this before. Useful! – bazzilic Nov 06 '12 at 05:32
  • @bazzilic - Yes, noticed that after my posting. The (?: ... ) is, indeed, useful. I especially appreciated Alan Moore's elegant solution. – Kenosis Nov 06 '12 at 05:35
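
The word-boundary concern raised in these comments can be sketched as follows. The pattern `[a-zA-Z0-9!~]+` (with a `+` added), the two test strings, and the `first_match` helper are hypothetical, chosen only to show where `\b` and the whitespace lookarounds disagree.

```perl
use strict;
use warnings;

my $re = qr/[a-zA-Z0-9!~]+/;    # hypothetical pattern that includes non-word characters

my $with_b      = qr/\b(?:$re)\b/;            # word boundaries
my $with_around = qr/(?<!\S)(?:$re)(?!\S)/;   # whitespace-or-edge on both sides

# Hypothetical helper: returns the matched text, or undef when nothing matches.
sub first_match {
    my ($string, $pattern) = @_;
    my ($matched) = $string =~ /($pattern)/;
    return $matched;
}

for my $s ('!ab!', '#ab!#') {
    my $b_hit      = first_match($s, $with_b);
    my $around_hit = first_match($s, $with_around);
    printf "%-7s \\b: %-8s lookaround: %s\n",
        "'$s'",
        defined $b_hit      ? "'$b_hit'"      : 'none',
        defined $around_hit ? "'$around_hit'" : 'none';
}
```

With `\b`, both strings should yield only `'ab'`, because `!` is not a word character and the boundary falls inside the token. The lookaround version should match the whole `'!ab!'` and reject `'#ab!#'`.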