
Let's say I have a subroutine/method that a user can call to test some data that (as an example) might look like this:

sub test_output {
    my ($self, $test) = @_;
    my $output = $self->long_process_to_get_data();
    if ($output =~ /\Q$test/) {
        $self->assert_something();
    }
    else {
        $self->do_something_else();
    }
}

Normally, $test is a string that we look for anywhere in the output. The interface was designed to make calling it very easy. However, we've found that a plain string is sometimes problematic: for example, when the output contains a large, possibly varying number of spaces. That calls for a pattern, if you will. Thus, I'd like to let callers pass in a regex as an option. I could just do:

$output =~ $test

if I could assume that it's always a regex, but ah, the backwards compatibility! If they pass in a string, it still needs to be matched as a literal string.
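To see why the existing method can't simply be handed a regex, here is a small sketch with made-up sample output (the strings are hypothetical). It compares the question's current `\Q` match against a plain string, the same `\Q` match against a `qr//` object, and a direct match against the `qr//` object.

```perl
use strict;
use warnings;

# Hypothetical sample data: the output contains a run of several spaces.
my $output  = "status:     OK";
my $literal = "status: OK";        # plain string with a single space
my $pattern = qr/status:\s+OK/;    # regex that tolerates any number of spaces

# The question's current test: \Q escapes metacharacters, so the argument is
# matched as literal text. The single space does not match the run of spaces.
my $literal_hit = $output =~ /\Q$literal/ ? 1 : 0;

# Passing a qr// object to the same code does not help: it is interpolated as
# its string form, something like (?^:status:\s+OK), and \Q makes perl search
# for that text verbatim.
my $quoted_hit = $output =~ /\Q$pattern/ ? 1 : 0;

# Using the qr// object directly as the pattern matches as intended.
my $regex_hit = $output =~ $pattern ? 1 : 0;

print "literal: $literal_hit, quoted regex: $quoted_hit, regex: $regex_hit\n";
```

Run as-is, it prints `literal: 0, quoted regex: 0, regex: 1`. The middle case is why callers can't just start passing `qr//` objects without the method itself telling the two kinds of argument apart.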

So I'll need to test whether $test is a regex. Is there any good facility for detecting whether a scalar holds a compiled regex?

brian d foy
Robert P

3 Answers


As hobbs points out, if you're sure that you'll be on 5.10 or later, you can use the built-in check:

 use 5.010;
 use re qw(is_regexp);
 if (is_regexp($pattern)) {
     say "It's a regex";
 } else {
     say "Not a regex";
 }

However, I don't always have that option. In general, I do this by checking against a prototype value with ref:

 if( ref $scalar eq ref qr// ) { ... }

One of the reasons I started doing it this way was that I could never remember the type name for a regex reference. I can't even remember it now. It's not uppercase like the rest of them, either, because it's really one of the packages implemented in the perl source code (in regcomp.c if you care to see it).

If you have to do that a lot, you can make that prototype value a constant using your favorite constant creator:

 use constant REGEX_TYPE => ref qr//;
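As a rough sketch of the prototype approach in use, the helper below treats a `qr//` object as a pattern and anything else as a literal string, which keeps the old string behavior intact. The name `output_matches` is hypothetical (it is not from the original post), and the sample output is made up.

```perl
use strict;
use warnings;
use constant REGEX_TYPE => ref qr//;   # the prototype value's type, 'Regexp'

# Hypothetical helper: a qr// object is used as a pattern; anything else
# keeps the old behavior and is matched as a literal string.
sub output_matches {
    my ($output, $test) = @_;
    my $hit = (ref($test) eq REGEX_TYPE)
        ? $output =~ $test          # caller passed a compiled regex
        : $output =~ /\Q$test/;     # caller passed a plain string
    return $hit ? 1 : 0;
}

my $output    = "status:     OK";
my $as_string = output_matches($output, "status: OK");       # 0: literal single space
my $as_regex  = output_matches($output, qr/status:\s+OK/);   # 1: \s+ allows many spaces
print "string: $as_string, regex: $as_regex\n";
```

The string branch is exactly the question's original `\Q` match, so existing callers see no change.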

I talk about this at length in Effective Perl Programming as "Item 59: Compare values to prototypes".

If you want to try it both ways, you can use a version check on perl:

 if( $] < 5.010 ) { warn "upgrade now!\n"; ... do it my way ... }
 else             { ... use is_regexp ... }
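One way to avoid repeating the version check on every call is to pick the implementation once, at compile time. This is a minimal sketch under that assumption: the sub name `is_regex` is hypothetical, and the fallback branch is the prototype comparison described above.

```perl
use strict;
use warnings;

# Choose the check once, at compile time. Perl 5.10 and later ship
# re::is_regexp; older perls fall back to comparing ref() with the type of a
# prototype qr// value. The name is_regex is a hypothetical choice.
BEGIN {
    if ($] >= 5.010) {
        require re;
        *is_regex = \&re::is_regexp;
    }
    else {
        my $regex_type = ref qr//;
        *is_regex = sub { ref($_[0]) eq $regex_type };
    }
}

for my $candidate (qr/x+/, 'x+') {
    my $verdict = is_regex($candidate) ? 'regex' : 'not a regex';
    print "$candidate: $verdict\n";
}
```

Because the glob is assigned inside `BEGIN`, the rest of the program simply calls `is_regex(...)` without caring which perl it runs on.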
brian d foy
  • Cute. Except for having to remember the messed-up capitalization of `Regexp` (compared to all the other values `ref` returns), is there any other reason to prefer this over `ref $scalar eq 'Regexp'`? – Sinan Ünür Apr 01 '10 at 20:20
  • I hate magic constants and hard-coded strings and I try to get rid of them whenever I can. They are generally poor programming practice. – brian d foy Apr 01 '10 at 20:23
  • Anyone know why it's not uc like the other types? – Eric Strom Apr 01 '10 at 20:26
  • It's not a core Perl data type. That is, there's no symbol table slot for a regex. You also can't have named regexes like you can with scalars, arrays, hashes, subroutines, filehandles, etc. – brian d foy Apr 01 '10 at 20:28
  • @brian - I like that approach... Do you also do ref [] and ref {}? Or are those too basic to rate as a "magic constant" in your view? – DVK Apr 01 '10 at 20:30
  • The capitalization isn't that strange. Just imagine it was blessed in a package called `Regexp`. – mob Apr 01 '10 at 20:30
  • @brian - re "not a core data type" - do you know whether that will change in Perl6? Thanks! – DVK Apr 01 '10 at 20:31
  • @DVK: I use this for all reference type checking, even for hashes and arrays. – brian d foy Apr 01 '10 at 20:33
  • @brian - Is there any possible performance concern (if the ref [] is in a tight loop)? Or does perl optimize it to a constant? – DVK Apr 01 '10 at 20:41
  • You don't have to ask me all these questions. Try it yourself. :) – brian d foy Apr 01 '10 at 20:57
  • Note that since Perl doesn't have an anonymous scalar constructor, you'll have to create a named variable to use this pattern for that case, e.g. `ref \ do { my $x }`. *Don't* take a reference to an existing variable, as you'll get REF instead of SCALAR if it already holds a reference to something else. – Michael Carman Apr 01 '10 at 21:55
  • @Michael: you don't need to take a reference to a scalar variable, just a reference to a scalar, like `\''` – brian d foy Apr 01 '10 at 22:28
  • Regex objects actually get slightly more "core" in 5.12.0, as they're now references to scalars of type REGEXP rather than references to scalars with magic. This is, however, completely invisible to user code, unless you manage to bypass overloaded stringification, in which case you'll notice that regexes now print as `Regexp=REGEXP(0x1234567)` instead of `Regexp=SCALAR(0x1234567)` :) – hobbs Apr 02 '10 at 10:14
  • `*is_regex = $] >= 5.010 ? \&re::is_regexp : sub { ... };` – ikegami Jan 13 '17 at 21:23
  • `if( ref $scalar eq ref qr// ) { ... }` doesn't work. 1) `perl -le'$re = ${ qr/abc/ }; print ref $re eq ref qr// ?1:0; print "abc" =~ $re;'` 2) `perl -le'$re = qr/abc/; bless($re, "Foo"); print ref $re eq ref qr// ?1:0; print "abc" =~ $re;'` – ikegami Jan 13 '17 at 21:26
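The comments above discuss prototype values for the other reference types. A small sketch of what `ref` reports for each, including the `REF` pitfall Michael Carman mentions, might look like this (the hash keys are arbitrary labels chosen for the example):

```perl
use strict;
use warnings;

# Prototype values for the common reference types. \'' is a reference to an
# anonymous constant scalar, so no named variable is needed for SCALAR.
# +{} forces perl to read the braces as an anonymous hash, not a block.
my %type_of = (
    scalar => ref \'',
    array  => ref [],
    hash   => ref +{},
    code   => ref sub {},
    regex  => ref qr//,
);

# The pitfall from the comments: a reference to a variable that already
# holds a reference reports REF, not SCALAR.
my $holds_ref = [];
my $pitfall   = ref \$holds_ref;

print "$_ => $type_of{$_}\n" for sort keys %type_of;
print "reference to \$holds_ref => $pitfall\n";
```

Only the regex entry comes out as `Regexp`; the others are the familiar uppercase names (`SCALAR`, `ARRAY`, `HASH`, `CODE`).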

As of perl 5.10.0 there's a direct, non-tricky way to do this:

use 5.010;
use re qw(is_regexp);
if (is_regexp($pattern)) {
    say "It's a regex";
} else {
    say "Not a regex";
}

is_regexp uses the same internal test that perl uses, which means that unlike ref, it won't be fooled if, for some strange reason, you decide to bless a regex object into a class other than Regexp (yes, that's possible).
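A short sketch of that edge case, assuming perl 5.10 or later for `is_regexp`: a `qr//` object re-blessed into a made-up class `Foo` slips past the `ref` comparison, while `is_regexp` still recognizes it, and the object still works as a pattern.

```perl
use strict;
use warnings;
use re qw(is_regexp);   # requires perl 5.10 or later

my $blessed = bless qr/abc/, 'Foo';   # regex object re-blessed into another class

# ref() reports the class name ('Foo'), so the prototype comparison misses it.
my $ref_says = ref($blessed) eq ref(qr//) ? 1 : 0;

# is_regexp() inspects the underlying value rather than the class name.
my $is_regexp_says = is_regexp($blessed) ? 1 : 0;

# The re-blessed object still works as a pattern.
my $still_matches = "abc" =~ $blessed ? 1 : 0;

print "ref: $ref_says, is_regexp: $is_regexp_says, matches: $still_matches\n";
```

This prints `ref: 0, is_regexp: 1, matches: 1`.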

In the future (or right now, if you can ship code with a 5.10.0 requirement), this should be considered the standard answer to the problem, not only because it avoids a tricky edge case but also because it says exactly what it means. Expressive code is a good thing.

brian d foy
hobbs

See the ref built-in: for a compiled regex (a qr// object), ref returns Regexp.

brian d foy
Eugene Yarmash