
In the Perl module Regexp::Grammars, consider the following token:

<token: command>       <%commands>

This token is part of a complex grammar that parses a wide variety of different sentences.

This token matches any word in the hash %commands, which I have defined as follows (of course, outside any function):

our %commands = (
    'Basic_import'  => 1,
    'Wait'          => 1,
    'Reload'        => 1,
    'Log'           => 1,
); 

This works for matching keywords like "Basic_import", "Wait", etc. However, I also want it to match words like "basic_import", "wait", etc.

How do I make this hash case-insensitive without having to copy and paste every keyword multiple times? Because this is part of a complex grammar, I want to use Regexp::Grammars, and I'd prefer not to have to resort to a grep for this particular exception.

psgels

3 Answers


From the documentation, it sounds like <%commands> would match the Wait of Waiting, so even a case-insensitive version of <%commands> would be less than ideal.

You normally want to match a generic identifier and then independently check whether the identifier is a valid command. This is what prevents printfoo(); from being equivalent to print foo(); in Perl.

May I suggest the following:

use feature qw( fc );

our %commands = map { fc($_) => 1 } qw(
   Basic_import
   Wait
   Reload
   Log
); 

<rule: command> (<ident>) <require: (?{ $commands{fc($CAPTURE)} })>

<token: ident> \w+
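The same identifier-then-lookup idea can be tried without Regexp::Grammars. What follows is a plain-Perl sketch (core Perl 5.16 or later; the `naive_match` and `is_command` helpers are hypothetical). In it, a start-anchored keyword alternation stands in, as an analogy only, for a bare keyword match:

```perl
use v5.16;      # enables strict plus the 'fc' and 'unicode_strings' features
use warnings;

# Keys are stored case-folded so lookups can be case-insensitive.
my %commands = map { fc($_) => 1 } qw(Basic_import Wait Reload Log);

# Naive approach (hypothetical helper): keyword alternation anchored only at the start.
my $alternation = join '|', map { quotemeta } keys %commands;
sub naive_match { my ($text) = @_; return $text =~ /^(?:$alternation)/i ? 1 : 0 }

# Hypothetical helper: grab a whole identifier first, then validate it.
sub is_command {
    my ($text) = @_;
    return 0 unless $text =~ /^(\w+)/;
    return exists $commands{ fc($1) } ? 1 : 0;
}

for my $input (qw(wait WAIT Waiting Log Logger)) {
    printf "%-8s naive=%d ident+lookup=%d\n",
        $input, naive_match($input), is_command($input);
}
```

With these inputs, `Waiting` and `Logger` pass the naive check but are rejected by `is_command`, while `wait` and `WAIT` are accepted by both.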

You can probably get away with using lc instead of fc if you want backwards compatibility with versions of Perl older than 5.16.
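The practical difference between `fc` and `lc` only shows up outside plain ASCII. Here is a small sketch (Perl 5.16+, example strings chosen for illustration) using the German sharp s, which full case folding maps to `ss`:

```perl
use v5.16;      # strict plus the 'fc' and 'unicode_strings' features
use warnings;
use utf8;       # the string literals below contain non-ASCII characters

my ($mixed, $upper) = ('Straße', 'STRASSE');

# fc applies full Unicode case folding: "ß" folds to "ss".
my $same_with_fc = fc($mixed) eq fc($upper) ? 1 : 0;

# lc only lowercases, and "ß" is already lowercase, so the results differ.
my $same_with_lc = lc($mixed) eq lc($upper) ? 1 : 0;

binmode STDOUT, ':encoding(UTF-8)';
say "fc: ", fc($mixed), " vs ", fc($upper), " -> equal=$same_with_fc";
say "lc: ", lc($mixed), " vs ", lc($upper), " -> equal=$same_with_lc";
```

For ASCII-only command names such as `Basic_import` or `Wait`, `lc` and `fc` produce identical keys, which is why `lc` is a reasonable fallback here.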

ikegami
  • "From the documentation, it sounds like..." Confirmed. I've never used Regexp::Grammars, but it seems to have a weird idea of what a token is. You can also get around this by adding anchors to the pattern: `my $grammar = qr{ \A \z <%commands> };` – ThisSuitIsBlackNot May 09 '16 at 15:34
  • @ThisSuitIsBlackNot, you don't want to have to enforce the boundary everywhere `` is used. The check should be inside the token. Instead, use ` \b <%commands> \b` (though that assumes that all the keys of `%commands` start and end with a `\w` char). – ikegami May 09 '16 at 15:41

You can use Hash::Case::Preserve to make hash lookups case insensitive:

use strict;
use warnings 'all';

use Data::Dump;
use Hash::Case::Preserve;
use Regexp::Grammars;

tie my %commands, 'Hash::Case::Preserve';

%commands = (
    'Basic_import'  => 1,
    'Wait'          => 1,
    'Reload'        => 1,
    'Log'           => 1,
);

my $grammar = qr{

    <command>

    <token: command>    <%commands>

};  

dd \%/ if 'basic_import' =~ $grammar;

Output:

{ "" => "basic_import", "command" => "basic_import" }

Note that you have to tie the hash before inserting any values into it.
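Why the order matters can be seen with a tiny hand-rolled tied hash. This sketch uses the core `Tie::StdHash` base class (from `Tie::Hash`) instead of Hash::Case::Preserve. The `CaseInsensitiveHash` class is hypothetical: it simply lowercases keys, so unlike Hash::Case::Preserve it does not remember the original spelling. It needs Perl 5.14+ for the `package NAME {}` syntax.

```perl
use strict;
use warnings;

package CaseInsensitiveHash {
    use Tie::Hash;                            # defines Tie::StdHash
    use parent -norequire, 'Tie::StdHash';    # inherit the plain-hash methods

    # Normalise every key to lowercase before touching the underlying hash.
    sub STORE  { my ($self, $key, $value) = @_; $self->{ lc $key } = $value }
    sub FETCH  { my ($self, $key) = @_; return $self->{ lc $key } }
    sub EXISTS { my ($self, $key) = @_; return exists $self->{ lc $key } }
    sub DELETE { my ($self, $key) = @_; return delete $self->{ lc $key } }
}

# Tie first, then fill: every key goes through STORE.
tie my %commands, 'CaseInsensitiveHash';
%commands = (Basic_import => 1, Wait => 1, Reload => 1, Log => 1);
print "wait: ", (exists $commands{wait} ? "yes" : "no"), "\n";    # yes

# Fill first, then tie: the tied object starts out empty and hides the old data.
my %too_late = (Wait => 1);
tie %too_late, 'CaseInsensitiveHash';
print "Wait after late tie: ", (exists $too_late{Wait} ? "yes" : "no"), "\n";    # no
```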

ThisSuitIsBlackNot
  • Yes, this one works. I must have mistyped something when I tried it the first time. Thanks! – psgels May 09 '16 at 15:25
  • Tip: ` \b <%commands> \b` would be better, as it addresses the `printfoo();` vs `print foo();` problem I mentioned in my answer. (You'll need to adjust if the keys of `%commands` can start or end with something other than a `\w` char.) – ikegami May 09 '16 at 15:44
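The word-boundary suggestion can be tried with an ordinary Perl regex. This is only an analogy: a hand-built alternation stands in for Regexp::Grammars' `<%commands>`, `/i` supplies the case-insensitivity, and the `$bare_re`/`$keyword_re` names are hypothetical:

```perl
use strict;
use warnings;

my @commands = qw(Basic_import Wait Reload Log);

# Longest keywords first, so the alternation prefers the longer of two overlapping keys.
my $alternation = join '|',
    map { quotemeta } sort { length $b <=> length $a } @commands;

my $bare_re    = qr/(?:$alternation)/i;        # can match inside longer words
my $keyword_re = qr/\b(?:$alternation)\b/i;    # whole words only

for my $text ('please WAIT now', 'Waiting...', 'Logger', 'log') {
    printf "%-18s bare=%s bounded=%s\n", "'$text'",
        ($text =~ $bare_re    ? 'yes' : 'no'),
        ($text =~ $keyword_re ? 'yes' : 'no');
}
```

As the comment notes, `\b` only behaves this way because every key starts and ends with a `\w` character.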
%commands = map { lc($_) => 1, $_ => 1 } qw(
    Basic_import
    Wait
    Reload
    Log
);
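This stores each keyword twice, once as written and once lowercased. A quick check (a sketch; the inputs are chosen for illustration) shows that it covers `Wait` and `wait` but not other spellings such as `WAIT` or `Basic_Import`. That gap is what folding the input as well, as in the `fc`-based answer, closes:

```perl
use strict;
use warnings;

# Each keyword is stored in its original spelling and in lowercase.
our %commands = map { lc($_) => 1, $_ => 1 } qw(
    Basic_import
    Wait
    Reload
    Log
);

for my $word (qw(Wait wait WAIT basic_import Basic_Import)) {
    printf "%-12s %s\n", $word, exists $commands{$word} ? 'found' : 'missing';
}
```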
druid62