Related to my earlier question about case-insensitive keyword matching using regular expressions.

Is it possible to match strings case-insensitively in Marpa? If so, how?

Suppose I have the grammar

:start ::= script
identifier ~ [\w]+
script ::= 'script' identifier code
code ::= command*
command ::= 'run' | 'walk' | 'stop'

How can I make it match any of script, Script, SCRIPT or any other combination of lower and uppercase letters?

onitake
  • Just 7-bit ASCII, the extended 8-bit ASCII encoding on your favourite OS, or full Unicode? – hippietrail Aug 25 '14 at 13:02
  • As the project in question was written in Perl, I suppose: Whatever Perl thinks the correct encoding for the data is. However, since the grammar is meant for a programming language, 7-bit ASCII would suffice for identifiers. – onitake Oct 11 '14 at 23:48
  • Perl has perhaps the most comprehensive support for different encodings of any programming language. I don't know however whether Marpa just uses Perl's regex directly or reimplements a limited subset. But in general in any language I've always used regexes like `[sS][cC][rR][iI][pP][tT]` when faced with this. – hippietrail Oct 12 '14 at 00:12
  • That's certainly possible, but makes any grammar much harder to read, particularly if there are a lot of identifiers. – onitake Oct 14 '14 at 10:18

1 Answer

There isn't a straightforward way to specify case-insensitivity. Of course, you can string together character classes: [sS] [cC] [rR] [iI] [pP] [tT], but that's pretty awkward.
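
For the grammar in the question, that workaround might look like the sketch below. The lexeme name `kw_script` is made up for illustration, and the same pattern would need repeating for every keyword:

```
script ::= kw_script identifier code
kw_script ~ [sS] [cC] [rR] [iI] [pP] [tT]
```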

Sorry. Case-insensitive strings would be a good feature to add.

UPDATE: As of 2.076000, the latest indexed release, Marpa::R2 has an :ic modifier for both strings and character classes, which makes them case-insensitive. In the docs, see https://metacpan.org/pod/Marpa::R2::Scanless::DSL#Single-quoted-strings and https://metacpan.org/pod/Marpa::R2::Scanless::DSL#Character-classes.
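
As a rough sketch of what that looks like for the grammar in the question (untested here; it assumes Marpa::R2 2.076000 or later and the `:ic` syntax described in the linked docs), the modifier goes directly after each quoted string:

```
:start ::= script
script ::= 'script':ic identifier code
code ::= command*
command ::= 'run':ic | 'walk':ic | 'stop':ic
identifier ~ [\w]+
```

With this, `script`, `Script` and `SCRIPT` should all be accepted as the keyword. Whitespace handling is left out, as in the original grammar.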

Jeffrey Kegler
  • I see. Is there some other way of influencing the G0 parser, like inserting code that converts strings to lower case before they are sent to comparison? Another solution I can think of is preparsing the input and converting everything that matches a keyword to lower case first. This could probably be done using ordinary `s///`. – onitake Jul 09 '13 at 13:42
  • It is possible to bypass the G0 parser and use your own scanner: https://metacpan.org/module/JKEGL/Marpa-R2-2.062000/pod/Scanless/R.pod#Internal-and-external-scanning. I'll have to add case-insensitivity to my priorities for new features. – Jeffrey Kegler Jul 09 '13 at 17:25
  • Thank you, I'll have a look. – onitake Jul 10 '13 at 13:49
  • Inside the square brackets the parens lose their specialness. – Jeffrey Kegler Jul 10 '13 at 22:22