I am converting a SMAPI grammar to JSGF. These are similar grammar formats used by different speech recognition systems. SMAPI uses a question mark the way the rest of the world does, to mean zero or one of the preceding item; JSGF uses square brackets for this. So I need to convert a string like `stuff?` to `[stuff]`, and parenthesized strings like `((((stuff)? that)? I)? like)?` to `[[[[stuff] that] I] like]`. Strings like `((((stuff) that) I) hate)` must be left alone. As Qtax pointed out, a more complicated example is `(foo ((bar)? (baz))?)`, which should become `(foo [[bar] (baz)])`.

Because of this, I have to extract every level of a parenthesized expression, check whether it ends in a question mark, and if so replace the parentheses and question mark with square brackets. I think Eric Strom's answer to this question is almost what I need. The problem is that it returns the largest matched grouping, whereas I need to operate on each individual grouping.

This is what I have so far: `s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg`. When applied to `((((stuff)? that)? I)? like)?`, however, it produces only `[((((stuff)? that)? I)? like)]`. Any ideas on how to do this?

Nate Glenn

3 Answers

You'll also want to look at ysth's solution to that question, and use a tool that is already available to solve this problem:

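use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '((((stuff)? that)? I)? like)?';

for (my $i = 0; $i < length $text; $i++) {
    # Try only at an opening paren: extract_bracketed skips leading
    # whitespace by default, and that whitespace would be lost below.
    next unless substr($text, $i, 1) eq '(';
    my $tail = substr($text, $i);
    my ($match, $remainder) = extract_bracketed($tail, '()');
    if ($match && $remainder =~ /^\?/) {
        # Swap the outer parens and the trailing "?" for square brackets.
        substr($text, $i) =
            '[' . substr($match, 1, -1) . ']' . substr($remainder, 1);
        $i = -1;    # restart the scan from the beginning
    }
}
print "$text\n";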
mob
  • @Ωmega, nice how? This version doesn't work at all, even with the OP's original example. (Result is `[[([stuff] that)? I] like]`.) For that simple example, the first version worked, but it's no good since it fails with a proper example, like `(foo ((bar)? (baz))?)`, result `(foo[([bar] (baz]?)`. Doesn't work for anything, -1 till fixed. – Qtax Jun 28 '12 at 00:38

In older Perl versions (pre 5.10), one could have used code assertions and dynamic regex for this:

 ...
 my $s = '((((stuff)? that)? I)? like)?';

 # recursive dynamic regex, we need
 # to pre-declare lexical variables
 my $rg;

 # use a dynamically generated regex (??{..})
 # and a code assertion (?{..})
 $rg = qr{
          (?:                       # start expression
           (?> [^)(]+)              # (a) we don't see any (..) => atomic!
            |                       # OR 
           (                        # (b) start capturing group for level
            \( (??{$rg}) \) \?      # oops, we found parentheses \(,\) w/sth 
           )                        # in between and the \? at the end
           (?{ print "[ $^N ]\n" }) # if we got here, print the captured text $^N
          )*                        # done, repeat expression if possible
         }xs;

 $s =~ /$rg/;
 ...

During the match, the code assertion prints every optional group it finds:

 [ (stuff)? ]
 [ ((stuff)? that)? ]
 [ (((stuff)? that)? I)? ]
 [ ((((stuff)? that)? I)? like)? ]

To use this according to your requirements, you could change the code assertion slightly, put the capturing parentheses at the right place, and save the matches in an array:

 ...
 my @result;
 my $rg;
 $rg = qr{
          (?:                      
           (?> [^)(]+)             
            |                      
            \( ( (??{$rg}) ) \) \?  (?{ push @result, $^N })
          )*                     
         }xs;

 $s =~ /$rg/ && print map "[$_]\n", @result;
 ...

which prints:

 [stuff]
 [(stuff)? that]
 [((stuff)? that)? I]
 [(((stuff)? that)? I)? like]

Regards

rbo

rubber boots

You could solve it in a couple of ways, the simplest being to run your expression repeatedly until no more replacements are made. E.g.:

1 while s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg;

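But that is highly inefficient for deeply nested strings: each pass rescans the whole string and converts only the outermost remaining optional groups, because `/g` resumes after the end of each match and never looks inside it.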
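
As a rough sketch of the loop approach, the variant below recurses into capture group 1 (the balanced contents) instead of the whole pattern. That way the replacement does not keep the original parentheses inside the new brackets, and nested groups that are not themselves optional, such as `(baz)` in `(foo ((bar)? (baz))?)`, no longer block the match. The helper name `optional_to_brackets` is an assumption for illustration; the pattern assumes Perl 5.10 or later and, like the original, only rewrites parenthesized groups.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): repeat the
# substitution until no optional parenthesized group is left.
sub optional_to_brackets {
    my ($s) = @_;
    1 while $s =~ s{
        \(                                  # opening paren of a candidate group
        ( (?: [^()]++ | \( (?1) \) )*+ )    # group 1: balanced contents, outer parens excluded
        \) \?                               # closing paren followed by the optional marker
    }{[$1]}xg;
    return $s;
}

print optional_to_brackets('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print optional_to_brackets('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
print optional_to_brackets('((((stuff) that) I) hate)'), "\n";      # unchanged
```

The number of passes still grows with the nesting depth of optional groups, so the efficiency concern above applies to this variant as well.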

You could do it in one pass like this instead:

s{
  (?(DEFINE)
    (?<r>   \( (?: [^()]++ | (?&r) )*+ \)   )
  )

  ( \( )
  (?=   (?: [^()]++ | (?&r) )*+ \) \?   )

  |

  \) \?
}{
  $2? '[': ']'
}gex;
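
Since the one-pass pattern is dense, here is a commented sketch of the same idea. The sub name `smapi_optional_to_jsgf` and the group names `balanced` and `open` are assumptions added for readability (a named group also avoids counting capture groups by hand). It assumes Perl 5.10 or later and, like the pattern above, only rewrites parenthesized groups, so a bare `stuff?` is left untouched.

```perl
use strict;
use warnings;

# Hypothetical wrapper; the pattern follows the one-pass idea shown above.
sub smapi_optional_to_jsgf {
    my ($s) = @_;
    $s =~ s{
        (?(DEFINE)                          # definitions only, never matched in place
            (?<balanced>                    # one balanced parenthesized group
                \( (?: [^()]++ | (?&balanced) )*+ \)
            )
        )

        (?<open> \( )                       # branch 1: an opening paren ...
        (?= (?: [^()]++ | (?&balanced) )*+  # ... whose balanced contents
            \) \? )                         # ... end in a closing paren plus question mark
      |
        \) \?                               # branch 2: the closing paren of such a group
    }{
        defined $+{open} ? '[' : ']'
    }gex;
    return $s;
}

print smapi_optional_to_jsgf('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print smapi_optional_to_jsgf('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
```

The lookahead only inspects what follows an opening parenthesis without consuming it, so each bracket is rewritten where it stands and the string is walked once from left to right.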
Qtax
  • The 1 while thing works great! How come giving the regex the g switch didn't do the same thing? However, given ` = ((((stuff)? that)? I)? like)?`, your regex gives me ` = ]]]]stuff] that] I] like]` for some reason. – Nate Glenn Jun 25 '12 at 17:22
  • @NateGlenn, works now, changed `$1` to `$2`. (Forgot to count the recursive group, doh.) – Qtax Jun 25 '12 at 17:32
  • Though some comments on it would be nice... It's far beyond my regex abilities. – Nate Glenn Jun 25 '12 at 17:47
  • @NateGlenn - This will not work if there is `?` in text, like for example with text `((((stu?ff)? that)? I)? like)?` – Ωmega Jun 25 '12 at 17:56
  • @user1215106: If it were true, how likely are you to have a ? in a speech recognition grammar anyway? Recognition grammars generally have no punctuation, outside of dashes and apostrophes, which are legitimate within a word. – Nate Glenn Jun 25 '12 at 18:01