I wrote a regular expression that tries to match nested begin-end code blocks, like those in the Pascal language. Example:

   begin
     begin
       stuff1
     end
   end

   begin
       stuff2
   end

When I run my code below, I want to match:

   begin
     begin
       stuff1
     end
   end

without the last "begin stuff2 end" pair.

Here's my attempt, using Windows PowerShell as the scripting language:

### POWERSHELL
$ErrorActionPreference = 'Stop';

# Want to match a balanced begin-end block, including nested begin-end:
#     begin begin stuff1 end end

$t1 = ""
$t1 += "  begin"
$t1 += "     begin"
$ti += "      stuff1"
$t1 += "     end"
$t1 += "  end"
$t1 += ""
$t1 += "  begin"
$ti += "      stuff2"
$t1 += "  end"

write-host "LINE: $t1"

$rxop = [Text.RegularExpressions.RegexOptions]::IgnorePatternWhitespace -bor
        [Text.RegularExpressions.RegexOptions]::IgnoreCase

$rx = [Regex]::new(
       "^(
        \bbegin\b             
        (?:  
            (?<openp> \bbegin\b )
            |                    
            (?<-openp> \bend\b )
            |
            [^\t]+   
        )+ 

        (?(openp)(?!))

       \bend\b)
   ", 
   $rxop
)

$match = $rx.Match($t1)

if ($match.Success) {
    $name  = $match.Groups[1].Value
    write-host "matched: $name"
}
else {
    write-host "no-match"
}

Basically, it doesn't work.

Bimo
  • Regex is not designed to handle recursive structures in one chunk. Use the proper tool to handle recursive parsing. – jdweng Nov 08 '22 at 04:16
  • Please [Edit](https://stackoverflow.com/posts/74354341/edit) your question and correct the [mcve], as you are mixing `$t1` and `$ti`. Besides, it isn't clear whether `$t1` is supposed to contain a single line (as it is set up now) or multiple lines (start with `$t1 = @()`). As an aside, note: [Why should I avoid using the increase assignment operator (+=) to create a collection](https://stackoverflow.com/q/60708578/1701026) – iRon Nov 08 '22 at 09:27

1 Answer

As commented by @jdweng:

Regex is not designed to handle recursive structures in one chunk. Use the proper tool to handle recursive parsing.

But luckily you have PowerShell:

$t1 = @'
begin
  begin
    stuff1
  end
end

begin
  stuff2
end
'@ -Split '\r?\n'

Using this `SelectString` prototype (a custom function, not the built-in `Select-String` cmdlet), you might create a recursive function, or simply invoke it a few times, depending on the nesting depth:

$t1 |SelectString -From 'begin' -To 'end' |SelectString -From 'begin' -To 'end'

    stuff1

  stuff2
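
For comparison, the .NET regex engine that PowerShell uses does support balancing groups, so the single-regex approach from the question can be made to work. What follows is a minimal sketch, not a definitive implementation. The names `$text`, `$pattern`, and the group name `open` are illustrative choices of mine. It assumes `begin` and `end` only ever appear as block keywords (not inside strings or comments). The main difference from the pattern in the question is that the catch-all alternative refuses to consume the keywords themselves, so every nested `begin` and `end` reaches the push and pop alternatives. The leading `^` is also dropped, because the sample text may start with whitespace.

```powershell
# Sample input; a here-string keeps the line breaks intact.
$text = @'
begin
  begin
    stuff1
  end
end

begin
  stuff2
end
'@

# Singleline lets "." match newlines, so one match can span several lines.
$options = [Text.RegularExpressions.RegexOptions]::IgnorePatternWhitespace -bor
           [Text.RegularExpressions.RegexOptions]::IgnoreCase -bor
           [Text.RegularExpressions.RegexOptions]::Singleline

$pattern = @'
\bbegin\b                          # outermost opener
(?>
    (?<open> \bbegin\b )           # nested opener: push onto the "open" stack
  |
    (?<-open> \bend\b )            # nested closer: pop; fails when the stack is empty
  |
    (?! \bbegin\b | \bend\b ) .    # any single character that does not start a keyword
)*
(?(open)(?!))                      # reject the match if an opener is still unclosed
\bend\b                            # closer that pairs with the outermost opener
'@

$rx    = [regex]::new($pattern, $options)
$match = $rx.Match($text)

if ($match.Success) {
    Write-Host "matched:`n$($match.Value)"
}
else {
    Write-Host 'no-match'
}
```

With the sample above, the match should cover the first nested block and stop before the `stuff2` block. Calling `$rx.Matches($text)` instead of `$rx.Match($text)` would return each top-level block in turn.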
iRon
  • lol... actually, i'm using perl because i didn't have redhat handy when writing this question.. so i used powershell instead... 'though, i would gladly write powershell on redhat if the admins installed it... – Bimo Nov 08 '22 at 13:17