0

I am trying to parse FSM statements of the Gezel language (http://rijndael.ece.vt.edu/gezel2/) using Python and regular expressions:

regex_cond = re.compile(r'.+((else\tif|else|if)).+')  
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2);

I have trouble distinguishing if from else if. The else if in the example is recognized as an if.

mrks
    Isn't \t a tab? So it's actually searching for else<tab>if instead of else if? I would try switching to else\sif. – martiert Aug 12 '10 at 14:43

4 Answers

3

A \t matches a tab character, and there is no tab character between "else" and "if" in line2. You might try \s instead, which matches any whitespace character.
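
To make the difference concrete, here is a small sketch using only the standard re module. The two sample strings are made up for illustration; they are not taken from a Gezel source file.

```python
import re

spaced = 'else if'     # a single space, as in line2
tabbed = 'else\tif'    # a literal tab character (illustrative input)

# \t matches only a tab, so the space-separated form is missed.
tab_on_space = re.search(r'else\tif', spaced)

# \s matches any whitespace character, including both space and tab.
ws_on_space = re.search(r'else\sif', spaced)
ws_on_tab = re.search(r'else\sif', tabbed)

print(tab_on_space is None)    # True
print(ws_on_space.group())     # else if
print(ws_on_tab is not None)   # True
```

If several spaces may separate the two keywords, \s+ would cover that case as well.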

Alex B
  • I might also suggest removing the double parentheses ((...)) and replacing them with one set (...), since a single set provides both the capture and the alternation. – Alex B Aug 12 '10 at 14:53
  • True, but not the only problem. – Katriel Aug 12 '10 at 15:09
2

Don't do this; use pyparsing instead. You'll thank yourself later.


The problem is that .+ is greedy, so it's eating up the else... do .+? instead. Or rather, don't, because you're using pyparsing now.

regex_cond = re.compile( r'.+?(else\sif|else|if).+?' )
...
# else if
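
As a rough check of how the two problems interact, the sketch below runs four variants of the pattern against line2 from the question. The keyword helper is a hypothetical name introduced only for this comparison.

```python
import re

line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'

def keyword(pattern):
    """Return whatever group 1 captured on line2 (hypothetical helper)."""
    return re.match(pattern, line2).group(1)

# Greedy .+ backtracks from the end of the line, so the last possible
# keyword position wins: the bare 'if'.
greedy_tab = keyword(r'.+(else\tif|else|if).+')   # pattern from the question
greedy_ws = keyword(r'.+(else\sif|else|if).+')    # \s alone does not help

# Lazy .+? stops at the first keyword position, but \t still cannot
# match the space, so only 'else' is captured.
lazy_tab = keyword(r'.+?(else\tif|else|if).+')

# Both fixes together capture the full 'else if'.
lazy_ws = keyword(r'.+?(else\sif|else|if).+')

print(greedy_tab, greedy_ws, lazy_tab, lazy_ws, sep=' | ')
# if | if | else | else if
```

Only the last variant, with both the non-greedy prefix and \s, captures else if.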
Katriel
1

Your immediate problem is that .+ is greedy and so it matches @s0 else instead of just @s0. To make it non-greedy, use .+? instead:

import re

regex_cond = re.compile(r'.+?(else\s+if|else|if).+')  
line2 = '@s0 else if (insreg==1) then (initx,PING,notend) -> sinitx;'
match = regex_cond.match(line2)
print(match.groups())
# ('else if',)

However, as others have suggested, using a parser such as Pyparsing is a better approach than using re here.

unutbu
0

Correct me if I'm wrong, but regular expressions are not well suited to parsing, since they are only sufficient for Type-3 (regular) languages. For example, you can't decide whether or not ((())())) is a valid statement without "counting", which a regex can't do. Or, to take your example, if else else could not be detected as invalid. Maybe I'm mixing up scanner and parser; in that case, please tell me.
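
The "counting" mentioned above can be sketched in a few lines of plain Python. The function name is_balanced is made up for this illustration, and the check covers only round parentheses.

```python
def is_balanced(text):
    """Check parenthesis nesting by tracking the current depth."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared with nothing left to close.
                return False
    # Every opened parenthesis must have been closed.
    return depth == 0

print(is_balanced('((())()))'))   # False: one ')' too many
print(is_balanced('((())())'))    # True
```

The depth counter is the piece of state that a classical regular expression has no way to keep for arbitrarily deep nesting.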

InsertNickHere
  • Parsing nested structures with regex was pretty well shot down in [this SO question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). That question is about HTML, but the answer applies equally well to any nested structure. – NealB Aug 12 '10 at 14:56