
I am using the following regular expression pattern to match an if statement written in C# style:

\b[if]{2}\b[ ]?\({1}(?<HeaderSection>[ \w\s\a\t\=\.\@\#\$\%\&a-zA-Z0-9\(\)\;\/\"\'\[\]\*]*)\){1}(?<CommentSection>[\s\a\w\t a-zA-Z0-9\/\.]*)[\r\n]*\{{1}(?<FunctionBody>[\r\n \a\s\wa-zA-Z0-9\(\)\"\.\;\:]*)[\r\n]*\}{1}

It's a crazy long regex pattern, but it seems to be working to some extent. Let me explain it. It has three named capturing groups: HeaderSection, CommentSection and FunctionBody. HeaderSection captures whatever sits between the opening and closing parentheses of the if statement, so from the statement below:

if(Value1==Function(int Z))

it captures:

Value1==Function(int Z)

Similarly, CommentSection captures the comment (if any) after the closing parenthesis, so from the statement below:

if(Value1==Function(int Z))//This is a Comment.

it captures

//This is a Comment.

and FunctionBody captures anything between { and }, such as in the code below:

if(Value1==Function(int Z))//This is a Comment.
{
  This is the
  space for
  function body.
}

it captures "This is the space for function body." That explains what the regex matches. Now the issue is that if I have some code like this:

if(Value1==Function(int Z))//This is a Comment.
{
  if(Value2==Value1)
  {
    Some code
  }
}

and I match it using the regex above, it doesn't match the first if statement, i.e.:

if(Value1==Function(int Z))//This is a Comment.
{
Another function();
}

and instead matches the inner one, i.e.:

  if(Value2==Value1)
  {
    Some code
  }

Please point out what I have done wrong, or if there is another, less messy way, please let me know, or correct the regex pattern if it's wrong somewhere. One more thing: I'm doing all this in C# using its regular expression functions. Thanks in advance.

devavx
  • I don't have enough fingers to point at what's wrong here. Why do you want to do this in regex? At all? – Firas Dib Aug 29 '13 at 22:19
  • No, I don't want to use regex, but I cannot find another way of doing it, so I'm forced to use it. – devavx Aug 29 '13 at 22:23
  • Good old days http://dinosaur.compilertools.net/ – I4V Aug 29 '13 at 22:27
  • Ditch the regex and use a C# based C parser. See this for hints http://stackoverflow.com/questions/5992068/c-parser-in-c-sharp-or-generally-net – Mark Lakata Aug 29 '13 at 22:49
  • Since if statements and their opening/closing braces can be nested, you can check the all-time-favorite question about parsing XHTML with regex http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags (which faces similar issues)... seriously. – Alexei Levenkov Aug 29 '13 at 22:59
  • @MarkLakata Good advice, though lexical analyzers DO use regular expressions. They just return them as tokens instead of plain strings. – Arian Motamedi Aug 29 '13 at 22:59
  • Using [Irony](http://irony.codeplex.com) might be a good choice here. As others have suggested, regular expressions aren't really well suited to the task, but it's a relatively easy one to accomplish with a proper parser. – Sean U Aug 30 '13 at 03:25

1 Answer

(?<header>if\(.*?)(?<comment>//.*?)*\s\n\{(?<functionbody>.*?)\n\}

This seems to be a solution, provided the code is formatted the way it is in your examples.

(?<header>if\(.*?)

will match if( followed by anything, but only up to the start of the // section, so it will match

if(Value1==Function(int Z))

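Then it moves on to (?<comment>//.*?)*\s, which matches anything following the // signs. Because * means zero or more occurrences, it also matches when there is no comment at all, and the \s consumes the whitespace character sitting right before the line feed (for example the \r of a Windows line ending), so the comment does not run past the end of the line.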

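Then (\n\{)(?<functionbody>.*?)(\n\}) matches a { that comes right after a newline and keeps going until it finds a } that comes right after a newline. An indented } (such as the one closing a nested if) is skipped, which is why the outer block is captured whole. For the body to span several lines, the . must be allowed to match newlines (RegexOptions.Singleline in .NET).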
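To try the three pieces together in C#, here is a minimal sketch. It is only an illustration under a few assumptions: the class and method names (IfRegexDemo, Find) are made up, RegexOptions.Singleline is used so that . can cross line breaks, and line endings are normalised to \r\n first because the \s\n part of the pattern needs one whitespace character before the line feed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IfRegexDemo
{
    // The pattern from this answer, unchanged.
    public const string Pattern =
        @"(?<header>if\(.*?)(?<comment>//.*?)*\s\n\{(?<functionbody>.*?)\n\}";

    // Returns { header, comment, body } for each match; comment is "" when absent.
    public static List<string[]> Find(string source)
    {
        // Assumption: the pattern's \s\n expects a character before the line feed,
        // so line endings are normalised to \r\n before matching.
        string text = Regex.Replace(source, @"\r?\n", "\r\n");

        var results = new List<string[]>();
        // Singleline lets '.' match '\n', so the body can span several lines.
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Singleline))
        {
            results.Add(new[]
            {
                m.Groups["header"].Value,
                m.Groups["comment"].Value,
                m.Groups["functionbody"].Value.Trim()
            });
        }
        return results;
    }
}
```

Calling IfRegexDemo.Find on a source string returns one three-element array per matched if block. Without the line-ending normalisation, input that uses bare \n endings may produce a comment group that swallows far more than one line.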

For this input:

var x = 0
if(Value1==Function(int Z))//This is a Comment.
{
  if(Value2==Value1)
  {
    Some code
  }
}
var y = 0

if(y == x) 
{
    x = y + 1
}

it will match the following groups:

header: if(Value1==Function(int Z))
comment: //This is a Comment.
functionbody: 
  if(Value2==Value1)
  {
    Some code
  }

header: if(y == x) 
functionbody: 
        x = y + 1
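The pattern depends on the outer closing brace starting its line, so it breaks as soon as the indentation changes. If that is a concern, and in the spirit of the comments suggesting something other than regex, a small hand-written scanner that counts parentheses and braces may be less messy. The following is only a rough sketch: IfScanner, IfBlock and FindOutermostIfs are hypothetical names, it reports only the outermost if blocks, and it assumes there are no braces or parentheses inside string literals or comments.

```csharp
using System;
using System.Collections.Generic;

public static class IfScanner
{
    public sealed class IfBlock
    {
        public string Header = "";
        public string Comment = "";
        public string Body = "";
    }

    // Index of the delimiter that closes the one at 'start', or -1 if unbalanced.
    static int FindClosing(string s, int start, char open, char close)
    {
        int depth = 0;
        for (int i = start; i < s.Length; i++)
        {
            if (s[i] == open) depth++;
            else if (s[i] == close && --depth == 0) return i;
        }
        return -1;
    }

    static int SkipWhitespace(string s, int i)
    {
        while (i < s.Length && char.IsWhiteSpace(s[i])) i++;
        return i;
    }

    public static List<IfBlock> FindOutermostIfs(string src)
    {
        var found = new List<IfBlock>();
        int pos = 0;
        while (pos < src.Length)
        {
            int kw = src.IndexOf("if", pos, StringComparison.Ordinal);
            if (kw < 0) break;
            pos = kw + 2; // by default, resume right after this "if"

            // "if" must not be the tail of a longer identifier (e.g. "elif").
            if (kw > 0 && (char.IsLetterOrDigit(src[kw - 1]) || src[kw - 1] == '_')) continue;

            int open = SkipWhitespace(src, kw + 2);
            if (open >= src.Length || src[open] != '(') continue;
            int close = FindClosing(src, open, '(', ')');
            if (close < 0) continue;

            // Optional // comment between the header and the opening brace.
            int i = SkipWhitespace(src, close + 1);
            string comment = "";
            if (i + 1 < src.Length && src[i] == '/' && src[i + 1] == '/')
            {
                int eol = src.IndexOf('\n', i);
                if (eol < 0) eol = src.Length;
                comment = src.Substring(i, eol - i).TrimEnd();
                i = SkipWhitespace(src, eol);
            }

            if (i >= src.Length || src[i] != '{') continue;
            int end = FindClosing(src, i, '{', '}');
            if (end < 0) continue;

            found.Add(new IfBlock
            {
                Header = src.Substring(open + 1, close - open - 1),
                Comment = comment,
                Body = src.Substring(i + 1, end - i - 1).Trim()
            });
            pos = end + 1; // skip the whole block, so nested ifs are not reported
        }
        return found;
    }
}
```

Unlike the regex, Header here excludes the surrounding parentheses, which is closer to what the question's HeaderSection group was meant to capture. To find nested ifs as well, the same method could be called again on each Body.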
Sedecimdies