I am writing a C++ style checker in Perl, but I am having a difficult time constructing regular expressions for basic C++ constructs. For example, an if statement can take either of the following forms:

if( expression ) { or if ( expression ) ;

What I want is to throw an error whenever the code does not follow this guideline: if<space>(expression)<space>{

The expression can span multiple lines, with sub-expressions joined by logical operators. How do I construct a regular expression for this?

Flexo
Avinash
  • 1
    This is going to be tough/impossible to do with regex; a parser will be much more successful. – rerun Aug 03 '12 at 16:03
  • 3
    Regular expressions won't cut it. You'll need to use/write a parser. –  Aug 03 '12 at 16:03
  • 1
    Have you read http://stackoverflow.com/q/4840988/1030675 ? – choroba Aug 03 '12 at 16:18
  • To answer your question, one needs to write a C++ parser. As such, we're closing your question. – ikegami Aug 03 '12 at 16:25
  • 3
    I suggest you stop wasting your time and use something that's already out there, such as [uncrustify](http://uncrustify.sourceforge.net/). You can call the executable from Perl if you must. – Praetorian Aug 03 '12 at 17:05

2 Answers

Programming languages aren't "regular languages", and strictly speaking you can't parse them with regular expressions. However, Perl regexes can be used to define whole top-down recursive grammars. The module Regexp::Grammars makes this easy, powerful and tidy.

You would also want to look at the (?{CODE}) construct to issue warnings during parsing. A snippet of your grammar could look like this (simplified, just to give you an idea):

...;

<rule: if_statement>
if ( [ \t]+ | (?{warn q{no spaces around "if" condition at $line}}) )
    \( <statement> \)
   ( [ \t]+ | (?{warn q{no spaces around "if" condition at $line}}) )
   \{ <expression>+ \}

<rule: expression>
   <statement> ;

<rule: statement>
   <assignment> | <function_call> | \( <statement> \)

...;

After a successful match, Regexp::Grammars leaves the whole syntax tree in the %/ hash for you to use.
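For the narrow check in the question, the same recursive idea can be sketched in core Perl (5.10 or later) without Regexp::Grammars. This is only a minimal sketch: the names `$if_re` and `check_if_style` are made up for illustration, and it assumes no parentheses are hidden inside string literals, character literals, comments or preprocessor lines.

```perl
use strict;
use warnings;

# Hypothetical pattern: "if", optional whitespace, a balanced (...) group
# that may span several lines, optional whitespace, then the next character.
my $if_re = qr{
    \b if \b
    (?<gap1> \s*+ )          # whitespace between "if" and "("
    (?<cond> (?&parens) )    # the whole parenthesised condition
    (?<gap2> \s*+ )          # whitespace between ")" and what follows
    (?<next> \S )            # first non-space character after the condition
    (?(DEFINE)
        # Balanced parentheses: the group calls itself for nested pairs.
        (?<parens> \( (?: [^()]++ | (?&parens) )* \) )
    )
}x;

# Returns one message per violation of: if<space>(expression)<space>{
sub check_if_style {
    my ($source) = @_;
    my @warnings;
    while ($source =~ /$if_re/g) {
        my ($gap1, $gap2, $next) = @+{qw(gap1 gap2 next)};
        # Line number of the "if": newlines before the match start, plus one.
        my $prefix = substr($source, 0, $-[0]);
        my $line   = 1 + ($prefix =~ tr/\n//);
        push @warnings, "line $line: expected one space between 'if' and '('"
            unless $gap1 eq ' ';
        push @warnings, "line $line: expected ' {' right after the condition"
            unless $gap2 eq ' ' && $next eq '{';
    }
    return @warnings;
}

print "$_\n" for check_if_style("if(x){ }\nif (a &&\n    b) {\n}\n");
```

The `(?&parens)` recursion is what lets the condition contain nested calls such as `f(x)` and line breaks; a pattern without recursion would have to guess where the condition ends.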

amon
  • Regular expressions aren't actually regular these days. You can actually parse C++ with regular expressions. You wouldn't want to, but you can. – ikegami Aug 03 '12 at 16:23
  • The above comment applies to @thebjorn's answer too. – ikegami Aug 03 '12 at 16:24
  • 1
    Can `Regexp::Grammars` handle context sensitive grammars? Because `a * b;` can be different things in C++ depending on context. – R. Martinho Fernandes Aug 03 '12 at 16:42
  • @R.MartinhoFernandes "context sensitive grammars"? No, at least not as the CS term, because CSGs can be impossible to decide. You can however add "context" by using lookaheads/lookbehinds and, more importantly, writing a *suitable* grammar (`a * b;` *has* to mean "a times b" not "a pointer b") – amon Aug 03 '12 at 17:08
  • 4
    @amon you can't write a *suitable* grammar if you want to parse C++. You have to use the C++ grammar. – R. Martinho Fernandes Aug 03 '12 at 17:09
  • @R.MartinhoFernandes 1. C++ *can* be parsed, see the existence of working compilers. 2. OP's aim does not seem to be to cover *all* possibilities and quirks of this language but to check whether a small subset is met, possibly, but not necessarily, representable by a CFG (!). These bounds can be stretched ad infinitum when applying enough wizardry, e.g. creating the grammar on the fly with `(??{})` – amon Aug 03 '12 at 17:20

Classical regular expressions are not expressive enough to recognize context-free languages, which is what programming-language syntax requires. You can use regular expressions to write your lexer, but you'll have to write a parser too.
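As a rough illustration of that split, here is a minimal Perl sketch in which regular expressions only recognise tokens and a small hand-written loop does the nesting. The names `tokenize` and `check_ifs` and the token categories are assumptions for this example, not part of any module, and the lexer covers only a tiny fraction of real C++ (no raw strings, no preprocessor handling).

```perl
use strict;
use warnings;

# Lexer: each regex recognises one kind of token; no nesting is handled here.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    my $line = 1;
    for ($src) {                      # alias $_ so \G and pos() track $src
        while (1) {
            my $type;
            if    (/\G(\s+)/gc)                   { $type = 'ws' }
            elsif (m{\G(//[^\n]*)}gc)             { $type = 'comment' }
            elsif (m{\G(/\*.*?\*/)}gcs)           { $type = 'comment' }
            elsif (/\G("(?:[^"\\]|\\.)*")/gcs)    { $type = 'string' }
            elsif (/\G('(?:[^'\\]|\\.)*')/gcs)    { $type = 'string' }
            elsif (/\G([A-Za-z_]\w*)/gc)          { $type = 'word' }
            elsif (/\G(.)/gcs)                    { $type = 'punct' }
            else                                  { last }   # end of input
            my $text = $1;
            push @tokens, { type => $type, text => $text, line => $line };
            $line += ($text =~ tr/\n//);
        }
    }
    return @tokens;
}

# Parser step: a depth counter finds the ")" that closes each if-condition.
# Parentheses inside strings and comments are ignored because they are not
# 'punct' tokens.
sub check_ifs {
    my @t = @_;
    my @problems;
    for my $i (0 .. $#t) {
        next unless $t[$i]{type} eq 'word' && $t[$i]{text} eq 'if';
        my $j    = $i + 1;
        my $gap1 = ($j <= $#t && $t[$j]{type} eq 'ws') ? $t[$j++]{text} : '';
        next unless $j <= $#t
            && $t[$j]{type} eq 'punct' && $t[$j]{text} eq '(';
        my $depth = 0;
        for (; $j <= $#t; $j++) {
            next unless $t[$j]{type} eq 'punct';
            $depth++ if $t[$j]{text} eq '(';
            $depth-- if $t[$j]{text} eq ')';
            last if $depth == 0;
        }
        next if $j > $#t;             # unbalanced input: skip this "if"
        my $k    = $j + 1;
        my $gap2 = ($k <= $#t && $t[$k]{type} eq 'ws') ? $t[$k++]{text} : '';
        my $next = $k <= $#t ? $t[$k]{text} : '';
        my $line = $t[$i]{line};
        push @problems, "line $line: expected one space between 'if' and '('"
            unless $gap1 eq ' ';
        push @problems, "line $line: expected ' {' right after the condition"
            unless $gap2 eq ' ' && $next eq '{';
    }
    return @problems;
}

print "$_\n" for check_ifs(tokenize(qq{if(x){ }\nif (s == ")" ) {\n}\n}));
```

Because string and comment tokens are produced by the lexer, a `)` inside a literal no longer confuses the nesting count, which is hard to guarantee with a single pattern applied to raw source text.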

thebjorn