
I have multiple C++ files and need to extract the body of each for loop from them.

Is there an easy way to do this, maybe using grep? Assume there are no nested for loops.

erip
sam1064max

1 Answer


Without parsing the entire file, the answer is no.

The body of a for loop is delimited by balanced braces, and balanced delimiters form a context-free language. As such, they cannot be matched by a regular expression.

A more involved approach is to use grep to search for the beginning of a for loop (`for` followed by optional whitespace and a left parenthesis), then manually find the closing curly brace.
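The "find the start, then find the closing curly" idea can be sketched as a small brace counter. This is only a rough illustration, not a parser: the names `extract_for_bodies`, `match_delim`, and `is_ident_char` are made up for this example, and it assumes no braces or parentheses appear inside string literals, character literals, comments, or macros. Loops with a brace-less body are skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `c` can be part of a C++ identifier.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Given an opening delimiter at src[pos], returns the index of the matching
// closing delimiter by counting depth, or npos if it is never closed.
// Assumption: delimiters inside strings and comments are NOT handled.
static std::size_t match_delim(const std::string& src, std::size_t pos,
                               char open, char close) {
    int depth = 0;
    for (std::size_t i = pos; i < src.size(); ++i) {
        if (src[i] == open) {
            ++depth;
        } else if (src[i] == close && --depth == 0) {
            return i;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: returns the text between the braces of every
// brace-delimited for loop found in `src`.
std::vector<std::string> extract_for_bodies(const std::string& src) {
    std::vector<std::string> bodies;
    std::size_t i = 0;
    while ((i = src.find("for", i)) != std::string::npos) {
        const bool word_start = (i == 0) || !is_ident_char(src[i - 1]);
        std::size_t p = i + 3;
        i = p;  // resume the search after this occurrence by default
        if (!word_start) continue;  // e.g. "before"

        // `for`, optional whitespace, then a left parenthesis.
        while (p < src.size() && std::isspace(static_cast<unsigned char>(src[p]))) ++p;
        if (p >= src.size() || src[p] != '(') continue;  // e.g. "format("

        const std::size_t rpar = match_delim(src, p, '(', ')');
        if (rpar == std::string::npos) continue;

        // The body must start with '{'; brace-less bodies are skipped.
        std::size_t b = rpar + 1;
        while (b < src.size() && std::isspace(static_cast<unsigned char>(src[b]))) ++b;
        if (b >= src.size() || src[b] != '{') continue;

        const std::size_t rbrace = match_delim(src, b, '{', '}');
        if (rbrace == std::string::npos) continue;

        bodies.push_back(src.substr(b + 1, rbrace - b - 1));
        i = rbrace + 1;  // the question assumes no nested for loops
    }
    return bodies;
}
```

To use it, read each file into a `std::string` and pass it to `extract_for_bodies`. Counting depth is exactly the part a regular expression cannot do, which is why this needs a loop with a counter rather than a pattern.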

Unfortunately, parsing C++ requires a Turing-complete parser, so unless there's some cute flag to pass to your compiler, you're hosed.

erip
  • I think you mean NP-complete or NP-hard, not Turing complete. – Otomo Mar 07 '16 at 16:34
  • @Otomo No, I mean [Turing Complete](http://blog.reverberate.org/2013/08/parsing-c-is-literally-undecidable.html), i.e., your parser must be Turing Complete to parse the source. – erip Mar 07 '16 at 17:02
  • oh ... I forgot about templates. But searching for a for loop is in P, isn't it? Substring search is and parenthesis matching is too. Parsing != searching in a text file. – Otomo Mar 07 '16 at 17:02
  • @Otomo Paren matching is inherently context-free, which is what nested expressions reduce to. Consider the basic (read: not complete) partial grammar: `for() { * }`. There is no way to match the closing curly because statements can also contain curlies. A context-free language cannot, in general, be matched by a regular expression. – erip Mar 07 '16 at 17:04
  • @Otomo _Parsing != searching..._ True, but when you _can't_ search (because regex won't work), your fallback measure is to parse. However, parsing is TC. Not good. – erip Mar 07 '16 at 17:12
  • Yeah. That doesn't conflict with what I wrote. I agree that it's not possible to solve it with regex. But I think this has nothing to do with parsing the code. – Otomo Mar 07 '16 at 17:12
  • @Otomo This has everything to do with parsing the code. Because there are defined grammar rules, parsing is the go-to. Parsing chunks the code. However, since the entire language isn't CFG, you can't use a parser (other than a TC one) to get those chunks. One of the chunks is the body of the for-loop. In the above grammar it would be `*`. – erip Mar 07 '16 at 17:20