
I have sample C++ code (http://pastebin.com/6q7zs7tc) from which I need to extract the function names and the number of parameters each function takes. So far I have written this regex, but it does not work reliably:

(?![a-z])[^\:,>,\.]([a-z,A-Z]+[_]*[a-z,A-Z]*)+[(]
Peter Mortensen
Abdul Rehman Janjua
  • First, you should say what you expect it to do and what it actually does that is not perfect. Second, you probably need some sort of parser a bit more powerful than regex. C++ is not a regular language (and people will argue all day about whether it is context-free - please don't go there). – BoBTFish Mar 03 '15 at 13:55
  • No regex can do this job perfectly. Doing this (even close to) truly correctly is a seriously non-trivial task, but if you *really* need to do it, you can use something like Clang (but even just making use of Clang isn't trivial). – Jerry Coffin Mar 03 '15 at 13:57
  • You can't do this with a regex. – n. m. could be an AI Mar 03 '15 at 17:12

1 Answer


You can't parse C++ reliably with regex.

In fact, you can't parse it with weak parsing technology (see "Why can't C++ be parsed with a LR(1) parser?"). If you expect to extract this information reliably from source files, you will need a time-tested C++ parser; see https://stackoverflow.com/a/28825789/120163

If you don't care that your extraction process is flaky, then you can use a regex and maybe some additional hackery. Your key problem for heuristic extraction is matching the various kinds of brackets, e.g., [...], < ... > (which won't quite work, because < and > also appear in comparison and shift operators) and { ... }. Bracket matching requires you to keep a stack of the brackets seen so far, and it may still fail in the presence of macros and preprocessor conditionals.
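To make the "stack of seen brackets" idea concrete, here is a rough sketch of the flaky approach. The names `guessFunctions` and `countParams` are made up for illustration; nothing here comes from the asker's code. It finds every identifier followed by `(`, then counts commas that sit directly inside that parenthesis pair while a stack tracks nested brackets. It deliberately assumes there are no comments, string literals, or macros in the input, and it cannot tell a declaration from a call.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true for characters that may appear in an identifier.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: given the index of a '(', return the number of
// top-level comma-separated items before the matching ')', or -1 if the
// brackets do not balance. A stack records every open bracket seen.
int countParams(const std::string& src, std::size_t open) {
    std::vector<char> stack;
    int commas = 0;
    bool sawToken = false;  // becomes true once the list is non-empty
    for (std::size_t i = open; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '(' || c == '[' || c == '{' || c == '<') {
            if (!stack.empty()) sawToken = true;
            stack.push_back(c);
        } else if (c == ')' || c == ']' || c == '}' || c == '>') {
            const char want = c == ')' ? '(' : c == ']' ? '[' : c == '}' ? '{' : '<';
            if (c != '>') {
                // Unclosed '<' were probably less-than operators: drop them.
                while (!stack.empty() && stack.back() == '<') stack.pop_back();
            }
            if (stack.empty() || stack.back() != want) {
                if (c == '>') continue;  // stray '>' such as in "->" or "a > b"
                return -1;
            }
            stack.pop_back();
            if (stack.empty()) return sawToken ? commas + 1 : 0;
        } else if (c == ',' && stack.size() == 1) {
            ++commas;  // comma directly inside the outermost parentheses
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            sawToken = true;
        }
    }
    return -1;  // ran off the end without closing the parenthesis
}

// Hypothetical helper: returns (name, item count) for every `identifier(`.
std::vector<std::pair<std::string, int>> guessFunctions(const std::string& src) {
    static const char* const keywords[] = {"if", "for", "while", "switch",
                                           "return", "sizeof", "catch"};
    std::vector<std::pair<std::string, int>> out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] != '(') continue;
        // Walk backwards over whitespace, then over identifier characters.
        std::size_t end = i;
        while (end > 0 && std::isspace(static_cast<unsigned char>(src[end - 1]))) --end;
        std::size_t begin = end;
        while (begin > 0 && isIdentChar(src[begin - 1])) --begin;
        if (begin == end) continue;  // no identifier in front of '('
        const std::string name = src.substr(begin, end - begin);
        bool skip = false;
        for (const char* kw : keywords) {
            if (name == kw) skip = true;
        }
        if (skip) continue;
        const int count = countParams(src, i);
        if (count >= 0) out.emplace_back(name, count);
    }
    return out;
}
```

Calling `guessFunctions` on a whole source file returns declarations and call sites alike, so you would still have to filter the result. It also breaks in exactly the ways described above: a default argument such as `a < b, c` is miscounted, a function pointer parameter like `int (*cb)(int)` is reported as a function named `int`, and anything hidden behind a macro or `#if` is invisible to it.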

Ira Baxter