I am working on a Perl script that prints the body of a given function from a C source file. I have written a regex to find the start of the function body:

/(void|int)\s*($function_name)\s*\(.*?\)\s*{/s

But this works only for functions returning void or int (basic types). How can I change this regex to handle user-defined data types (structs or pointers)?

user4377237
  • You can't parse C with regular expressions (not even, reliably, to detect the start of a function body). This post explains why you can't parse HTML with regular expressions, and many of the same principles apply: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – abligh Dec 21 '14 at 17:01
  • Check CPAN's `C::Scan` and similar modules. Parsing C from Perl is not a new task. – Dummy00001 Dec 22 '14 at 10:23

1 Answer

Try this one (untested!), although it does expect the function to start at the beginning of a line:

/
^                            # Start of line
\s*(?:struct\s+)?[a-z0-9_]+  # return type, optionally a struct
\s*\**                       # return type can be a pointer
\s*([a-z0-9_]+)              # Function name
\s*\(                        # Opening parenthesis
(
    (?:struct\s+)?           # Maybe we accept a struct?
    \s*[a-z0-9_]+\**         # Argument type
    \s*(?:[a-z0-9_]+)        # Argument name
    \s*,?                    # Comma to separate the arguments
)*
\s*\)                        # Closing parenthesis
\s*\{?                       # Maybe a {
\s*$                         # End of the line
/mix                         # Close our regex: m = multi-line anchors, i = case insensitive, x = allow whitespace and comments

You can squeeze all of these into a single line by removing the whitespace and comments.

Parsing code with a regex is generally hard though, and this regex is not perfect at all.
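
As a rough illustration of how such a pattern might be used, here is a minimal sketch that applies a variant of the regex above (with the `struct` keyword made optional and the `/x` flag added so the whitespace and comments are ignored) to an in-memory string. The sample C text and the names `$source`, `$definition` and `@names` are made up for the example; a real script would read the C file instead.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would slurp the C file instead.
my $source = <<'END_C';
struct point *make_point(int x, int y)
{
    return 0;
}

int add(int a, int b) {
    return a + b;
}
END_C

# Heuristic pattern for the first line of a function definition.
my $definition = qr/
    ^                                # start of a line (m flag)
    \s*(?:struct\s+)?[a-z0-9_]+      # return type, optionally a struct
    \s*\**                           # return type can be a pointer
    \s*([a-z0-9_]+)                  # function name (first capture group)
    \s*\(                            # opening parenthesis
    (?:
        (?:struct\s+)?               # optional struct keyword
        \s*[a-z0-9_]+\**             # argument type
        \s*(?:[a-z0-9_]+)            # argument name
        \s*,?                        # comma between arguments
    )*
    \s*\)                            # closing parenthesis
    \s*\{?                           # maybe an opening brace
    \s*$                             # end of the line
/mix;

# Collect every captured function name, scanning the whole string.
my @names;
while ($source =~ /$definition/g) {
    push @names, $1;
}
print "$_\n" for @names;
```

On this sample it prints `make_point` and `add`. It is still only a heuristic: a control-flow line such as `else if (cond) {` also satisfies the pattern, so the matches would need filtering before being trusted.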

ikegami
Tom van der Woerdt
  • Apart from structs, pointers and basic data types, what else could be possible return types for functions in C? – user4377237 Dec 21 '14 at 15:06
  • @user4377237 This does not seem to handle functions that return function pointers, and obviously doesn't handle cases where macros are used to generate part of the syntax. E.g. `NS(record) NS(function)(NS(record) r)` where `#define NS(symbol) my_namespace_ ## symbol`. You'd have to run the source code through the preprocessor first (e.g. `gcc -E`), which could then be used to figure out the line number in the unpreprocessed code. – amon Dec 21 '14 at 17:46
  • Tip: You can create reusable regex snippets by using a `(?(DEFINE) (?<name> pattern) )` section, and then in the same regex invoke such a named pattern like `(?&name)`. This allows Perl regexes to parse any LL(k) grammar, and means that you can make your regexes more self-documenting. Also, you can use the `/x` flag so that whitespace becomes insignificant and comments may be used inside the regex; you wouldn't have to squash the regex into a single line for it to be used. – amon Dec 21 '14 at 17:51
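
The `(?(DEFINE) ...)` tip can be sketched roughly as follows, assuming Perl 5.10 or later. The subpattern names `ident` and `type`, the capture name `fname`, and the sample lines are all invented for this illustration, and the grammar is deliberately tiny: one identifier as the type, an optional `struct`, and any number of `*`.

```perl
use strict;
use warnings;

# Named subpatterns live in the DEFINE block and are invoked with (?&name).
my $prototype = qr/
    ^ \s* (?&type) \s* (?<fname> (?&ident) ) \s* \(

    (?(DEFINE)
        (?<ident> [A-Za-z_][A-Za-z0-9_]* )                       # a C identifier
        (?<type>  (?: struct \s+ )? (?&ident) (?: \s* \* )* )    # type plus pointer stars
    )
/mx;

# Hypothetical first lines of function definitions.
my @lines = (
    'struct point *make_point(int x, int y)',
    'int add(int a, int b) {',
);

# The named capture fname is read back through the %+ hash.
my @found;
for my $line (@lines) {
    push @found, $+{fname} if $line =~ $prototype;
}
print "$_\n" for @found;
```

For these two lines it prints `make_point` and `add`. Multi-word types such as `unsigned long` are not covered by this small `type` rule and would need their own alternatives.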