3

I am trying to write a Python regex that searches a .c file and extracts the names of the function(s) defined in it.

For example:

int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
...

I want to get blahblah as the function name.

This regex doesn't work for me; it keeps giving me None: r"([a-zA-Z0-9]*)\s*\([^()]*\)\s*{"

High schooler

3 Answers

4

(?<=(int\s)|(void\s)|(string\s)|(double\s)|(float\s)|(char\s)).*?(?=\s?\()

http://regexr.com?3332t

This should work for what you want. Just keep adding types that you need to catch.

For Python, which requires fixed-width look-behinds, give each type its own look-behind: re.findall(r'(?<=(?<=int\s)|(?<=void\s)|(?<=string\s)|(?<=double\s)|(?<=float\s)|(?<=char\s)).*?(?=\s?\()', string)
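
As a rough sketch of how this could be run, the look-behind list can be built from a list of type names. The names `TYPES` and `PATTERN` are illustrative, not part of the original answer, and the pattern reports prototypes as well as definitions because it never looks for the opening brace.

```python
import re

# Type names to look behind for; extend this list as needed (illustrative).
TYPES = ["int", "void", "string", "double", "float", "char"]

# One fixed-width look-behind per type, OR-ed together inside an outer
# zero-width look-behind, mirroring the pattern in the answer.
lookbehinds = "|".join(r"(?<={}\s)".format(t) for t in TYPES)
PATTERN = re.compile(r"(?<=" + lookbehinds + r").*?(?=\s?\()")

source = """int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
"""

print(PATTERN.findall(source))  # prints ['blahblah']
```

Because `.` does not match a newline, the name and the opening parenthesis have to sit on the same line for this to match.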

Jack
  • I'm running `re.findall( r'(?<=(int\s)|(void\s)|(string\s)|(double\s)|(float\s)|(char\s)).*?(?=\s?\()', string)` but I get this error: `sre_constants.error: look-behind requires fixed-width pattern` (raised at `raise error, v # invalid expression`) – High schooler Dec 11 '12 at 00:45
  • 1
    @AA It seems that Python's regex flavor is slightly different from the norm. Try this: `re.findall(r'(?<=(?<=int\s)|(?<=void\s)|(?<=string\s)|(?<=double\s)|(?<=float\s)|(?<=char\s)).*?(?=\s?\()', string)` – Jack Dec 11 '12 at 05:37
3

The regular expression isn't catching it because of the parentheses in the arguments (specifically, the parentheses in __attribute__((unused))). You might be able to adapt the regular expression for this case, but in general, regular expressions cannot parse languages like C. You may want to use a full-fledged parser like pycparser.
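
A small check along these lines shows the effect of the nested parentheses. The variable names are made up for the illustration, and the second input is simply the question's example with the attribute removed.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"([a-zA-Z0-9]*)\s*\([^()]*\)\s*{")

with_attribute = """int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
"""

# Same declaration with the nested parentheses taken out.
without_attribute = with_attribute.replace(" __attribute__((unused))", "")

# [^()]* cannot cross the inner "(", so nothing matches here.
print(PATTERN.search(with_attribute))              # prints None

# Once the argument list has no nested parentheses, the name is captured.
print(PATTERN.search(without_attribute).group(1))  # prints blahblah
```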

icktoofay
  • I would like to use the built-in python library. Thanks for the response. – High schooler Dec 08 '12 at 21:11
  • 2
    @AA: If you have to find functions in any valid C code, then I'm not sure you have an option. A regular expression *cannot* parse a language like C: C's grammar is at least context-free, whereas regular expressions can only recognize regular languages (see [Chomsky hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy) on Wikipedia). – icktoofay Dec 08 '12 at 21:19
  • Can't we just match against: [int, void, etc.][any amount of space][function name][bracket][anything in here][unbracket][curly bracket][uncurly bracket] – High schooler Dec 08 '12 at 21:27
  • @AA: True regular expressions cannot match the "anything in here" part of that. Without extensions that make it not actually regular expressions, you cannot match nested parentheses. It will stop on the first closing parenthesis. – icktoofay Dec 08 '12 at 21:29
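
If only the standard library is available, one possible workaround for the nested-parentheses problem is to let the regex find a candidate `name(` and then count parentheses by hand. The sketch below is a heuristic, not a C parser: `find_function_names`, `IDENT_BEFORE_PAREN` and `KEYWORDS` are hypothetical names, and it assumes the source has no parentheses inside comments or string literals, no macros that expand to braces, and no function-pointer declarators.

```python
import re

# An identifier followed by "(" is a candidate function name (heuristic).
IDENT_BEFORE_PAREN = re.compile(r"([A-Za-z_]\w*)\s*\(")

# Control keywords that also look like "name (...) {".
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def find_function_names(source):
    """Return names that look like function definitions in C source text."""
    names = []
    pos = 0
    while True:
        m = IDENT_BEFORE_PAREN.search(source, pos)
        if m is None:
            break
        name = m.group(1)

        # Walk forward from just after "(" until the matching ")".
        depth = 1
        i = m.end()
        while i < len(source) and depth > 0:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1

        # A definition has "{" as the next non-whitespace character.
        rest = source[i:].lstrip()
        if depth == 0 and name not in KEYWORDS and rest.startswith("{"):
            names.append(name)
            pos = i          # continue after the argument list
        else:
            pos = m.end()    # not a definition; keep scanning
    return names


example = """int blahblah(
  struct _reent *ptr __attribute__((unused)),
  const char    *old,
  const char    *new
)
{
  return 0;
}
"""

print(find_function_names(example))  # prints ['blahblah']
```

Prototypes such as `void f(void);` and calls such as `h(x);` are skipped because a semicolon, not a brace, follows the closing parenthesis. Anything beyond simple cases still calls for a real parser.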
0

Regexps are not the right tool for extracting semantic information from source code files (though they're good for syntax highlighting, because lexical syntax is often expressed through regular expressions). Regexps can't handle nested constructs, keep track of context, or distinguish types from symbols.

I'd recommend some specialized tool that is really aware of the language structure, like ctags or python-pygccxml.

ctags is a program that generates a list of the entities in a C source file along with their locations (it is used to assist navigation through C code bases in text editors like vi and emacs). python-pygccxml is a Python binding to the C library libgccxml, which uses gcc internals to analyze the code and produces rich, structured output about program semantics.

Dmytro Sirenko