I have a C header file with a lot of enums, typedefs and function prototypes. I want to extract this data using Python's regex module (re). I really need help with the syntax, because I seem to forget it every time I learn it.

ENUMS
-----
enum
{
(tab character)(stuff to be extracted - multiple lines)
};

TYPES
-----
typedef struct (extract1) (extract2)


FUNCTIONS
---------
(return type)
(name)
(
(tab character)(arguments - multiple lines)
);

If anyone could point me in the right direction, I would be grateful.

Vanush Vee
  • what do you have so far in terms of your re? – Levon Jun 04 '12 at 03:23
  • `regex = re.compile("enum\n{(.*)}", re.DOTALL)`. I thought I would get all the characters within the enums, in an array, but I get everything. Also, this is for Cython. – Vanush Vee Jun 04 '12 at 03:37
  • For enums check out https://stackoverflow.com/a/66037988/208880 -- some adjustments should also catch the types and functions. – Ethan Furman Sep 24 '21 at 05:29

1 Answer

I imagine something like this is what you're after?

>>> import re
>>> re.findall(r'enum\s*{\s*([^}]*)};', 'enum {A,B,C};')
['A,B,C']
>>> re.findall(r"typedef\s+struct\s+(\w+)\s+(\w+);", "typedef struct blah blah;")
[('blah', 'blah')]
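
As a rough sketch of how the enum pattern above might be applied to the multi-line, tab-indented layout from the question: the `HEADER` text and the `enum_members` helper below are made up for illustration and are not from the original header. The attempt quoted in the comments, `(.*)` with `re.DOTALL`, is greedy and runs on to the last `}` in the file, which is why it returned everything; `[^}]*` stops at the first closing brace instead.

```python
import re

# Hypothetical header text in the layout described in the question.
HEADER = (
    "enum\n"
    "{\n"
    "\tRED,\n"
    "\tGREEN = 5,\n"
    "\tBLUE\n"
    "};\n"
    "\n"
    "enum\n"
    "{\n"
    "\tON,\n"
    "\tOFF\n"
    "};\n"
)

# [^}]* cannot run past the first closing brace, so each enum matches separately.
ENUM_RE = re.compile(r"enum\s*\{\s*([^}]*)\};")


def enum_members(text):
    """Return one list of member names per enum body found in text."""
    result = []
    for body in ENUM_RE.findall(text):
        # Split on commas, drop any "= value" part, and skip empty pieces.
        names = [item.split("=")[0].strip() for item in body.split(",")]
        result.append([name for name in names if name])
    return result


print(enum_members(HEADER))
```

This assumes anonymous enums with no comments or nested braces inside the body; anything beyond that would need a more careful pattern or a real parser.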

There are of course numerous variations on the syntax, and functions are much more complicated, so I'll leave those for you, as frankly these regexps are already fragile and inelegant enough. I would urge you to use an actual parser unless this is just a one-off project where robustness is totally unimportant and you can be sure of the format of your inputs.
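
For the function layout shown in the question (return type, name and opening parenthesis each on their own line, then tab-indented arguments), one possible starting point is sketched below. It assumes the header follows that layout exactly; the `HEADER` text and the `prototypes` helper are hypothetical, and anything such as function pointers, macros or comments inside the argument list would break it.

```python
import re

# Hypothetical header text in the layout described in the question.
HEADER = (
    "int\n"
    "add_numbers\n"
    "(\n"
    "\tint a,\n"
    "\tint b\n"
    ");\n"
    "\n"
    "const char *\n"
    "get_name\n"
    "(\n"
    "\tvoid\n"
    ");\n"
)

PROTO_RE = re.compile(
    r"^(?P<ret>[^\n(){};]+)\n"      # return type alone on a line
    r"(?P<name>\w+)\n"              # function name alone on a line
    r"\(\n"                         # opening parenthesis alone on a line
    r"(?P<args>(?:\t[^\n]*\n)*)"    # zero or more tab-indented argument lines
    r"\);",                         # closing parenthesis and semicolon
    re.MULTILINE,                   # let ^ match at the start of every line
)


def prototypes(text):
    """Return (return_type, name, [arguments]) for each prototype found."""
    found = []
    for m in PROTO_RE.finditer(text):
        # One argument per line: trim the tab and any trailing comma.
        args = [line.strip().rstrip(",") for line in m.group("args").splitlines()]
        found.append((m.group("ret").strip(), m.group("name"), args))
    return found


for proto in prototypes(HEADER):
    print(proto)
```

Named groups keep the three pieces apart without counting parentheses, and `re.MULTILINE` is what allows `^` to anchor the return type at the start of a line rather than only at the start of the file.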

Greg E.