Here is a pure-Python solution that is very simple to implement.
Function extracting the body
Basically, you try to match each { with a corresponding }:
- If there are two { before the next }, then you are entering a scope.
- On the other hand, if there is one } before the next {, then you are exiting the scope.
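The rule above amounts to tracking a nesting depth while scanning the text. As a minimal sketch (the name matching_brace_offset and the sample string are made up for illustration, and braces inside string literals or comments are not handled):

```python
def matching_brace_offset(text):
    """Return the index in `text` of the brace closing an already-open scope.

    `text` is assumed to start right after an opening brace.
    """
    depth = 1  # the opening brace just before `text` is already counted
    for pos, char in enumerate(text):
        if char == '{':
            depth += 1      # entering a nested scope
        elif char == '}':
            depth -= 1      # exiting a scope
            if depth == 0:
                return pos  # this brace closes the outermost scope
    raise ValueError("no matching closing brace")


# Hypothetical input: the text following the first "{" of a function body.
sample = " if (x) { y(); } return 0; } int g() { }"
print(matching_brace_offset(sample))  # prints 27
```

The inner pair of braces raises the depth to 2 and brings it back to 1, so only the second } is reported.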
The implementation is then trivial:
- you collect the indexes of every { and every } in two separate lists
- you also maintain a scope depth variable
- if the current { position is before the current } position, you are entering a scope: add 1 to the scope depth and move to the next { position
- if the current { position is after the current } position, you are exiting a scope: subtract 1 from the scope depth and move to the next } position
- if the scope depth variable reaches 0, you have found the closing brace of the function body
Suppose you have the string starting right after the first brace of your function body (brace excluded). Calling the following function with this substring gives you the position of the matching closing brace, relative to the start of the substring:
    import re

    def find_ending_brace(string_from_first_brace):
        # Positions of every opening and closing brace, in increasing order.
        starts = [m.start() for m in re.finditer(r'\{', string_from_first_brace)]
        ends = [m.start() for m in re.finditer(r'\}', string_from_first_brace)]
        i = 0
        j = 0
        current_scope_depth = 1  # the first brace of the body is already open
        while j < len(ends):
            if i < len(starts) and starts[i] < ends[j]:
                # The next brace is a {: entering a nested scope.
                current_scope_depth += 1
                i += 1
            else:
                # The next brace is a }: exiting a scope.
                current_scope_depth -= 1
                if current_scope_depth == 0:
                    return ends[j]
                j += 1
        # Ran out of } before the depth came back to 0.
        raise ValueError("unbalanced braces: no matching } found")
Extracting candidate function definition
Now, if the original content of your file is in the variable my_content,

    find_func_begins = [m for m in re.finditer(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", my_content)]

will give you the prototype of each function (find_func_begins[0].group(1) == 'func1' and find_func_begins[0].group(2) == 'int para'), and

    my_content[
        find_func_begins[0].start():
        find_func_begins[0].end() +
        find_ending_brace(my_content[find_func_begins[0].end():])]

will give you the whole definition, from the prototype to the end of the body. The closing brace itself is excluded; add 1 to the end index to include it.
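As a quick check of the pattern, here is a minimal sketch on a made-up C snippet. The sample source and the name pattern are assumptions for illustration; only my_content and find_func_begins come from the text above.

```python
import re

# Hypothetical file content, for illustration only.
my_content = (
    "int func1(int para) {\n"
    "    return para;\n"
    "}\n"
    "\n"
    "void func2(void) {\n"
    "}\n"
)

pattern = r"\w+\s+(\w+)\s*\((.*?)\)\s*\{"
find_func_begins = list(re.finditer(pattern, my_content))

for m in find_func_begins:
    # group(1) is the function name, group(2) the raw parameter list.
    print(m.group(1), "|", m.group(2))
```

On this sample the loop prints one line per function: func1 with int para, then func2 with void.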
Extracting the prototypes
Because the regex for find_func_begins is a bit loose (it also matches, for example, else if (x) { inside a body), you should search for the next function definition only after the closing brace of the current one has been reached. Iterating over each function definition and matching braces yields the following iterative algorithm:
    reg_ex = r"\w+\s+(\w+)\s*\((.*?)\)\s*\{"
    last = 0  # index in my_content where the next search starts
    protos = ""
    find_func_begins = [m for m in re.finditer(reg_ex, my_content[last:], re.MULTILINE | re.DOTALL)]
    while len(find_func_begins) > 0:
        function_begin = find_func_begins[0]
        # Match offsets are relative to my_content[last:], so shift them by last.
        function_proto_end = last + function_begin.end()
        # Everything since the previous body, minus the opening brace, plus ";".
        protos += my_content[last: function_proto_end-1].strip() + ";\n\n"
        # Skip the body: resume right after its closing brace.
        last = function_proto_end + find_ending_brace(my_content[function_proto_end:]) + 1
        find_func_begins = [m for m in re.finditer(reg_ex, my_content[last:], re.MULTILINE | re.DOTALL)]
You should have what you want in protos. Hope this helps!
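To try the whole approach end to end, here is a self-contained sketch under a few assumptions: the names FUNC_RE, closing_brace_offset, and extract_prototypes and the sample source are made up, the brace matching is done with a plain character scan rather than two index lists, and the prototypes are collected in a list instead of one string.

```python
import re

FUNC_RE = re.compile(r"\w+\s+(\w+)\s*\((.*?)\)\s*\{", re.DOTALL)


def closing_brace_offset(text):
    # Offset in `text` of the brace that closes an already-open scope.
    depth = 1
    for pos, char in enumerate(text):
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced braces")


def extract_prototypes(source):
    protos = []
    last = 0
    while True:
        match = FUNC_RE.search(source, last)
        if match is None:
            break
        # Everything since the previous body, minus the opening brace.
        protos.append(source[last:match.end() - 1].strip() + ";")
        # Resume the search right after this function's closing brace.
        last = match.end() + closing_brace_offset(source[match.end():]) + 1
    return protos


# Hypothetical C source with a nested block inside the first function.
sample = (
    "int func1(int para) {\n"
    "    if (para) { para++; }\n"
    "    return para;\n"
    "}\n"
    "\n"
    "void func2(void) {\n"
    "}\n"
)
print(extract_prototypes(sample))
# ['int func1(int para);', 'void func2(void);']
```

The nested if block is skipped because the search resumes only after the body of func1 has been closed, which is why the loose pattern does not report it as a function.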